[Ovirt-devel] node becomes "unavailable"

Nicolas Ochem nicolas.ochem at alcatel-lucent.com
Wed Aug 18 14:52:26 UTC 2010


I found out that this problem is probably caused by the change of the QMF 
daemon in March.

I solved the problem that you describe, and another problem (taskomatic 
hanging), by reverting the following commits:

ea01a6cbcd53105a39b14e7a216234ca8c2f7ab3 fix storage problem.
3a6c7737f7b1d0249aa70678f0a0f85126d786f9 Replace the occurence of the type @qmfc.object(
60a37d26cd488b85a409b02313995c75fecc918e Missed this for QMF update.
1eee8e47f4a1ddb3bd4a69616fef2102db7aa24 Update daemons to use new QMF.
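
A minimal sketch of the revert, assuming you are working in a git checkout
of the ovirt-server tree. Note that the last hash above appears truncated
in the archive, so verify all four against git log before running anything:

  # Revert the commits one at a time, newest first, to reduce the chance
  # of conflicts; check git log for the order they were actually applied in.
  cd ovirt-server
  git revert 1eee8e47f4a1ddb3bd4a69616fef2102db7aa24  # verify: hash looks truncated
  git revert 60a37d26cd488b85a409b02313995c75fecc918e
  git revert 3a6c7737f7b1d0249aa70678f0a0f85126d786f9
  git revert ea01a6cbcd53105a39b14e7a216234ca8c2f7ab3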

Let us know if it works, or if you have difficulties reverting the commits.


On 08/17/2010 09:37 AM, Nicolas Ochem wrote:
> You can look at /var/log/ovirt-server/db-omatic.log. The node probably
> times out because it no longer answers heartbeats.
>
> To get more detail, you can run the db-omatic script in no-daemon mode
> (/usr/share/ovirt-server/db-omatic/db_omatic.rb -n).
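>
> A rough sketch of that, assuming the init script is named ovirt-db-omatic
> (an assumption; check /etc/init.d on your server for the real name):
>
>    # Stop the daemonized instance first, then run the script in the
>    # foreground so its log output goes straight to the terminal.
>    service ovirt-db-omatic stop
>    /usr/share/ovirt-server/db-omatic/db_omatic.rb -n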
>
> I see this very often on Fedora 13, a bit less on Fedora 12.
>
> This is because the Ruby AMQP bindings get stuck when they have to
> handle too many threads.
>
> There's no fix for this yet, but there is a workaround: whenever that
> happens, restart everything on the node and server with these scripts:
>
> http://ovirt.pastebin.com/JjNpEDak
> http://ovirt.pastebin.com/tPAPJBpB
>
> You can put those scripts in a cron job.
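>
> For example, a hypothetical crontab entry (the script path and the
> 15-minute interval are assumptions, not taken from the scripts above):
>
>    # /etc/crontab: run the restart workaround periodically as root
>    */15 * * * * root /usr/local/sbin/ovirt-restart.sh >> /var/log/ovirt-restart.log 2>&1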
>
> On 08/17/2010 05:35 AM, Justin Clacherty wrote:
>
>> After running for a while, the node becomes "unavailable" in the server
>> UI.  All VMs running on that node also become unavailable.  The node is
>> still running fine, as are all the VMs; they're just no longer manageable.
>>
>> I looked on the node and everything appeared to be running fine.  On the
>> server, ovirt-taskomatic was stopped (this seems to happen quite a bit).
>> I restarted it, but that didn't help.  Restarting Matahari on the node
>> sends information to the server, but the node does not become available.
>> The only way I've been able to get it back is to shut down all the VMs
>> and reboot the node and management server.  Is anyone else seeing this
>> happen?  What else can I look at when it happens again?
>>
>> Cheers,
>> Justin.
>>



