[virt-tools-list] VMs died due to hanging httpd processes

About an hour ago, two web-serving VMs died at the same time with the following error on the console:

INFO: task httpd:4304 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd         D 00af1f714d1112e2     0  4304  22471          4305  4303 (NOTLB)
 ffff88006574bdc8  0000000000000282  00000000000041f8  ffff88006574bea8
 000000000000000a  ffff88009747b820  ffffffff804f4b00  00000000001a5eee
 ffff88009747ba08  ffff880095be5015
Call Trace:
 [<ffffffff8022d03c>] mntput_no_expire+0x19/0x89
 [<ffffffff8020eeae>] link_path_walk+0xa6/0xb2
 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80223f33>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
 [<ffffffff8021b52d>] open_namei+0xea/0x6d5
 [<ffffffff8029cb30>] set_process_cpu_timer+0xc7/0xd2
 [<ffffffff80227caa>] do_filp_open+0x1c/0x38
 [<ffffffff8021a364>] do_sys_open+0x44/0xbe
 [<ffffffff802602f9>] tracesys+0xab/0xb6

Monitoring shows that within a timeframe of about three minutes the load on the systems shot up to over 400 before they died. Since MaxClients is set to 512, I suspect the httpd processes hit a mass lock-up, with each blocked process contributing a load of 1 (similar to what happens when a process hangs on an NFS mount point). One of the two VMs acts as an NFS server and exports directories to the other VM (but doesn't mount any external NFS sources itself).
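For what it's worth, that hypothesis can be checked while it is happening: every process in uninterruptible (D) sleep counts 1 toward the load average, so the number of D-state workers should roughly match the load spike. A minimal sketch (not specific to httpd; filter on the command name if needed):

```shell
# Count processes stuck in uninterruptible (D) sleep; each one adds 1
# to the load average, so ~400 D-state workers would explain load > 400.
ps -eo state,comm | awk 'NR > 1 && $1 ~ /^D/ { n++ } END { print n + 0 }'
```

On a healthy box this prints 0 or a small number; a count in the hundreds, all `httpd`, would confirm the mass lock-up picture.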

What is strange is that both systems locked up at the same time even though they are running on two different physical hosts. The hosts run CentOS 5.3, while the VMs run CentOS 5.5 as paravirtualized (PV) Xen guests.

Since the call trace looks identical in both cases, does anyone have an idea what exactly went wrong here?
