
[Fedora-xen] Fedora 7 + kernel 2.6.20-2931 + Xen + clvmd gives spinlock bug and hangs



Hi All,

I'm pretty new to Xen and clustering, but I have the following
setup:

2 servers (2 x dual Xeon 2GHz, 6GB RAM, 2 x 160GB SATA + 4 x 400GB SATA),
both running Fedora 7 with the latest updates (kernel 2.6.20-2931.fc7xen).

I want both Dom0s to act as the storage nodes (with GFS2), and each
server to run 2 or 3 Xen VMs with the applications (2 x bind,
2 x postfix, 1 x mysql, 1 x postgresql), with the possibility to
migrate a VM to the other server in case of any problems.

The 2 x 160GB disks are set up as a RAID 1, with LVM on top of that
holding local volumes for /boot, / and swap.
This should hold the local files.
The 4 x 400GB disks are set up as a 3-disk RAID 5 plus 1 spare disk,
giving me about 800GB of usable disk space.
On top of the RAID 5 I have DRBD, to keep the RAID devices of both
servers in sync.
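As a quick check of the 800GB figure: RAID 5 spends one disk's worth of space on parity, RAID 1 mirrors everything onto each member, and a hot spare contributes no capacity until it replaces a failed disk. A minimal Python sketch of that arithmetic (the helper names are hypothetical, just for illustration):

```python
def raid1_usable_gb(disk_size_gb: int) -> int:
    """RAID 1 mirrors every member, so usable space equals one disk."""
    return disk_size_gb


def raid5_usable_gb(active_disks: int, disk_size_gb: int) -> int:
    """RAID 5 spreads one disk's worth of parity over all active members."""
    if active_disks < 3:
        raise ValueError("RAID 5 needs at least 3 active disks")
    return (active_disks - 1) * disk_size_gb


# 2 x 160GB mirrored (local volumes for /boot, / and swap on LVM)
print(raid1_usable_gb(160))     # 160
# 4 x 400GB: 3 active RAID 5 members + 1 hot spare (the spare adds no capacity)
print(raid5_usable_gb(3, 400))  # 800
```

These are vendor (decimal) gigabytes; 800 GB is roughly 745 GiB when reported in binary units.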

With DRBD v8 it is possible to use them active/active, if you use a
cluster-aware file system (like GFS2).
So I've set up openais and cman. Both are running on both servers and
interacting fine.

When starting clvmd, I get the following (on both servers):

Sep 10 10:08:55 fosfor kernel: BUG: spinlock already unlocked on CPU#0, dlm_recoverd/4016 (Not tainted)
Sep 10 10:08:56 fosfor kernel:  lock: ffff88016158fd50, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
Sep 10 10:08:56 fosfor kernel:
Sep 10 10:08:56 fosfor kernel: Call Trace:
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8020b97c>] _raw_spin_unlock+0x2e/0x7f
Sep 10 10:08:56 fosfor kernel:  [<ffffffff88370233>] :dlm:dlm_lowcomms_get_buffer+0xf7/0x1cb
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836c375>] :dlm:create_rcom+0x3a/0xb3
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836cb81>] :dlm:dlm_rcom_status+0x58/0x137
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836d066>] :dlm:dlm_set_recover_status+0x1a/0x2e
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836beb8>] :dlm:dlm_recover_members+0x332/0x3ea
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80294813>] keventd_create_kthread+0x0/0x6a
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836df8f>] :dlm:dlm_recoverd+0x399/0x3e3
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8836dbf6>] :dlm:dlm_recoverd+0x0/0x3e3
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80294813>] keventd_create_kthread+0x0/0x6a
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80232bae>] kthread+0xd0/0xff
Sep 10 10:08:56 fosfor kernel: dlm: got connection from 1
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8025ba68>] child_rip+0xa/0x12
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80294813>] keventd_create_kthread+0x0/0x6a
Sep 10 10:08:56 fosfor kernel:  [<ffffffff80232ade>] kthread+0x0/0xff
Sep 10 10:08:56 fosfor kernel:  [<ffffffff8025ba5e>] child_rip+0x0/0x12
Sep 10 10:08:56 fosfor kernel:

After that, two dlm processes are running at 100% CPU load on one
processor (also on both servers).

When I stop the clvmd service, the server hangs (only the server on
which I stopped clvmd).

According to
http://www.redhat.com/archives/linux-cluster/2007-April/msg00133.html
and especially
http://www.redhat.com/archives/linux-cluster/2007-April/msg00171.html
I should use kernel 2.6.21 or newer.
But no such kernel is available for Fedora 7 with Xen.

I also tried an older kernel (2.6.20-2925.9.fc7xen), but that doesn't
work either.

Or can the cause of this problem be located somewhere else?

Does anybody know when a newer kernel for Fedora 7 with Xen will be
released? (I don't see any new kernel in testing either...)

Thanks in advance,

Robert Verspuy

