[Linux-cluster] GFS + DRBD Problems
gordan at bobich.net
Mon Mar 3 17:06:19 UTC 2008
On Mon, 3 Mar 2008, gordan at bobich.net wrote:
> I have a 2-node cluster with Open Shared Root on GFS on DRBD. A single node
> mounts GFS OK and works, but after a while seems to just block for disk.
[...]
> This usually happens after a period of idleness. If the node is used, this
> doesn't seem to happen, but leaving it alone for half an hour causes it
> to block for disk I/O.
I've done a bit more digging, and, as expected, the processes that hang
are stuck in the uninterruptible disk sleep (D) state.
For example, when I try to log in, sshd hangs. Its status (from /proc)
is:
Name: sshd
State: D (disk sleep)
SleepAVG: 97%
[...]
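To see whether sshd is the only victim or whether other processes are stuck in the same way, a small script could scan /proc for everything in D state and print the kernel function each one is sleeping in (wchan). This is a minimal sketch assuming the standard Linux /proc layout and root privileges; `state_of`, `name_of` and `list_dstate` are hypothetical helper names, not part of any cluster tool.

```shell
#!/bin/sh
# Minimal sketch: list every process in uninterruptible ("D") sleep,
# with its command name and wchan (the kernel function it is sleeping in).
# Assumes the usual Linux /proc layout; run as root so every PID is readable.
# Helper names are hypothetical.

# Print the one-letter state code from a /proc/<pid>/status-style file,
# e.g. "D" from the line "State:  D (disk sleep)".
state_of() {
    awk '/^State:/ { print $2; exit }' "$1" 2>/dev/null
}

# Print the command name from the "Name:" line of the same file.
name_of() {
    sed -n 's/^Name:[[:space:]]*//p' "$1" 2>/dev/null
}

list_dstate() {
    for status in /proc/[0-9]*/status; do
        # Skip processes that are not in D state (or have already exited).
        [ "$(state_of "$status")" = "D" ] || continue
        pid=${status#/proc/}
        pid=${pid%/status}
        wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
        printf '%s\t%s\t%s\n' "$pid" "$(name_of "$status")" "${wchan:-?}"
    done
}

list_dstate
```

Running it a few times while the node is hung would show whether all the stuck processes sit in the same wchan; if they do, that points at one shared lock rather than at the individual files each process has open.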
The only open file handles it has are:
# ls -la /proc/9643/fd/
total 0
dr-x------ 2 root root 0 Mar 3 16:41 .
dr-xr-xr-x 5 root root 0 Mar 3 16:41 ..
lrwx------ 1 root root 64 Mar 3 16:42 0 -> /dev/null
lrwx------ 1 root root 64 Mar 3 16:42 1 -> /dev/null
lrwx------ 1 root root 64 Mar 3 16:42 2 -> /dev/null
lrwx------ 1 root root 64 Mar 3 16:42 3 -> socket:[118904]
lrwx------ 1 root root 64 Mar 3 16:42 4 -> /cdsl.local/var/run/utmp
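The manual `ls -la /proc/<pid>/fd/` check could be repeated for every D-state process at once, which would show whether they all have utmp (or some other file on the GFS volume) open in common. Again only a sketch: `fds_of` is a hypothetical helper, and `readlink` is assumed to be the usual coreutils utility.

```shell
#!/bin/sh
# Minimal sketch: for each process in "D" state, print where each of its
# open file descriptors points (pid, fd number, link target), mirroring
# a manual `ls -la /proc/<pid>/fd/`. fds_of is a hypothetical helper.

fds_of() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip the unexpanded glob if the fd directory is unreadable.
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        printf '%s\t%s\t%s\n' "$pid" "${fd##*/}" "$(readlink "$fd" 2>/dev/null)"
    done
}

for status in /proc/[0-9]*/status; do
    state=$(awk '/^State:/ { print $2; exit }' "$status" 2>/dev/null)
    [ "$state" = "D" ] || continue
    pid=${status#/proc/}
    pid=${pid%/status}
    fds_of "$pid"
done
```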
I'm guessing that utmp is what's blocking things, but I'm not sure. I can
read and write /var/run/utmp just fine (/var/run is a symlink to
/cdsl.local/var/run).
The socket is a TCP socket, so I can't see that being a disk-blocking issue.
As for /dev/null, I didn't think that could be flock()ed...
Both cman_tool status and /proc/drbd look in order and report that
everything is working.
Any ideas as to what could be causing these bogus disk-sleep lock-ups?
Gordan