
Re: [Linux-cluster] GFS + DRBD Problems



On Mon, 3 Mar 2008, gordan bobich net wrote:

I have a 2-node cluster with Open Shared Root on GFS on DRBD. A single node mounts GFS OK and works, but after a while it seems to block on disk I/O.

[...]

This usually happens after a period of idleness. If the node is used, this doesn't seem to happen, but leaving it alone for half an hour causes it to block for disk I/O.

I've done a bit more digging, and the processes that hang seem to do so, as expected, in disk sleep state.

For example, when trying to log in, sshd hangs. Its status (from /proc) is:

Name:   sshd
State:  D (disk sleep)
SleepAVG:       97%
[...]
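
To see whether anything else is stuck the same way, /proc can be scanned for every process whose State line reads D. The following is only a minimal sketch, assuming a POSIX shell and awk; the function name list_d_state and its optional proc-root argument are made up for illustration.

```shell
# Print "PID NAME" for every process in uninterruptible (D) sleep.
# The optional first argument (hypothetical, for testing) overrides /proc.
list_d_state() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Skip entries that vanished or cannot be read.
        [ -r "$status" ] || continue
        pid=$(basename "$(dirname "$status")")
        awk -v pid="$pid" '
            /^Name:/  { name = $2 }   # first word of the process name
            /^State:/ { state = $2 }  # single-letter state code, e.g. D
            END { if (state == "D") print pid, name }
        ' "$status" 2>/dev/null
    done
}
```

Calling list_d_state with no argument walks the live /proc. On kernels that expose it, /proc/PID/wchan for each reported PID may also hint at where in the kernel the process is sleeping.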

The only open file handles it has are:
# ls -la /proc/9643/fd/
total 0
dr-x------ 2 root root  0 Mar  3 16:41 .
dr-xr-xr-x 5 root root  0 Mar  3 16:41 ..
lrwx------ 1 root root 64 Mar  3 16:42 0 -> /dev/null
lrwx------ 1 root root 64 Mar  3 16:42 1 -> /dev/null
lrwx------ 1 root root 64 Mar  3 16:42 2 -> /dev/null
lrwx------ 1 root root 64 Mar  3 16:42 3 -> socket:[118904]
lrwx------ 1 root root 64 Mar  3 16:42 4 -> /cdsl.local/var/run/utmp

I am guessing that it's the utmp file that is blocking things, but I'm not sure. I can read and write the /var/run/utmp file just fine (/var/run is symlinked to /cdsl.local/var/run).

The socket is a TCP socket, so I cannot see that being a disk block issue.

As for /dev/null, I didn't think that could be flock-ed...
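
One way to check the locking theory is /proc/locks, which lists the file locks the local kernel knows about, keyed by device and inode. The sketch below filters that list by inode number. It assumes the usual MAJOR:MINOR:INODE field format, and the function name locks_for_inode is hypothetical. It would only show locks visible on this node, so a lock held by the other cluster node may not appear.

```shell
# Print every /proc/locks entry whose MAJOR:MINOR:INODE field has the given inode.
# The optional second argument (hypothetical, for testing) overrides /proc/locks.
locks_for_inode() {
    inode="$1"
    locks_file="${2:-/proc/locks}"
    awk -v ino="$inode" '
        {
            # Blocked waiters carry an extra "->" field, so scan all fields
            # instead of relying on a fixed column.
            for (i = 1; i <= NF; i++) {
                n = split($i, part, ":")
                if (n == 3 && part[3] == ino) { print; break }
            }
        }
    ' "$locks_file"
}
```

With GNU coreutils, the inode could be obtained with stat, for example: locks_for_inode "$(stat -c %i /var/run/utmp)". Inode numbers are only unique per filesystem, so the device part of any match should be checked as well.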

Looking at cman_tool status and /proc/drbd, both seem to be in order and saying everything is working.

Any ideas as to what could be causing these bogus disk-sleep lock-ups?

Gordan

