[Linux-cluster] NFS/GFS problems

Jay Cable jcable at gi.alaska.edu
Fri Jun 10 03:04:53 UTC 2005


Hello,

I am having trouble sharing out a gfs filesystem via nfs.  I have a two
node cluster (active/passive) that is intended to provide nfs shares to a
number of clients.  It appears that one node crashes, or both nodes
hang, under heavy load (sustained reads and writes by one or more nfs
clients for any length of time).  The cluster appears to work fine for io
performed directly on the nodes - for example I can run bonnie++ for several
days on the nodes directly without problems, but running bonnie++ on the gfs
filesystem over nfs causes a crash or a hang within an hour or so.

The crashes result in a kernel "Oops" and require the crashed node to be
reset.

The hangs are a little more complicated - both nodes appear to "freeze"
the gfs filesystem, and any gfs-related activity (gfs_tool df, umount,
etc.) just hangs.  I have been unable to find a clean way to recover
from this situation - attempts to unmount the filesystem just cause the
umount to hang.  The only way I have found to deal with the situation is
to take down one node's ethernet interface, so that the other node
notices it is not receiving heartbeats, fences it, and then proceeds
to carry on without any indication of a problem.

I am using the "RHEL4 cluster" branch from CVS, and the 2.6.9-5.0.5.ELsmp
kernel.  I am using "lock_dlm" locking, and the filesystem was created
via:
gfs_mkfs -r 1536 -j 3 -p lock_dlm -t ftp:dds_space /dev/mapper/ftp_space-erc1
My cluster configuration is pretty simple - sanbox2 fencing with two nodes
and the two-node option set (<cman two_node="1" expected_votes="1">).

I would greatly appreciate any advice folks have as to what I can do to
fix this problem.  From the list archives it appears that other folks are
serving out gfs filesystems via nfs, so this should be possible, right?
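In case it matters, the export itself is nothing exotic - roughly the shape
of the /etc/exports line below (the mount point and client range here are
illustrative placeholders, not my literal config; the explicit fsid= is
there so both nodes hand out the same file handle, which my reading of
exports(5) suggests is needed for clean active/passive failover):

    /gfs/dds_space   10.0.0.0/24(rw,sync,fsid=42)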

I have attached the relevant part of /var/log/messages
for a crash.  If any additional information would be helpful, please let
me know, and I will get it ( the crashes/hangs are very repeatable!).

Thanks,
  -Jay Cable

Here is the output from one of the crashes:
Jun  9 19:23:46 jin kernel: send_arp uses obsolete (PF_INET,SOCK_PACKET)
Jun  9 19:28:06 jin kernel: Bad page state at prep_new_page (in process
'nfsd', page c159f4e0)
Jun  9 19:28:06 jin kernel: flags:0x20001020 mapping:f6a300e0 mapcount:0
count:2
Jun  9 19:28:06 jin kernel: Backtrace:
Jun  9 19:28:06 jin kernel:  [<c013e669>] bad_page+0x58/0x89
Jun  9 19:28:06 jin kernel:  [<c013e9ec>] prep_new_page+0x24/0x3a
Jun  9 19:28:06 jin kernel:  [<c013eef8>] buffered_rmqueue+0x17d/0x1a5
Jun  9 19:28:06 jin kernel:  [<c013efd4>] __alloc_pages+0xb4/0x298
Jun  9 19:28:06 jin kernel:  [<c013baa2>] find_lock_page+0x96/0x9d
Jun  9 19:28:06 jin kernel:  [<c013d16d>]
generic_file_buffered_write+0x10d/0x47c
Jun  9 19:28:06 jin kernel:  [<c013bac1>] find_or_create_page+0x18/0x72
Jun  9 19:28:06 jin kernel:  [<c013b775>] wake_up_page+0x9/0x29
Jun  9 19:28:06 jin kernel:  [<c013d85e>]
generic_file_aio_write_nolock+0x382/0x3b0
Jun  9 19:28:06 jin kernel:  [<c013d910>]
generic_file_write_nolock+0x84/0x99
Jun  9 19:28:06 jin kernel:  [<f8f96e5f>] gfs_glock_nq+0xe3/0x116 [gfs]
Jun  9 19:28:06 jin kernel:  [<c011e8d2>]
autoremove_wake_function+0x0/0x2d
Jun  9 19:28:06 jin kernel:  [<f8fb7658>] gfs_trans_begin_i+0xfd/0x15a
[gfs]
Jun  9 19:28:06 jin kernel:  [<f8faadd2>] do_do_write_buf+0x268/0x3b4
[gfs]
Jun  9 19:28:06 jin kernel:  [<f8fab02e>] do_write_buf+0x110/0x152 [gfs]
Jun  9 19:28:06 jin kernel:  [<f8faa238>] walk_vm+0xd3/0xf7 [gfs]
Jun  9 19:28:06 jin kernel:  [<f8f9709a>] gfs_glock_dq+0x111/0x11f [gfs]
Jun  9 19:28:06 jin kernel:  [<f8fab10d>] gfs_write+0x9d/0xb6 [gfs]
Jun  9 19:28:06 jin kernel:  [<f8faaf1e>] do_write_buf+0x0/0x152 [gfs]
Jun  9 19:28:06 jin kernel:  [<f8fab070>] gfs_write+0x0/0xb6 [gfs]
Jun  9 19:28:06 jin kernel:  [<c0155ba8>] do_readv_writev+0x1c5/0x21d
Jun  9 19:28:06 jin kernel:  [<c0154c92>] dentry_open+0xf0/0x1a5
Jun  9 19:28:06 jin kernel:  [<c0155c7e>] vfs_writev+0x3e/0x43
Jun  9 19:28:06 jin kernel:  [<f8c11b6b>] nfsd_write+0xeb/0x289 [nfsd]
Jun  9 19:28:06 jin kernel:  [<f8b2d5db>] svcauth_unix_accept+0x2d3/0x34a
[sunrpc]
Jun  9 19:28:06 jin kernel:  [<f8c18356>] nfsd3_proc_write+0xbf/0xd5
[nfsd]
Jun  9 19:28:06 jin kernel:  [<f8c1a3a8>]
nfs3svc_decode_writeargs+0x0/0x243 [nfsd]
Jun  9 19:28:06 jin kernel:  [<f8c0e5d7>] nfsd_dispatch+0xba/0x16f [nfsd]
Jun  9 19:28:06 jin kernel:  [<f8b2a446>] svc_process+0x420/0x6d6 [sunrpc]
Jun  9 19:28:06 jin kernel:  [<f8c0e3b7>] nfsd+0x1cc/0x332 [nfsd]
Jun  9 19:28:06 jin kernel:  [<f8c0e1eb>] nfsd+0x0/0x332 [nfsd]
Jun  9 19:28:06 jin kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Jun  9 19:28:06 jin kernel: Trying to fix it up, but a reboot is needed
Jun  9 19:30:34 jin kernel: ------------[ cut here ]------------
Jun  9 19:30:34 jin kernel: kernel BUG at mm/vmscan.c:377!
Jun  9 19:30:34 jin kernel: invalid operand: 0000 [#1]
Jun  9 19:30:34 jin kernel: SMP
Jun  9 19:30:34 jin kernel: Modules linked in: lock_dlm(U) dlm(U) cman(U)
gfs(U) lock_harness(U) dm_mod qla2300 qla2xxx scsi_transport_fc nfsd
exportfs lockd autofs4 i2c_dev i2c_core md5 ipv6 sunrpc ipt_REJECT
ipt_state ip_conntrack iptable_filter ip_tables button battery ac uhci_hcd
ehci_hcd e1000 floppy ext3 jbd raid1 ata_piix libata sd_mod scsi_mod
Jun  9 19:30:34 jin kernel: CPU:    1
Jun  9 19:30:34 jin kernel: EIP:    0060:[<c01447bd>]    Tainted: GF   B
VLI
Jun  9 19:30:34 jin kernel: EFLAGS: 00010202   (2.6.9-5.0.5.ELsmp)
Jun  9 19:30:34 jin kernel: EIP is at shrink_list+0xa9/0x3ee
Jun  9 19:30:34 jin kernel: eax: 20001049   ebx: f7cedecc   ecx: c159f4f8
edx: c10f24d8
Jun  9 19:30:34 jin kernel: esi: c159f4e0   edi: 00000021   ebp: f7cedf58
esp: f7cede54
Jun  9 19:30:34 jin kernel: ds: 007b   es: 007b   ss: 0068
Jun  9 19:30:34 jin kernel: Process kswapd0 (pid: 44, threadinfo=f7ced000
task=f7d1b7b0)
Jun  9 19:30:34 jin kernel: Stack: 00000001 00000000 00000000 00000000
f7cedecc f7cede68 f7cede68 00000000
Jun  9 19:30:34 jin kernel:        00000001 c12f4be0 c1204a00 00000246
f7ceded4 c0319e00 00000000 f7ceded4
Jun  9 19:30:34 jin kernel:        c0143bc0 c10639f8 00000296 c1f479c0
c10639e0 00000000 00000020 f7ced000
Jun  9 19:30:34 jin kernel: Call Trace:
Jun  9 19:30:34 jin kernel:  [<c0143bc0>] __pagevec_release+0x15/0x1d
Jun  9 19:30:34 jin kernel:  [<c0144cdf>] shrink_cache+0x1dd/0x34d
Jun  9 19:30:34 jin kernel:  [<c014539d>] shrink_zone+0xa7/0xb6
Jun  9 19:30:34 jin kernel:  [<c0145740>] balance_pgdat+0x1b6/0x2f8
Jun  9 19:30:34 jin kernel:  [<c014594c>] kswapd+0xca/0xcc
Jun  9 19:30:34 jin kernel:  [<c011e8d2>]
autoremove_wake_function+0x0/0x2d
Jun  9 19:30:34 jin kernel:  [<c02c6206>] ret_from_fork+0x6/0x14
Jun  9 19:30:34 jin kernel:  [<c011e8d2>]
autoremove_wake_function+0x0/0x2d
Jun  9 19:30:34 jin kernel:  [<c0145882>] kswapd+0x0/0xcc
Jun  9 19:30:34 jin kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Jun  9 19:30:34 jin kernel: Code: 71 e8 89 50 04 89 02 c7 41 04 00 02 20
00 c7 01 00 01 10 00 f0 0f ba 69 e8 00 19 c0 85 c0 0f 85 b8 02 00 00 8b 41
e8 a8 40 74 08 <0f> 0b 79 01 41 9a 2d c0 8b 41 e8 f6 c4 20 0f 85 96 02 00
00 8b

Here is my cluster.conf:

<?xml version="1.0"?>
<cluster name="ftp" config_version="1">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
<clusternode name="jin-p">
    <fence>
        <method name="single">
            <device name="sanbox2" port="1"/>
        </method>
    </fence>
</clusternode>

<clusternode name="mugen-p">
    <fence>
        <method name="single">
            <device name="sanbox1" port="1"/>
        </method>
    </fence>
</clusternode>

</clusternodes>

<fencedevices>
    <fencedevice name="sanbox1" agent="fence_sanbox2" ipaddr="10.0.19.30"
login="admin" passwd="p00-sm3llz"/>
    <fencedevice name="sanbox2" agent="fence_sanbox2" ipaddr="10.0.19.31"
login="admin" passwd="p00-sm3llz"/>
</fencedevices>

<fence_daemon post_join_delay="20">
</fence_daemon>

</cluster>
More information about the Linux-cluster mailing list