
Re: [Linux-cluster] fc6 two-node cluster with gfs2 not working



David Teigland wrote:
On Thu, Nov 02, 2006 at 03:58:41PM -0600, Greg Swift wrote:
Nov 1 22:49:07 box2 gfs_controld[3639]: mount: failed -17

[root goumang ~]# mount -v /mnt/data
/sbin/mount.gfs2: mount /dev/pri_outMail/pri_outMail_lv0 /mnt/data
/sbin/mount.gfs2: parse_opts: opts = "rw"
/sbin/mount.gfs2:   clear flag 1 for "rw", flags = 0
/sbin/mount.gfs2: parse_opts: flags = 0
/sbin/mount.gfs2: parse_opts: extra = ""
/sbin/mount.gfs2: parse_opts: hostdata = ""
/sbin/mount.gfs2: parse_opts: lockproto = ""
/sbin/mount.gfs2: parse_opts: locktable = ""
/sbin/mount.gfs2: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs2: write "join /mnt/data gfs2 lock_dlm outMail:data rw"
/sbin/mount.gfs2: setup_mount_error_fd 4 5
/sbin/mount.gfs2: message from gfs_controld: response to join request:
/sbin/mount.gfs2: lock_dlm_join: read "0"
/sbin/mount.gfs2: message from gfs_controld: mount options:
/sbin/mount.gfs2: lock_dlm_join: read "hostdata=jid=1:id=65538:first=0"
/sbin/mount.gfs2: lock_dlm_join: hostdata: "hostdata=jid=1:id=65538:first=0"
/sbin/mount.gfs2: lock_dlm_join: extra_plus: "hostdata=jid=1:id=65538:first=0"

All the cluster infrastructure appears to be working ok, and I'm assuming
there are no more gfs_controld errors in the syslog.  So, gfs on the
second node is either stuck doing i/o or stuck trying to get a dlm
lock.  A "ps ax -o pid,stat,cmd,wchan" might show what it's blocked on.
You might also try the same thing with gfs1 (would eliminate the dlm as
the problem).  It could also very well be a gfs2 or dlm bug that's been
fixed since the fc6 kernel froze; we need to get some updates pushed
out.

Dave
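
As a rough illustration of the ps suggestion above, the sketch below filters ps output down to processes in uninterruptible sleep (a STAT value starting with "D") and shows the kernel wait channel for each. It is only a sketch under assumptions: the columns are reordered to pid,stat,wchan,cmd so that the command, which may contain spaces, comes last; the "wchan:32" width specifier assumes a procps-style ps; and the function names blocked_processes and run_ps are made up for this example.

```python
import subprocess


def blocked_processes(ps_output):
    """Return (pid, stat, wchan, cmd) for every process whose STAT starts with 'D'.

    Expects output in the column order pid, stat, wchan, cmd, with one header line.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)          # cmd is last, so it may contain spaces
        if len(parts) < 4:
            continue
        pid, stat, wchan, cmd = parts
        if stat.startswith("D"):             # 'D' = uninterruptible sleep
            rows.append((int(pid), stat, wchan, cmd))
    return rows


def run_ps():
    """Run ps with the assumed column order and return its text output."""
    result = subprocess.run(
        ["ps", "ax", "-o", "pid,stat,wchan:32,cmd"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the node where the mount hangs, something like "for row in blocked_processes(run_ps()): print(*row)" would list the stuck processes; a mount.gfs2 entry in that list, together with its wchan, would hint at whether it is waiting on i/o or on a lock.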

Here is the output from the log file at the same time as what I included before.

Nov 2 15:51:33 goumang kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "outMail:data"
Nov 2 15:51:33 goumang kernel: dlm: data: recover 1
Nov 2 15:51:33 goumang kernel: GFS2: fsid=outMail:data.1: Joined cluster. Now mounting FS...
Nov 2 15:51:33 goumang kernel: dlm: data: add member 2
Nov 2 15:51:33 goumang kernel: dlm: Initiating association with node 2
Nov 2 15:51:33 goumang kernel: dlm: data: add member 1
Nov 2 15:51:33 goumang kernel: dlm: Error sending to node 2 -32
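
For reading the numeric codes in these logs: kernel messages usually report failures as negated errno values, so the -32 above can be looked up with a few lines of Python. This is a small sketch, not something from the thread; describe_errno is a name made up here, and treating the earlier gfs_controld "failed -17" as a negated errno is an assumption, since a userspace daemon is free to use its own codes.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# -32 comes from the dlm kernel message; -17 from the earlier gfs_controld
# line (assumed here to be a negated errno as well).
for code in (-17, -32):
    name, message = describe_errno(code)
    print(code, name, message)
```

On Linux, 32 is EPIPE and 17 is EEXIST, so the dlm line would read as a broken pipe while sending to node 2.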

(Sorry, I pulled it off-list for a minute by not hitting reply-all. I re-attached the output for archival purposes.)

--
http://www.gvtc.com
--
“While it is possible to change without improving, it is impossible to improve without changing.” -anonymous

“Only he who attempts the absurd can achieve the impossible.” -anonymous

[root goumang ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]
[root goumang ~]# service clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   /dev/sdb: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  1 logical volume(s) in volume group "pri_outMail" now active
                                                           [  OK  ]
[root goumang ~]# cman_tool status
Version: 6.0.1
Config Version: 2
Cluster Name: outMail
Cluster Id: 14026
Cluster Member: Yes
Cluster Generation: 1828
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0 11
Node name: goumang.sgc
Node ID: 1
Multicast addresses: 239.192.54.1
Node addresses: 172.16.1.180
[root goumang ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M   1820   2006-11-02 15:50:24  goumang.sgc
   2   M   1828   2006-11-02 15:50:24  rushou.sgc
[root goumang ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010001 none
[1 2]
dlm              1     clvmd    00020001 none
[1 2]
[root goumang ~]# mount -v /mnt/data
/sbin/mount.gfs2: mount /dev/pri_outMail/pri_outMail_lv0 /mnt/data
/sbin/mount.gfs2: parse_opts: opts = "rw"
/sbin/mount.gfs2:   clear flag 1 for "rw", flags = 0
/sbin/mount.gfs2: parse_opts: flags = 0
/sbin/mount.gfs2: parse_opts: extra = ""
/sbin/mount.gfs2: parse_opts: hostdata = ""
/sbin/mount.gfs2: parse_opts: lockproto = ""
/sbin/mount.gfs2: parse_opts: locktable = ""
/sbin/mount.gfs2: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs2: write "join /mnt/data gfs2 lock_dlm outMail:data rw"
/sbin/mount.gfs2: setup_mount_error_fd 4 5
/sbin/mount.gfs2: message from gfs_controld: response to join request:
/sbin/mount.gfs2: lock_dlm_join: read "0"
/sbin/mount.gfs2: message from gfs_controld: mount options:
/sbin/mount.gfs2: lock_dlm_join: read "hostdata=jid=1:id=65538:first=0"
/sbin/mount.gfs2: lock_dlm_join: hostdata: "hostdata=jid=1:id=65538:first=0"
/sbin/mount.gfs2: lock_dlm_join: extra_plus: "hostdata=jid=1:id=65538:first=0"
[root rushou ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]
[root rushou ~]# service clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   /dev/sdb: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdd: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdf: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdg: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  1 logical volume(s) in volume group "pri_outMail" now active
                                                           [  OK  ]
[root rushou ~]# cman_tool status
Version: 6.0.1
Config Version: 2
Cluster Name: outMail
Cluster Id: 14026
Cluster Member: Yes
Cluster Generation: 1828
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0 11
Node name: rushou.sgc
Node ID: 2
Multicast addresses: 239.192.54.1
Node addresses: 172.16.1.185
[root rushou ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M   1828   2006-11-02 15:50:25  goumang.sgc
   2   M   1824   2006-11-02 15:50:25  rushou.sgc
[root rushou ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010001 none
[1 2]
dlm              1     clvmd    00020001 none
[1 2]
[root rushou ~]# mount -v /mnt/data
/sbin/mount.gfs2: mount /dev/pri_outMail/pri_outMail_lv0 /mnt/data
/sbin/mount.gfs2: parse_opts: opts = "rw"
/sbin/mount.gfs2:   clear flag 1 for "rw", flags = 0
/sbin/mount.gfs2: parse_opts: flags = 0
/sbin/mount.gfs2: parse_opts: extra = ""
/sbin/mount.gfs2: parse_opts: hostdata = ""
/sbin/mount.gfs2: parse_opts: lockproto = ""
/sbin/mount.gfs2: parse_opts: locktable = ""
/sbin/mount.gfs2: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs2: write "join /mnt/data gfs2 lock_dlm outMail:data rw"
/sbin/mount.gfs2: setup_mount_error_fd 4 5
/sbin/mount.gfs2: message from gfs_controld: response to join request:
/sbin/mount.gfs2: lock_dlm_join: read "0"
/sbin/mount.gfs2: message from gfs_controld: mount options:
/sbin/mount.gfs2: lock_dlm_join: read "hostdata=jid=0:id=65538:first=1"
/sbin/mount.gfs2: lock_dlm_join: hostdata: "hostdata=jid=0:id=65538:first=1"
/sbin/mount.gfs2: lock_dlm_join: extra_plus: "hostdata=jid=0:id=65538:first=1"
/sbin/mount.gfs2: mount(2) ok
/sbin/mount.gfs2: read_proc_mounts: device = "/dev/pri_outMail/pri_outMail_lv0"
/sbin/mount.gfs2: read_proc_mounts: opts = "rw,hostdata=jid=0:id=65538:first=1"
[root rushou ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010001 none
[1 2]
dlm              1     clvmd    00020001 none
[1 2]
dlm              1     data     00020002 none
[1 2]
gfs              2     data     00010002 none
[1 2]
[root rushou ~]#
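
The only difference between the two mount transcripts before the hang is the hostdata string handed back by gfs_controld. A small sketch for splitting that string into fields makes the comparison easier; parse_hostdata is a name invented for this example, and the reading of the fields (jid as the journal id, first=1 as marking the first mounter) is my understanding rather than something stated in the output.

```python
def parse_hostdata(option):
    """Split a 'hostdata=key=value:key=value' mount option into a dict of ints."""
    prefix = "hostdata="
    if not option.startswith(prefix):
        raise ValueError("not a hostdata option: %r" % option)
    fields = {}
    for item in option[len(prefix):].split(":"):
        key, _, value = item.partition("=")
        fields[key] = int(value)
    return fields


# Strings copied from the two mount -v transcripts above.
goumang = parse_hostdata("hostdata=jid=1:id=65538:first=0")
rushou = parse_hostdata("hostdata=jid=0:id=65538:first=1")

print("goumang:", goumang)
print("rushou: ", rushou)
# The id printed as 8 hex digits, for comparison with the group_tool id column.
print("id as hex:", format(rushou["id"], "08x"))
```

Printed as hex, 65538 is 00010002, which appears to match the id of the gfs "data" group in the group_tool output on rushou. So both nodes were given the same mountgroup id; rushou was treated as the first mounter with jid 0, and goumang got jid 1 before its mount stalled.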
