
Re: [Linux-cluster] GFS + DRBD Problems



As I thought, the problem I'm seeing is indeed multi-part. The first part is now resolved: large time skips caused by the system clock being out of date until ntpd synced it up. It seems that large time jumps made dlm choke.

Now for part 2:

The two nodes connect - certainly enough to sync up DRBD; that stage goes through fine. They start cman and the other cluster components, but it would appear they never actually find each other.

When mounting the shared file system:

Node 1:
GFS: fsid=sentinel:root.0: jid=0: Trying to acquire journal lock...
GFS: fsid=sentinel:root.0: jid=0: Looking at journal...
GFS: fsid=sentinel:root.0: jid=0: Acquiring the transaction lock...
GFS: fsid=sentinel:root.0: jid=0: Replaying journal...
GFS: fsid=sentinel:root.0: jid=0: Replayed 54 of 197 blocks
GFS: fsid=sentinel:root.0: jid=0: replays = 54, skips = 36, sames = 107
GFS: fsid=sentinel:root.0: jid=0: Journal replayed in 1s
GFS: fsid=sentinel:root.0: jid=0: Done
GFS: fsid=sentinel:root.0: jid=1: Trying to acquire journal lock...
GFS: fsid=sentinel:root.0: jid=1: Looking at journal...
GFS: fsid=sentinel:root.0: jid=1: Done
GFS: fsid=sentinel:root.0: Scanning for log elements...
GFS: fsid=sentinel:root.0: Found 0 unlinked inodes
GFS: fsid=sentinel:root.0: Found quota changes for 7 IDs
GFS: fsid=sentinel:root.0: Done


Node 2:
GFS: fsid=sentinel:root.0: jid=0: Trying to acquire journal lock...
GFS: fsid=sentinel:root.0: jid=0: Looking at journal...
GFS: fsid=sentinel:root.0: jid=0: Acquiring the transaction lock...
GFS: fsid=sentinel:root.0: jid=0: Replaying journal...
GFS: fsid=sentinel:root.0: jid=0: Replayed 6 of 6 blocks
GFS: fsid=sentinel:root.0: jid=0: replays = 6, skips = 0, sames = 0
GFS: fsid=sentinel:root.0: jid=0: Journal replayed in 1s
GFS: fsid=sentinel:root.0: jid=0: Done
GFS: fsid=sentinel:root.0: jid=1: Trying to acquire journal lock...
GFS: fsid=sentinel:root.0: jid=1: Looking at journal...
GFS: fsid=sentinel:root.0: jid=1: Done
GFS: fsid=sentinel:root.0: Scanning for log elements...
GFS: fsid=sentinel:root.0: Found 0 unlinked inodes
GFS: fsid=sentinel:root.0: Found quota changes for 2 IDs
GFS: fsid=sentinel:root.0: Done

Unless I'm reading this wrong, both nodes are recovering journal jid=0 on mount - as if each one thinks it's the first mounter.
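Pulling just the "Replaying journal" lines out of the two dmesg excerpts makes this explicit (the `node1`/`node2` prefixes below are mine, added for labelling - the sketch just parses the pasted log lines):

```shell
# The "Replaying journal" lines from the two logs above, labelled by node.
logs='node1 GFS: fsid=sentinel:root.0: jid=0: Replaying journal...
node2 GFS: fsid=sentinel:root.0: jid=0: Replaying journal...'

# Extract which journal ID each node replayed.
result=$(printf '%s\n' "$logs" | awk '/Replaying journal/ {
    line = $0
    sub(/.*jid=/, "", line)   # drop everything up to "jid="
    sub(/:.*/,   "", line)    # keep just the numeric jid
    print $1, "replayed jid", line
}')
printf '%s\n' "$result"
```

Both nodes replaying jid=0 is what you'd expect if each mount were running in its own lock space rather than coordinating through a shared one.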

The second node to join generally chokes at some point during boot, but AFTER it has mounted the GFS volume. On the booted node, cman_tool status says:

# cman_tool status
Version: 6.0.1
Config Version: 20
Cluster Name: sentinel
Cluster Id: 28150
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: sentinel1c
Node ID: 1
Multicast addresses: 239.192.109.100
Node addresses: 10.0.0.1

So the second node never joined.
I know for a fact that the network connection between them is working, as they sync DRBD.
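Worth noting, though: DRBD replicates over TCP unicast, while cman membership traffic goes over UDP multicast (the 239.192.109.100 address in the status output above), so a working DRBD sync doesn't by itself prove the membership traffic is getting through. A minimal sketch of the check I keep doing - here fed the pasted status text; on a live node you'd pipe `cman_tool status` directly:

```shell
# Trimmed copy of the cman_tool status output shown above.
status='Nodes: 1
Expected votes: 1
Total votes: 1'

# How many members does cman actually see?
nodes=$(printf '%s\n' "$status" | awk '/^Nodes:/ {print $2}')
if [ "$nodes" -lt 2 ]; then
    echo "only $nodes member visible; membership traffic (UDP multicast to 239.192.109.100) may not be getting through"
fi
```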

cluster.conf is here:

<?xml version="1.0"?>
<cluster config_version="20" name="sentinel">
        <cman two_node="1" expected_votes="1"/>
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="sentinel1c" nodeid="1" votes="1">
                        <com_info>
                                <rootsource name="drbd"/>
                                <!--<chrootenv mountpoint="/var/comoonics/chroot"
                                               fstype="ext3"
                                               device="/dev/sda2"
                                               chrootdir="/var/comoonics/chroot"/>-->
                                <syslog name="localhost"/>
                                <rootvolume name="/dev/drbd1"
                                            mountopts="defaults,noatime,nodiratime,noquota"/>
                                <eth    name    = "eth0"
                                        ip      = "10.0.0.1"
                                        mac     = "00:0B:DB:92:C5:E1"
                                        mask    = "255.255.255.0"
                                        gateway = ""
                                />
                                <fenceackserver user    = "root"
                                                passwd  = "password"
                                />
                        </com_info>
                        <fence>
                                <method name = "1">
                                        <device name = "sentinel1d"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="sentinel2c" nodeid="2" votes="1">
                        <com_info>
                                <rootsource name="drbd"/>
                                <!--<chrootenv mountpoint="/var/comoonics/chroot"
                                               fstype="ext3"
                                               device="/dev/sda2"
                                               chrootdir="/var/comoonics/chroot"/>-->
                                <syslog name="localhost"/>
                                <rootvolume name="/dev/drbd1"
                                            mountopts="defaults,noatime,nodiratime,noquota"/>
                                <eth    name    = "eth0"
                                        ip      = "10.0.0.2"
                                        mac     = "00:0B:DB:90:4E:1B"
                                        mask    = "255.255.255.0"
                                        gateway = ""
                                />
                                <fenceackserver user    = "root"
                                                passwd  = "password"
                                />
                        </com_info>
                        <fence>
                                <method name = "1">
                                        <device name = "sentinel2d"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="192.168.254.252" login="root" name="sentinel1d" passwd="password"/>
                <fencedevice agent="fence_drac" ipaddr="192.168.254.253" login="root" name="sentinel2d" passwd="password"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>
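One thing I've been double-checking, since cman binds to whichever interface the clusternode names resolve to: if sentinel1c/sentinel2c resolve to the wrong addresses, the nodes can talk DRBD on one interface while cluster traffic goes nowhere. A quick sketch that pulls the node names out of the config for cross-checking against name resolution (fed an inline stand-in copy here; on a node you'd read /etc/cluster/cluster.conf):

```shell
# Stand-in copy of the relevant part of cluster.conf.
conf='<cluster config_version="20" name="sentinel">
  <clusternodes>
    <clusternode name="sentinel1c" nodeid="1" votes="1"/>
    <clusternode name="sentinel2c" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>'

# Extract the clusternode names.
names=$(printf '%s\n' "$conf" | sed -n 's/.*<clusternode name="\([^"]*\)".*/\1/p')
printf '%s\n' "$names"

# On a live node, verify each name resolves to the 10.0.0.x interface:
# for n in $names; do getent hosts "$n"; done
```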

What could be causing the nodes to not join in the cluster?

Gordan

