
Re: [Linux-cluster] scsi reservation issue

Christopher Barry wrote:

Okay. I had some other issues to deal with, but now I'm back to this,
and let me get you all up to speed on what I have done, and what I do
not understand about all of this.

esx-01: contains nodes 1 thru 3
esx-02: contains nodes 4 thru 6

esx-01: all 3 cluster nodes can mount gfs.

esx-02: none can mount gfs.
esx-02: scsi reservation errors in dmesg
esx-02: mount fails w/ "can't read superblock"

OK. So it looks like one of the nodes is still holding a reservation on the device. First, we need to determine which node has that reservation. From any node, you should be able to run the following commands:

sg_persist -i -k /dev/sdc
sg_persist -i -r /dev/sdc

The first will list all the keys registered with the device. The second will show you which key is holding the reservation. At this point, I would expect that you will see only one key registered and that this key will also be the reservation holder, but that is just a guess.

The keys are unique to each node, so we can correlate a key to a node. The key is just the hex representation of the node's IP address. You can get this by running gethostip -x <hostname>. By doing this, you should be able to figure out which node is still holding a reservation. Once you determine this key/node, try running /etc/init.d/scsi_reserve stop from that node. Once that runs, use the sg_persist commands listed above to see if the reservation is cleared.
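If gethostip is not installed on a node, the same hex value can be computed from a dotted-quad address in plain shell. This is only a sketch: ip_to_key is a made-up helper name, and it assumes an IPv4 address written without leading zeros.

```shell
# ip_to_key is a hypothetical helper, not part of the cluster tools.
# It prints the 8-digit hex form of a dotted-quad IPv4 address.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_key() (
    IFS=.
    # Split the address on dots into four positional parameters.
    set -- $1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)
```

For example, ip_to_key 192.168.0.1 prints C0A80001. When comparing against the sg_persist -i -k output, ignore case and any 0x prefix, since the two tools may not format the value the same way.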

Oddly, with the gfs filesystem unmounted on all nodes, I can format the
gfs filesystem from the esx-02 box (from node4), and then mount it from
a node on esx-01, but cannot mount it on the node I just formatted it

fdisk -l shows /dev/sdc1 on nodes 4 thru 6 just fine.

Hmm. I wonder if there is something goofy happening because the nodes are running within vmware. I have never tried this, so I have no idea. Either way, we should be able to clear up the problem.

# sg_persist -C --out /dev/sdc1
fails to clear out the reservations

Right. I believe this must be run from the node holding the reservation, or at the very least from a node that is registered with the device. Also note that SCSI reservations affect the entire LUN, so you can't issue registrations/reservations to a single partition (i.e., sdc1).
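A rough sketch of pointing the clear at the whole LUN instead of the partition. whole_disk is a made-up helper that assumes simple /dev/sdXN style names; the sg_persist line is left as a comment because it needs the registered key of the node you run it from.

```shell
# whole_disk is a hypothetical helper: it strips a trailing partition
# number, assuming /dev/sdXN style names (e.g. /dev/sdc1 -> /dev/sdc).
whole_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Example, to be run from a registered node with that node's own key:
#   dev=$(whole_disk /dev/sdc1)
#   sg_persist --out --clear --param-rk=0x<key> "$dev"
```

The --param-rk value should be the key that node registered with, which is why the clear is expected to fail from an unregistered node.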

I do not understand these reservations, maybe someone can summarize?

I'll try to be brief. Each node in the cluster can register with a device, thus a device may have many registrations. Each node registers by using a unique key. Once registered, one of the nodes can issue a reservation. Only one node may hold the reservation, and the reservation is created using that node's key. For our purposes, we use a write-exclusive, registrants-only type of reservation. This means that only nodes that are registered with the device may write to it. As long as that reservation exists, that rule will be enforced.

When it comes to removing registrations, there is one caveat: the node that holds the reservation cannot unregister unless there are no other nodes registered with the device. This is due to the fact that the reservation holder must also be registered *and* if the reservation were to go away, the write-exclusive, registrants-only policy would no longer be in effect. So ... what may have happened is that you tried to clear the reservation while other nodes were still registered, which will fail since that cannot happen. Once all the other nodes have "unregistered", you should be able to go back and clear the reservation.
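The ordering this implies can be sketched in shell. teardown_order is a made-up helper, and the node names are placeholders; it only prints the order in which to unregister, with every other node first and the reservation holder last.

```shell
# teardown_order HOLDER NODE...
# Hypothetical helper: prints the nodes one per line in an order that
# respects the caveat above (non-holders first, reservation holder last).
teardown_order() {
    holder=$1
    shift
    for node in "$@"; do
        # Skip the holder here; it is printed once, at the end.
        [ "$node" = "$holder" ] || printf '%s\n' "$node"
    done
    printf '%s\n' "$holder"
}
```

Assuming you have ssh access to each node, you could then loop over the output and run /etc/init.d/scsi_reserve stop on each node in turn, checking with sg_persist -i -k /dev/sdc between steps.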

Yes, this is a limitation in our product. There is a notion of moving a reservation (in the case where the reservation holder wants to unregister), but that is not yet implemented.

I'm not at the box this sec (vpn-ing in will hork my evolution), but I
will provide any amount of data if either you Ryan, or anyone else has
stuff for me to try.

Please let me know if you have questions or need further assistance clearing that pesky reservation for you. :)

Thanks all,
