[Linux-cluster] Starter Cluster / GFS

Thu Nov 11 16:48:50 UTC 2010

On 10-11-11 04:04 AM, Gordan Bobic wrote:
> Digimer wrote:
>> On 10-11-10 11:09 AM, Gordan Bobic wrote:
>>> Digimer wrote:
>>>> On 10-11-10 07:17 AM, Gordan Bobic wrote:
>>>>>>> If you want the FS mounted on all nodes at the same time then all
>>>>>>> those nodes must be a part of the cluster, and they have to be
>>>>>>> quorate (majority of nodes have to be up). You don't need a quorum
>>>>>>> block device, but it can be useful when you have only 2 nodes.
>>>>>> Eventually I will have 7 to 10 nodes, but 2 at first for initial setup
>>>>>> and testing. OK, so if I have a 3-node cluster, for example, I need at
>>>>>> least 2 nodes for the cluster, and thus GFS, to be up? I cannot
>>>>>> have a running GFS with only one node?
>>>>> In a 2-node cluster, you can have running GFS with just one node up. But
>>>>> in that case it is advisable to have a quorum block device on the SAN.
>>>>> With a 3-node cluster, you cannot have quorum with just 1 node, and thus
>>>>> you cannot have GFS running. It will block until quorum is
>>>>> re-established.
>>>> With a quorum disk, you can in fact have one node left and still have
>>>> quorum. This is because the quorum disk should have (number of nodes - 1)
>>>> votes, thus always giving the last node 50%+1 even with all other nodes
>>>> being dead.
>>> I've never tried testing that use-case extensively, but I suspect that
>>> it is only safe to do with SAN-side fencing. Otherwise two nodes could
>>> lose contact with each other and still both have access to the SAN and
>>> thus both be individually quorate.
>>>
>>> Gordan
>>
>> Clustered storage *requires* fencing. To not use fencing is like driving
>> tired; it's just a matter of time before something bad happens. That
>> said, I should have been more clear in specifying the requirement for
>> fencing.
>>
>> Now that said, the fencing shouldn't be needed at the SAN side, though
>> that works fine as well.
> 
> The default fencing action, last time I checked, is reboot. Consider the
> use case where you have a network failure and separate networks for
> various things, and you lose connectivity between the nodes but they
> both still have access to the SAN. One node gets fenced, reboots, comes
> up and connects to the SAN. It connects to the quorum device and has
> quorum without the other nodes, and mounts the file systems and starts
> writing - while all the other nodes that have become partitioned off do
> the same thing. Unless you can fence the nodes from the SAN side, a quorum
> device with a 50% weight is a recipe for disaster.

Agreed, and that is one of the major benefits of qdisk: it prevents a
50/50 split. Regardless, say you have an eight-node cluster and it
partitions evenly with no qdisk to break the tie. In that case, neither
partition has >50% of the votes, so neither should have quorum. In turn,
neither should touch the SAN.

This is because DLM is required for clustered file systems, and DLM in
turn requires quorum. Without quorum, DLM won't run and you will not be
able to touch the SAN. :)
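
To make the vote arithmetic concrete, here is a rough Python sketch. It
assumes a plain "strictly more than half of all votes" rule, one vote per
node, and a qdisk holding (nodes - 1) votes as discussed earlier in the
thread. The function names are made up for illustration and are not part
of cman or qdisk, and the sketch says nothing about which partition
actually wins the qdisk votes.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is strictly more than half of total_votes."""
    return total_votes // 2 + 1


def is_quorate(partition_votes: int, total_votes: int) -> bool:
    """True if a partition holding partition_votes has a strict majority."""
    return partition_votes >= quorum_threshold(total_votes)


nodes = 8  # assumption: every node carries exactly one vote

# Eight nodes, no qdisk, split 4/4: neither half reaches 5 of 8 votes.
print(is_quorate(4, nodes))  # False

# Same cluster with a qdisk worth (nodes - 1) votes, 15 votes in total.
qdisk_votes = nodes - 1
total = nodes + qdisk_votes

# A lone surviving node that holds the qdisk votes has 8 of 15.
print(is_quorate(1 + qdisk_votes, total))  # True

# A four-node partition without the qdisk votes has only 4 of 15.
print(is_quorate(4, total))  # False
```

The point of the sketch is only that an exact 50% share never passes a
strict-majority test.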

>> The way it works is:
> [...]
> 
> I'm well aware of how fencing works, but you overlooked one major
> failure mode that is essentially guaranteed to hose your data if you set
> up the quorum device to have 50% of the votes.

See above. 50% is not quorum.

>> With SAN-side fencing, a fence is in the form of a logical disconnection
>> from the storage network. This has no inherent mechanism for recovery,
>> so the sysadmin will have to manually recover the node(s). For this
>> reason, I do not prefer it.
> 
> Then don't use a quorum device with more than an equal weight to the
> individual nodes.
> 
> Gordan

How does the number of nodes relate, in this case, to the SAN-side fence
recovery?

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org