
Re: [Linux-cluster] dlm-pcmk-3.0.17-1.fc14.x86_64 and gfs-pcmk-3.0.17-1.fc14.x86_64 woes



FYI, per:

> Cluster shutdown tips
> ---------------------
>
> * Avoiding a partly shutdown cluster due to lost quorum.
>
> There is a practical timing issue with respect to the shutdown steps being run
> on all nodes when shutting down an entire cluster (or most of it).  When
> shutting down the entire cluster (or shutting down a node for an extended
> period) use "cman_tool leave remove". This automatically reduces the number
> of votes needed for quorum as each node leaves and prevents the loss of quorum
> which could keep the last nodes from cleanly completing shutdown.
>
> Using the "remove" leave option should not be used in general since it
> introduces potential split-brain risks.
>
> If the "remove" leave option is not used, quorum will be lost after enough
> nodes have left the cluster. Once the cluster is inquorate, remaining members
> that have not yet completed "fence_tool leave" in the steps above will be
> stuck.  Operations such as umounting gfs or leaving the fence domain will
> block while the cluster is inquorate. They can continue and complete only
> when quorum is regained.
>
> If this happens, one option is to join the cluster ("cman_tool join") on some
> of the nodes that have left so that the cluster regains quorum and the stuck
> nodes can complete their shutdown. Another option is to forcibly reduce the
> number of expected votes for the cluster which allows the cluster to become
> quorate again ("cman_tool expected <votes>").
>
> ...
>
> Two node clusters
> -----------------
>
> Ordinarily the loss of quorum after one node fails out of two will prevent the
> remaining node from continuing (if both nodes have one vote.) Some special
> configuration options can be set to allow the one remaining node to continue
> operating if the other fails. To do this only two nodes with one vote each can
> be defined in cluster.conf. The two_node and expected_votes values must then be
> set to 1 in the cman config section as follows.
>
>   <cman two_node="1" expected_votes="1">
>   </cman>
>

The text above is from http://sourceware.org/cluster/doc/usage.txt. Based on it, it looks like example C.1 in http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#ap-cman should be changed to:

<?xml version="1.0"?>
<cluster config_version="1" name="beekhof">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="pcmk-1" nodeid="1">
      <fence/>
    </clusternode>
    <clusternode name="pcmk-2" nodeid="2">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman two_node="1" expected_votes="1"/>
  <fencedevices/>
  <rm/>
</cluster>
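
To make the quorum arithmetic in the quoted tips concrete, here is a minimal sketch. It assumes the usual majority rule (more than half of the expected votes, i.e. expected_votes / 2 + 1 with integer division) and treats two_node="1" as forcing the threshold to 1. The function names votes_needed and is_quorate are hypothetical helpers for illustration only, not cman commands, and the three-node cluster with one vote per node is made up.

```shell
#!/bin/sh
# votes_needed EXPECTED_VOTES [TWO_NODE]
# Assumption: quorum is a simple majority of the expected votes,
# and two_node=1 lowers the threshold to a single vote.
votes_needed() {
    if [ "${2:-0}" -eq 1 ]; then
        echo 1
    else
        echo $(( $1 / 2 + 1 ))
    fi
}

# is_quorate VOTES_PRESENT EXPECTED_VOTES [TWO_NODE]
# Succeeds when the votes still present reach the threshold.
is_quorate() {
    [ "$1" -ge "$(votes_needed "$2" "${3:-0}")" ]
}

# Walk a hypothetical three-node cluster (one vote each) through shutdown.
# "plain leave" keeps expected votes at the original total;
# "leave remove" lowers expected votes to the nodes still present.
total=3
present=$total
while [ "$present" -ge 1 ]; do
    if is_quorate "$present" "$total"; then plain=quorate; else plain=inquorate; fi
    if is_quorate "$present" "$present"; then removed=quorate; else removed=inquorate; fi
    echo "nodes left: $present  plain leave: $plain  leave remove: $removed"
    present=$((present - 1))
done
```

Under these assumptions the last remaining node is inquorate after plain leaves but stays quorate when each node leaves with "remove", which is the situation the shutdown tips describe. The same arithmetic shows why a two-node cluster with one vote each cannot survive a node failure unless two_node and expected_votes are set to 1.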

gb

On 03/10/2011 09:52 AM, Gregory Bartholomew wrote:
On 03/10/2011 01:14 AM, Andrew Beekhof wrote:
On Wed, Mar 9, 2011 at 7:03 PM, Gregory Bartholomew
<gregory lee bartholomew gmail com> wrote:
Never mind, I figured it out ... I needed to install the gfs2-cluster
package and start its service, and I also had a different name for my
cluster in /etc/cluster/cluster.conf than the one I was using in my
mkfs.gfs2 command.

It's all working now. Thanks to those who helped me get this going,

So you're still using Pacemaker to mount/unmount the filesystem and
other services?
If so, were there any discrepancies in the documentation describing
how to configure this?

Good morning,

This is what I did to get the file system going:

-----

yum install -y httpd gfs2-cluster gfs2-utils
chkconfig gfs2-cluster on
service gfs2-cluster start

mkfs.gfs2 -p lock_dlm -j 2 -t siue-cs:iscsi /dev/sda1

cat <<-END | crm
configure primitive gfs ocf:heartbeat:Filesystem params device="/dev/sda1" directory="/var/www/html" fstype="gfs2" op start interval="0" timeout="60s" op stop interval="0" timeout="60s"
configure clone dual-gfs gfs
END

-----
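
The cluster-name mismatch mentioned earlier in this thread can be caught before running mkfs.gfs2, because the -t lock table has the form clustername:fsname and the clustername part has to match the name attribute in /etc/cluster/cluster.conf. The following is a rough sketch of such a check. The function names are hypothetical, and the sed expression assumes the <cluster ...> opening tag and its name attribute sit on one line, as in the example config above; it is not a real XML parser.

```shell
#!/bin/sh
# cluster_name_from_conf FILE
# Print the name="..." attribute of the <cluster ...> element.
# Requiring whitespace right after "<cluster" skips <clusternodes>
# and <clusternode ...> elements.
cluster_name_from_conf() {
    sed -n 's/.*<cluster[[:space:]][^>]*name="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# check_lock_table FILE LOCKTABLE
# Succeed only if the part of LOCKTABLE before the first ":" equals
# the cluster name found in FILE.
check_lock_table() {
    [ "$(cluster_name_from_conf "$1")" = "${2%%:*}" ]
}
```

For example, running check_lock_table /etc/cluster/cluster.conf siue-cs:iscsi before the mkfs.gfs2 command above would fail if cluster.conf still named the cluster something else.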

I think this sed command was also missing from the guide:

sed -i '/^#<Location \/server-status>/,/#<\/Location>/{s/^#//;s/Allow from .example.com/Allow from 127.0.0.1/}' /etc/httpd/conf/httpd.conf
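
To preview what that sed expression does without touching the real httpd.conf, it can be run without -i against a small sample. The sketch below does that. The sample stanza is an assumption about what the stock commented-out block looks like, the function name enable_server_status is hypothetical, and the extra ";" before the closing "}" is added only so the script also runs on non-GNU sed.

```shell
#!/bin/sh
# enable_server_status FILE
# Print FILE with the commented-out <Location /server-status> stanza
# uncommented and its "Allow from" line pointed at localhost.
# The file itself is not modified (no -i).
enable_server_status() {
    sed '/^#<Location \/server-status>/,/#<\/Location>/{s/^#//;s/Allow from .example.com/Allow from 127.0.0.1/;}' "$1"
}

# Hypothetical sample input, assumed to resemble the stock stanza.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#ExtendedStatus On
#<Location /server-status>
#    SetHandler server-status
#    Order deny,allow
#    Deny from all
#    Allow from .example.com
#</Location>
EOF

enable_server_status "$sample"
rm -f "$sample"
```

Only the lines between the two Location markers lose their leading "#"; commented lines outside that range, such as the ExtendedStatus line in the sample, are left alone.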

I've attached the full record of all the commands that I used to set up
my nodes to this email. It has, at the end, the final result of "crm
configure show".

gb

