[Linux-cluster] clvmd problems with centos 6.3 or normal clvmd behaviour?

Thu Aug 2 14:17:51 UTC 2012

Yup, I missed the part where you said you only have a single node.

To be clear, the portion of the docs you cite below is exactly why you need
to be careful about how many votes you give to the qdiskd. It should be a
tie breaker. You are using it to bring up a 3 node cluster in which only a
single node exists. This is fine in a testing environment, but is not
recommended in a production setup. Once your other nodes are in place, you
won't need the qdiskd. If you decide to keep it around, be very careful
with its use. It's really only meant for clusters in which you have an
even number of actual nodes.
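
For what it's worth, the vote arithmetic can be sketched roughly as below.
This is only an illustration, assuming the usual majority rule (quorum =
expected votes / 2 + 1, using integer division) and assuming the 9 expected
votes come from three nodes with 2 votes each plus 3 votes for the quorum
disk. The function names are made up for the example and are not part of
cman or qdiskd.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently visible reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Assumed layout: 3 nodes x 2 votes + 3 quorum disk votes = 9 expected votes.
NODE_VOTES = 2
QDISK_VOTES = 3
EXPECTED_VOTES = 3 * NODE_VOTES + QDISK_VOTES

# A single node that also sees the quorum disk holds 2 + 3 votes.
single_node_with_qdisk = is_quorate(NODE_VOTES + QDISK_VOTES, EXPECTED_VOTES)

# The same node without the quorum disk holds only its own 2 votes.
single_node_alone = is_quorate(NODE_VOTES, EXPECTED_VOTES)

print(quorum_threshold(EXPECTED_VOTES), single_node_with_qdisk, single_node_alone)
```

With these numbers the threshold is 5 of 9 votes, so a lone node that holds
the quorum disk reaches it by itself, while the same node without the quorum
disk does not.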

Sorry I don't have more time this morning to look at this, but I am sure
someone else will.

Take care

-C

On Thu, Aug 2, 2012 at 7:55 AM, Gianluca Cecchi
<gianluca.cecchi at gmail.com> wrote:

> On Thu, 2 Aug 2012 07:07:25 -0600 Corey Kovacs wrote:
> > I might be reading this wrong but just in case, I thought I'd point this
> out.
> >
> [snip]
> > A single node can maintain quorum since 2+3>(9/2).
> > In a split brain condition where a single node cannot talk to the other
> nodes, this could be disastrous.
>
> Thanks for your input, Corey.
> As I said before, at the moment I'll have only one node on a site, so
> I'm also tweaking the config to be able to work with one node alone.
>
> Anyway, I refer to this passage in the manual, which also applies to
> configurations with more than two nodes (the example pertains to a
> 13-node cluster):
>
> "
> A cluster must maintain quorum to prevent split-brain issues. If
> quorum was not enforced, a communication error on that same
> thirteen-node cluster may cause a situation where six nodes are
> operating on the shared storage, while another six nodes are also
> operating on it, independently. Because of the communication error,
> the two partial-clusters would overwrite areas of the disk and corrupt
> the file system. With quorum rules enforced, only one of the partial
> clusters can use the shared storage, thus protecting data integrity.
> Quorum doesn't prevent split-brain situations, but it does decide who
> is dominant and allowed to function in the cluster. Should split-brain
> occur, quorum prevents more than one cluster group from doing
> anything.
> "
>
> That said, in my case the problem is not with quorum, which is gained
> when the quorum disk becomes master, but with clvmd freezing without
> showing any error.
> As suggested, I set up logging for both cluster and lvm.
>
> I also configured lvmetad
>
> The diff between previous lvm.conf and current for further tests is this:
> # diff -u lvm.conf lvm.conf.pre020812
> --- lvm.conf    2012-08-02 14:48:31.172565731 +0200
> +++ lvm.conf.pre020812  2012-08-02 01:33:55.878511113 +0200
> @@ -232,8 +232,7 @@
>
>      # Controls the messages sent to stdout or stderr.
>      # There are three levels of verbosity, 3 being the most verbose.
> -    #verbose = 0
> -    verbose = 2
> +    verbose = 0
>
>      # Should we send log messages through syslog?
>      # 1 is yes; 0 is no.
> @@ -242,7 +241,6 @@
>      # Should we log error and debug messages to a file?
>      # By default there is no log file.
>      #file = "/var/log/lvm2.log"
> -    file = "/var/log/lvm2.log"
>
>      # Should we overwrite the log file each time the program is run?
>      # By default we append.
> @@ -251,8 +249,7 @@
>      # What level of log messages should we send to the log file and/or syslog?
>      # There are 6 syslog-like log levels currently in use - 2 to 7 inclusive.
>      # 7 is the most verbose (LOG_DEBUG).
> -    #level = 0
> -    level = 4
> +    level = 0
>
>      # Format of output messages
>      # Whether or not (1 or 0) to indent messages according to their severity
> @@ -422,8 +419,7 @@
>      # Check whether CRC is matching when parsed VG is used multiple times.
>      # This is useful to catch unexpected internal cached volume group
>      # structure modification. Please only enable for debugging.
> -    #detect_internal_vg_cache_corruption = 0
> -    detect_internal_vg_cache_corruption = 1
> +    detect_internal_vg_cache_corruption = 0
>
>      # If set to 1, no operations that change on-disk metadata will be permitted.
>      # Additionally, read-only commands that encounter metadata in need of repair
> @@ -483,8 +479,7 @@
>      # libdevmapper.  Useful for debugging problems with activation.
>      # Some of the checks may be expensive, so it's best to use this
>      # only when there seems to be a problem.
> -    #checks = 0
> -    checks = 1
> +    checks = 0
>
>      # Set to 0 to disable udev synchronisation (if compiled into the binaries).
>      # Processes will not wait for notification from udev.
>
> cluster.conf changes
> # diff cluster.conf cluster.conf.51
> 2,6c2
> < <cluster config_version="52" name="clrhev">
> <       <dlm log_debug="1" plock_debug="1"/>
> <       <logging>
> <               <logging_daemon name="qdiskd" debug="on"/>
> <       </logging>
> ---
> > <cluster config_version="51" name="clrhev">
>
> I am attaching two files:
> lvm2.log, with a mark separating the entries before and after the clvmd
> start command was issued
> clvmd start output.txt, which is the output during the "service clvmd start"
> command
>
> To be able to do so, I started in single user mode and then started
> the services one at a time, in the order of
>
> /etc/rc.d/rc3.d/S*
>
> but starting the ssh daemon earlier, so that I'm able to log in remotely.
> In fact, after clvmd freezes I can only run a pair of sync commands and
> power off....
>
> If I'm not missing something stupid, I can also file a bug in the
> CentOS bug tracker, and then eventually someone will report it upstream
> if it is reproducible.
>
> Gianluca
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>