[Cluster-devel] ccs_config_validate in cluster 3.0.X

Fabio M. Di Nitto fdinitto at redhat.com
Wed Oct 28 10:36:30 UTC 2009


Hi everybody,

as briefly mentioned in the 3.0.4 release notes, a new system to validate
the configuration has been enabled in the code.

What it does
------------

The general idea is to be able to perform as many sanity checks on the
configuration as possible. This check allows us to spot the most common
mistakes, such as typos or possibly invalid values, in cluster.conf.


Configuring the validation
--------------------------

The validation system is integrated in several components.
It supports one config option that can take 3 values.

Via init script (or /etc/sysconfig/cman or distro equivalent):

CONFIG_VALIDATION=value

values can be:
1) FAIL - enables a very strict check. Even a simple typo will prevent the
configuration from loading.

2) WARN - the check is relaxed. Warnings are printed on the screen, but
the cluster will continue to load. (default)

3) NONE - disable the config validation system. (discouraged!)

This is equivalent to passing -D to cman_tool join or cman_tool version:
cman_tool join/version -D(FAIL|WARN|NONE)
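
To make the equivalence concrete, here is a minimal sketch of how a
CONFIG_VALIDATION value could be turned into the matching -D option.
The function name validation_flag is hypothetical, and whether the init
script does the translation this way is an assumption; only the three
values and the -D option come from the description above.

```shell
# Hypothetical helper: map a CONFIG_VALIDATION value to a cman_tool -D option.
# An unset or empty value falls back to WARN, the documented default.
validation_flag() {
    value=${1:-WARN}
    case "$value" in
        FAIL|WARN|NONE)
            printf '%s\n' "-D$value"
            ;;
        *)
            echo "unknown CONFIG_VALIDATION value: $value" >&2
            return 1
            ;;
    esac
}

# Possible use (not executed here):
#   cman_tool join "$(validation_flag "$CONFIG_VALIDATION")"
```

Rejecting unknown values, rather than silently falling back to WARN, is a
design choice of this sketch, not documented behaviour.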


What a user sees
----------------

The output of the validation process is very cryptic. Yes, we are
absolutely aware of that and we are working on making it easier to
understand (if anybody has Relax-NG experience, please contact us).

This is the typical output from a normal startup (configuration contains
no errors or warnings):

[root@fedora-rh-node1 ~]# /etc/init.d/cman start join
Starting cluster:
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Setting network parameters...                           [  OK  ]
   Starting cman...                                        [  OK  ]
[root@fedora-rh-node1 ~]#

This is the output with a typo in cluster.conf (running in WARN mode):

[root@fedora-rh-node1 ~]# /etc/init.d/cman start join
Starting cluster:
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Setting network parameters...                           [  OK  ]
   Starting cman... tempfile:22: element quorum: Relax-NG validity error
: Element cluster has extra content: quorum
Configuration fails to validate
                                                           [  OK  ]

The error in this specific case is that the quorum element is wrong and
should be quorumd (for qdisk).

As you can see yourself, the output is not easy to understand without a
good understanding of Relax-NG.
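
Until the output improves, a small filter can at least pull out the line
number and the offending element. The sketch below assumes every error
follows the shape of the sample message ("file:LINE: element NAME:
Relax-NG validity error : TEXT") and arrives on a single line; the
wrapping shown in this mail appears to come from the mail client. The
name explain_rng_errors is hypothetical, and the reported line number
refers to the file named in the message (tempfile here), which may or
may not match your cluster.conf line for line.

```shell
# Hypothetical filter: condense Relax-NG validity errors read from stdin.
# Lines that do not match the assumed message format are dropped.
explain_rng_errors() {
    sed -n 's/^[^:]*:\([0-9][0-9]*\): element \([^:]*\): Relax-NG validity error : \(.*\)$/line \1, <\2>: \3/p'
}

# Example with the message from this mail:
#   echo 'tempfile:22: element quorum: Relax-NG validity error : Element cluster has extra content: quorum' \
#       | explain_rng_errors
# prints:
#   line 22, <quorum>: Element cluster has extra content: quorum
```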

The check also happens before configuration updates via cman_tool
version. Here are 3 examples (I use -S to disable configuration
synchronization on my systems):

[root@fedora-rh-node1 ~]# cman_tool version -r 2 -S
[root@fedora-rh-node1 ~]#

cman_tool defaults to the strict check, so the same typo as above will
abort the configuration reload:

[root@fedora-rh-node4 ~]# cman_tool version -r 3 -S
tempfile:22: element quorum: Relax-NG validity error : Element cluster
has extra content: quorum
Configuration fails to validate
cman_tool: Not reloading, configuration is not valid

Disable the strict check and turn errors into warnings:

[root@fedora-rh-node1 ~]# cman_tool version -r 3 -S -DWARN
tempfile:22: element quorum: Relax-NG validity error : Element cluster
has extra content: quorum
Configuration fails to validate
[root@fedora-rh-node1 ~]#


What to do if there are errors
------------------------------

First of all, do NOT panic.

This check integration is new and there might be several reasons why you
see a warning (including bugs in the validation schema).

Users with XML and Relax-NG experience should be able to sort it out easily.

For everyone else, we strongly recommend filing a bug on
bugzilla.redhat.com, including /etc/cluster/cluster.conf _AND_
/usr/share/cluster/cluster.rng.
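
A minimal sketch for bundling both files into one attachment follows.
The function name collect_cluster_config and the default archive name
are hypothetical; only the two file paths come from this mail.

```shell
# Hypothetical helper: pack the configuration and the schema into one tarball.
# Usage: collect_cluster_config [archive] [cluster.conf path] [cluster.rng path]
collect_cluster_config() {
    out=${1:-cluster-config-report.tar.gz}
    conf=${2:-/etc/cluster/cluster.conf}
    schema=${3:-/usr/share/cluster/cluster.rng}
    # Refuse to create a partial archive if either file is unreadable.
    for f in "$conf" "$schema"; do
        if [ ! -r "$f" ]; then
            echo "cannot read $f" >&2
            return 1
        fi
    done
    tar -czf "$out" "$conf" "$schema"
}
```

Running collect_cluster_config with no arguments writes
cluster-config-report.tar.gz in the current directory. It is worth
reading through cluster.conf for passwords or other site-specific
details before attaching it to a public bug.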

This will allow us to cross-check for bugs in our validation code/schema
and help users fix their configuration files.


Using ccs_config_validate standalone command
--------------------------------------------

Validation of a configuration is an important step.

ccs_config_validate is a very powerful and flexible tool, but it requires
an understanding of the config subsystem to be used correctly.

The average user can simply invoke ccs_config_validate with no options
and will see the same results as when the validation runs via cman_tool.
To achieve this, it loads the same environment variables as the cman
init script and, respecting those selections, performs the required
actions.

There are also advanced use cases for the tool, such as migrating from
one config subsystem to another (cluster.conf to LDAP, for example).
Generally, though, anyone who needs to make changes of this magnitude
is also expected to have a good understanding of the configuration
subsystem (a new document will be available shortly for both developers
and advanced users).

Please do not hesitate to ask for clarifications or report bugs.

Cheers
Fabio




