[Cluster-devel] conga/luci/docs config_rhel5.html

Fri Nov 3 20:34:32 UTC 2006

CVSROOT:	/cvs/cluster
Module name:	conga
Changes by:	rmccabe at sourceware.org	2006-11-03 20:34:32

Added files:
	luci/docs      : config_rhel5.html 

Log message:
	add the config help file here, too, to be stuffed into the server database

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/conga/luci/docs/config_rhel5.html.diff?cvsroot=cluster&r1=NONE&r2=1.1

/cvs/cluster/conga/luci/docs/config_rhel5.html,v  -->  standard output
revision 1.1

--- conga/luci/docs/config_rhel5.html
+++ -	2006-11-03 20:34:32.437401000 +0000
@@ -0,0 +1,260 @@
+<html><head><title>Advanced Cluster Configuration Parameters</title>
+</head><body>
+<h2>Advanced Cluster Configuration Parameters</h2>
+<p>
+<dl compact>
+<dt><a name="secauth"><strong>secauth</strong></a><dd>
+This specifies that HMAC/SHA1 authentication should be used to authenticate
+all messages.  It further specifies that all data should be encrypted with the
+sober128 encryption algorithm to protect data from eavesdropping.
+<p>
+Enabling this option adds a 36 byte header to every message sent by totem which
+reduces total throughput.  Encryption and authentication consume 75% of CPU
+cycles in aisexec as measured with gprof when enabled.
+<p>
+For 100mbit networks with 1500 MTU frame transmissions:
+A throughput of 9mb/sec is possible with 100% cpu utilization when this
+option is enabled on 3ghz cpus.
+A throughput of 10mb/sec is possible wth 20% cpu utilization when this
+optin is disabled on 3ghz cpus.
+<p>
+For gig-e networks with large frame transmissions:
+A throughput of 20mb/sec is possible when this option is enabled on
+3ghz cpus.
+A throughput of 60mb/sec is possible when this option is disabled on
+3ghz cpus.
+<p>
+The default is on.
+<p>
+<dt><a name="rrp_mode"><strong>rrp_mode</strong></a><dd>
+This specifies the mode of redundant ring, which may be none, active, or
+passive.  Active replication offers slightly lower latency from transmit
+to delivery in faulty network environments but with less performance.
+Passive replication may nearly double the speed of the totem protocol
+if the protocol doesn't become cpu bound.  The final option is none, in
+which case only one network interface will be used to operate the totem
+protocol.
+<p>
+If only one interface directive is specified, none is automatically chosen.
+If multiple interface directives are specified, only active or passive may
+be chosen.
+<p>
+<dt><a name="netmtu"><strong>netmtu</strong></a><dd>
+This specifies the network maximum transmit unit.  To set this value beyond
+1500, the regular frame MTU, requires ethernet devices that support large, or
+also called jumbo, frames.  If any device in the network doesn't support large
+frames, the protocol will not operate properly.  The hosts must also have their
+mtu size set from 1500 to whatever frame size is specified here.
+<p>
+Please note while some NICs or switches claim large frame support, they support
+9000 MTU as the maximum frame size including the IP header.  Setting the netmtu
+and host MTUs to 9000 will cause totem to use the full 9000 bytes of the frame.
+Then Linux will add a 18 byte header moving the full frame size to 9018.  As a
+result some hardware will not operate properly with this size of data.  A netmtu 
+of 8982 seems to work for the few large frame devices that have been tested.
+Some manufacturers claim large frame support when in fact they support frame
+sizes of 4500 bytes.
+<p>
+Increasing the MTU from 1500 to 8982 doubles throughput performance from 30MB/sec
+to 60MB/sec as measured with evsbench with 175000 byte messages with the secauth 
+directive set to off.
+<p>
+When sending multicast traffic, if the network frequently reconfigures, chances are
+that some device in the network doesn't support large frames.
+<p>
+Choose hardware carefully if intending to use large frame support.
+<p>
+The default is 1500.
+<p>
+<dt><a name="threads"><strong>threads</strong></a><dd>
+This directive controls how many threads are used to encrypt and send multicast
+messages.  If secauth is off, the protocol will never use threaded sending.
+If secauth is on, this directive allows systems to be configured to use
+multiple threads to encrypt and send multicast messages.
+<p>
+A thread directive of 0 indicates that no threaded send should be used.  This
+mode offers best performance for non-SMP systems. 
+<p>
+The default is 0.
+<p>
+<dt><a name="vsftype"><strong>vsftype</strong></a><dd>
+This directive controls the virtual synchrony filter type used to identify
+a primary component.  The preferred choice is YKD dynamic linear voting,
+however, for clusters larger then 32 nodes YKD consumes alot of memory.  For
+large scale clusters that are created by changing the MAX_PROCESSORS_COUNT 
+#define in the C code totem.h file, the virtual synchrony filter "none" is
+recommended but then AMF and DLCK services (which are currently experimental)
+are not safe for use.
+<p>
+The default is ykd.  The vsftype can also be set to none.
+<p>
+Within the 
+<B>totem </B>
+
+directive, there are several configuration options which are used to control
+the operation of the protocol.  It is generally not recommended to change any
+of these values without proper guidance and sufficient testing.  Some networks
+may require larger values if suffering from frequent reconfigurations.  Some
+applications may require faster failure detection times which can be achieved
+by reducing the token timeout.
+<p>
+<dt><a name="token"><strong>token</strong></a><dd>
+This timeout specifies in milliseconds until a token loss is declared after not
+receiving a token.  This is the time spent detecting a failure of a processor
+in the current configuration.  Reforming a new configuration takes about 50
+milliseconds in addition to this timeout.
+<p>
+The default is 5000 milliseconds.
+<p>
+<dt><a name="token_retransmit"><strong>token_retransmit</strong></a><dd>
+This timeout specifies in milliseconds after how long before receiving a token
+the token is retransmitted.  This will be automatically calculated if token
+is modified.  It is not recommended to alter this value without guidance from
+the openais community.
+<p>
+The default is 238 milliseconds.
+<p>
+<dt><a name="hold"><strong>hold</strong></a><dd>
+This timeout specifies in milliseconds how long the token should be held by
+the representative when the protocol is under low utilization.   It is not
+recommended to alter this value without guidance from the openais community.
+<p>
+The default is 180 milliseconds.
+<p>
+<dt><a name="retransmits_before_loss"><strong>retransmits_before_loss</strong></a><dd>
+This value identifies how many token retransmits should be attempted before
+forming a new configuration.  If this value is set, retransmit and hold will
+be automatically calculated from retransmits_before_loss and token.
+<p>
+The default is 4 retransmissions.
+<p>
+<dt><a name="join"><strong>join</strong></a><dd>
+This timeout specifies in milliseconds how long to wait for join messages in 
+the membership protocol.
+<p>
+The default is 100 milliseconds.
+<p>
+<dt><a name="send_join"><strong>send_join</strong></a><dd>
+This timeout specifies in milliseconds an upper range between 0 and send_join
+to wait before sending a join message.  For configurations with less then
+32 nodes, this parameter is not necessary.  For larger rings, this parameter
+is necessary to ensure the NIC is not overflowed with join messages on
+formation of a new ring.  A reasonable value for large rings (128 nodes) would
+be 80msec.  Other timer values must also change if this value is changed.  Seek
+advice from the openais mailing list if trying to run larger configurations.
+<p>
+The default is 0 milliseconds.
+<p>
+<dt><a name="consensus"><strong>consensus</strong></a><dd>
+This timeout specifies in milliseconds how long to wait for consensus to be
+achieved before starting a new round of membership configuration.
+<p>
+The default is 200 milliseconds.
+<p>
+<dt><a name="merge"><strong>merge</strong></a><dd>
+This timeout specifies in milliseconds how long to wait before checking for
+a partition when no multicast traffic is being sent.  If multicast traffic
+is being sent, the merge detection happens automatically as a function of
+the protocol.
+<p>
+The default is 200 milliseconds.
+<p>
+<dt><a name="downcheck"><strong>downcheck</strong></a><dd>
+This timeout specifies in milliseconds how long to wait before checking
+that a network interface is back up after it has been downed.
+<p>
+The default is 1000 millseconds.
+<p>
+<dt><a name="fail_to_recv_const"><strong>fail_to_recv_const</strong></a><dd>
+This constant specifies how many rotations of the token without receiving any
+of the messages when messages should be received may occur before a new
+configuration is formed.
+<p>
+The default is 50 failures to receive a message.
+<p>
+<dt><a name="seqno_unchanged_const"><strong>seqno_unchanged_const</strong></a><dd>
+This constant specifies how many rotations of the token without any multicast
+traffic should occur before the merge detection timeout is started.
+<p>
+The default is 30 rotations.
+<p>
+<dt><a name="heartbeat_failures_allowed"><strong>heartbeat_failures_allowed</strong></a><dd>
+[HeartBeating mechanism]
+Configures the optional HeartBeating mechanism for faster failure detection. Keep in
+mind that engaging this mechanism in lossy networks could cause faulty loss declaration 
+as the mechanism relies on the network for heartbeating. 
+<p>
+So as a rule of thumb use this mechanism if you require improved failure in low to 
+medium utilized networks.
+<p>
+This constant specifies the number of heartbeat failures the system should tolerate
+before declaring heartbeat failure e.g 3. Also if this value is not set or is 0 then the
+heartbeat mechanism is not engaged in the system and token rotation is the method
+of failure detection
+<p>
+The default is 0 (disabled).
+<p>
+<dt><a name="max_network_delay"><strong>max_network_delay</strong></a><dd>
+[HeartBeating mechanism]
+This constant specifies in milliseconds the approximate delay that your network takes
+to transport one packet from one machine to another. This value is to be set by system
+engineers and please dont change if not sure as this effects the failure detection
+mechanism using heartbeat.
+<p>
+The default is 50 milliseconds.
+<p>
+<dt><a name="window_size"><strong>window_size</strong></a><dd>
+This constant specifies the maximum number of messages that may be sent on one
+token rotation.  If all processors perform equally well, this value could be
+large (300), which would introduce higher latency from origination to delivery
+for very large rings.  To reduce latency in large rings(16+), the defaults are
+a safe compromise.  If 1 or more slow processor(s) are present among fast
+processors, window_size should be no larger then 256000 / netmtu to avoid
+overflow of the kernel receive buffers.  The user is notified of this by
+the display of a retransmit list in the notification logs.  There is no loss
+of data, but performance is reduced when these errors occur.
+<p>
+The default is 50 messages.
+<p>
+<dt><a name="max_messages"><strong>max_messages</strong></a><dd>
+This constant specifies the maximum number of messages that may be sent by one
+processor on receipt of the token.  The max_messages parameter is limited to
+256000 / netmtu to prevent overflow of the kernel transmit buffers.
+<p>
+The default is 17 messages.
+<p>
+<dt><a name="rrp_problem_count_timeout"><strong>rrp_problem_count_timeout</strong></a><dd>
+This specifies the time in milliseconds to wait before decrementing the
+problem count by 1 for a particular ring to ensure a link is not marked
+faulty for transient network failures.
+<p>
+The default is 1000 milliseconds.
+<p>
+<dt><a name="rrp_problem_count_threshold"><strong>rrp_problem_count_threshold</strong></a><dd>
+This specifies the number of times a problem is detected with a link before
+setting the link faulty.  Once a link is set faulty, no more data is
+transmitted upon it.  Also, the problem counter is no longer decremented when
+the problem count timeout expires.
+<p>
+A problem is detected whenever all tokens from the proceeding processor have
+not been received within the rrp_token_expired_timeout.  The
+rrp_problem_count_threshold * rrp_token_expired_timeout should be atleast 50
+milliseconds less then the token timeout, or a complete reconfiguration
+may occur.
+<p>
+The default is 20 problem counts.
+<p>
+<dt><a name="rrp_token_expired_timeout"><strong>rrp_token_expired_timeout</strong></a><dd>
+This specifies the time in milliseconds to increment the problem counter for
+the redundant ring protocol after not having received a token from all rings
+for a particular processor.
+<p>
+This value will automatically be calculated from the token timeout and
+problem_count_threshold but may be overridden.  It is not recommended to
+override this value without guidance from the openais community.
+<p>
+The default is 47 milliseconds.
+<p>
+</dl>
+</body>
+</html>