[Linux-cluster] RE: 3 node cluster crashes

Nemeth, Norbert Norbert.Nemeth at mscibarra.com
Wed Aug 6 09:30:27 UTC 2008


Hi,

I have a problem with rgmanager's script resource.
My script uses $OCF_RESKEY_service_name in the following way:

<script file="/usr/local/sbin/cl2r.sh" name="script-VG" service_name="VOLUME GROUP">

The script operates on the volume group named in service_name (here, VOLUME GROUP).
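
As a rough illustration of that pattern, here is a minimal sketch of a script that picks up its target from the environment. It is not the real cl2r.sh: the function name plan_vg_action and the volume group names vg_a and vg_b are hypothetical, and the LVM commands are only printed (a dry run), never executed.

```shell
#!/bin/sh
# Hypothetical sketch; none of this is taken from the real cl2r.sh.

# Print the LVM command an action would run for the volume group named
# in $OCF_RESKEY_service_name. Dry run only: nothing is executed.
plan_vg_action() {
    action="$1"
    vg="${OCF_RESKEY_service_name:-}"
    if [ -z "$vg" ]; then
        echo "service_name attribute is not set" >&2
        return 1
    fi
    case "$action" in
        start)  echo "vgchange -a y \"$vg\"" ;;
        stop)   echo "vgchange -a n \"$vg\"" ;;
        status) echo "vgs \"$vg\"" ;;
        *)      echo "usage: start|stop|status" >&2; return 2 ;;
    esac
}

# The same script body serving two services with different attribute values.
( OCF_RESKEY_service_name="vg_a"; plan_vg_action start )
( OCF_RESKEY_service_name="vg_b"; plan_vg_action stop )
```

The point of the sketch is that one script file can serve several services, with only the attribute value differing between them.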

If I define multiple services that use the same script, I get:
clurgmgrd[10143]: <err> Unique attribute collision. type=script attr=file value=/usr/local/sbin/cl2r.sh

Checking /usr/share/cluster/script.sh I found:

        <parameter name="file" unique="1" required="1">
            <longdesc lang="en">
                Path to script
            </longdesc>
            <shortdesc lang="en">
                Path to script
            </shortdesc>
            <content type="string"/>
        </parameter>
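
To see which script paths would trip this check before rgmanager loads a configuration, a text-level scan along the following lines may help. This is only a sketch: it is not rgmanager's own validation logic, the function name find_duplicate_script_files and the sample resource names are hypothetical, and it assumes each <script> element sits on a single line and that grep supports -o.

```shell
#!/bin/sh
# Sketch only: a text-level check, not rgmanager's own validation logic.

# Print every file="..." value that appears on more than one <script> element.
find_duplicate_script_files() {
    conf="$1"
    grep -o '<script[^>]* file="[^"]*"' "$conf" \
        | sed 's/.* file="\([^"]*\)".*/\1/' \
        | sort \
        | uniq -d
}

# Demo on a throwaway fragment with hypothetical resource names.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<resources>
  <script file="/usr/local/sbin/cl2r.sh" name="script-VG1" service_name="vg_a"/>
  <script file="/usr/local/sbin/cl2r.sh" name="script-VG2" service_name="vg_b"/>
  <script file="/usr/local/sbin/other.sh" name="script-other"/>
</resources>
EOF
find_duplicate_script_files "$tmp"
rm -f "$tmp"
```

Against a real cluster the function would be pointed at /etc/cluster/cluster.conf; any path it prints is used by more than one script resource.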

Checking the latest version in git (line 40):

http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=blob;f=rgmanager/src/resources/script.sh;h=41298115ccd39863f9f45d5f889e3b6299b3659d;hb=refs/heads/STABLE2#l40

Do you know why the file parameter of the script resource is marked as unique?
Could it be changed to unique="0"?

Best regards,
Norbert Németh

From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Dalton, Maurice
Sent: Tuesday, August 05, 2008 11:56 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] 3 node cluster crashes


I have a 3 node cluster running cman-2.0.84-2.el5. At times we have spanning tree events that cause network storms lasting up to 9 seconds.
When these events occur (today we triggered them twice to verify the issue), all three nodes go down within seconds.

The second time we tried it, I added the totem token statement shown below, with the same result.

<cman>
        <multicast addr="225.0.0.11"/>
        <totem token="21000"/>
</cman>

Aug  5 16:41:18 csarcsys2-eth0 ntpd[3484]: kernel time sync enabled 0001
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] The token was lost in the OPERATIONAL state.
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 2.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 0.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Creating commit token because I am the rep.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Saving state aru 46 high seq received 46
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Storing new sequence id for ring b50
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering COMMIT state.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering RECOVERY state.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] position [0] member 172.xx.xx.xxx:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] previous ring seq 2892 rep 172.xx.xxx.xx
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] aru 46 high delivered 46 received flag 1
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Did not need to originate any messages in recovery.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Sending initial ORF token
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] New Configuration:
Aug  5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 1
Aug  5 16:41:24 csarcsys2-eth0 clurgmgrd[3750]: <emerg> #1: Quorum Dissolved
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 3
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Left:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Joined:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CMAN ] quorum lost, blocking activity
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] CLM CONFIGURATION CHANGE
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] New Configuration:
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate.  Refusing connection.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Left:
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Joined:
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [SYNC ] This node is within the primary component and will provide service.
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Someone may be attempting something evil.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering OPERATIONAL state.
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing get: Invalid request descriptor
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] got nodejoin message 172.24.86.143
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate.  Refusing connection.
Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CPG  ] got joinlist message from node 2
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused
Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
