[Linux-cluster] problems with clvmd and lvms on rhel6.1

Poós Krisztián krisztian at poos.hu
Fri Aug 10 20:16:00 UTC 2012


Yeah, thanks. I checked your thread, if you meant the "clvmd hangs" one, but
it looks unfinished: I only see three entries in that thread and
unfortunately no solution at the end. Am I missing something?
My scenario is a bit different, though: I don't need GFS, only clvmd
with a failover LVM, since this is an active/passive configuration. My
clvmd hangs only rarely; the main problem is that all the volumes remain
inactive.
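
For reference, the manual workaround I mention in my original mail (quoted
further down) boils down to roughly this sequence; a sketch only, using the
teszt/teszt-lv names from the test cluster below:

    # after clvmd has started and auto-activated the clustered LVs,
    # deactivate the one owned by the service by hand...
    lvchange -an teszt/teszt-lv
    # ...then let rgmanager activate it as part of the service
    service rgmanager start
    clustat    # check that service:teszt comes up on this node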

On 08/10/2012 07:00 PM, Chip Burke wrote:
> See my thread from earlier, as I am having similar issues. I am testing this
> soon, but I "think" the issue in my case is setting up SCSI fencing before
> GFS2. So essentially it has nothing to fence off of, sees that as a fault,
> and never recovers. I "think" my fix will be to establish the LVMs, GFS2,
> etc., and then put in the SCSI fence so that it can actually create the
> private reservations. Then the fun begins: pulling the plug randomly to see
> how it behaves.
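> 
> (For what it's worth, once the fence_scsi reservations are in place they can
> be checked directly on the shared LUN; a rough sketch using sg_persist from
> sg3_utils, where /dev/sdb is just a placeholder for the shared device:)
> 
>     # keys registered on the shared LUN, and the current reservation
>     sg_persist --in --read-keys --device=/dev/sdb
>     sg_persist --in --read-reservation --device=/dev/sdb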
> ________________________________________
> Chip Burke
> 
> On 8/10/12 12:46 PM, "Digimer" <lists at alteeve.ca> wrote:
> 
>> Not sure if it relates, but I can say that without fencing, things will
>> break in strange ways. The reason is that if anything triggers a fault,
>> the cluster blocks by design and stays blocked until a fence call
>> succeeds (which is impossible without fencing configured in the first
>> place).
>>
>> Can you please set up fencing and test that it works (using
>> 'fence_node rhel2.local' from rhel1.local, then in reverse)? Once this
>> is done, test again for your problem. If it still exists, please paste
>> the updated cluster.conf. Also include syslog from both nodes from around
>> the time of your LVM tests.
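>>
>> Roughly, as a sketch:
>>
>>     # on rhel1.local
>>     fence_node rhel2.local
>>     # once rhel2.local has rejoined the cluster, on rhel2.local
>>     fence_node rhel1.local
>>     # fenced's view of the fence domain on each node
>>     fence_tool ls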
>>
>> digimer
>>
>> On 08/10/2012 12:38 PM, Poós Krisztián wrote:
>>> This is the cluster.conf, which is a clone of the problematic system in
>>> a test environment (without the Oracle and SAP instances, focusing only
>>> on this LVM issue, with an LVM resource):
>>>
>>> [root@rhel2 ~]# cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster config_version="7" name="teszt">
>>> 	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>> 	<clusternodes>
>>> 		<clusternode name="rhel1.local" nodeid="1" votes="1">
>>> 			<fence/>
>>> 		</clusternode>
>>> 		<clusternode name="rhel2.local" nodeid="2" votes="1">
>>> 			<fence/>
>>> 		</clusternode>
>>> 	</clusternodes>
>>> 	<cman expected_votes="3"/>
>>> 	<fencedevices/>
>>> 	<rm>
>>> 		<failoverdomains>
>>> 			<failoverdomain name="all" nofailback="1" ordered="1" restricted="0">
>>> 				<failoverdomainnode name="rhel1.local" priority="1"/>
>>> 				<failoverdomainnode name="rhel2.local" priority="2"/>
>>> 			</failoverdomain>
>>> 		</failoverdomains>
>>> 		<resources>
>>> 			<lvm lv_name="teszt-lv" name="teszt-lv" vg_name="teszt"/>
>>> 			<fs device="/dev/teszt/teszt-lv" fsid="43679" fstype="ext4"
>>> mountpoint="/lvm" name="teszt-fs"/>
>>> 		</resources>
>>> 		<service autostart="1" domain="all" exclusive="0" name="teszt"
>>> recovery="disable">
>>> 			<lvm ref="teszt-lv"/>
>>> 			<fs ref="teszt-fs"/>
>>> 		</service>
>>> 	</rm>
>>> 	<quorumd label="qdisk"/>
>>> </cluster>
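>>>
>>> (Note that the <fence/> and <fencedevices/> sections above are still empty.
>>> Purely as an illustration, a populated pair would look roughly like this,
>>> with fence_ipmilan as just one example agent and placeholder
>>> address/credentials:)
>>>
>>> 	<clusternodes>
>>> 		<clusternode name="rhel1.local" nodeid="1" votes="1">
>>> 			<fence>
>>> 				<method name="1">
>>> 					<device name="ipmi1"/>
>>> 				</method>
>>> 			</fence>
>>> 		</clusternode>
>>> 		<!-- rhel2.local gets the same block with its own device -->
>>> 	</clusternodes>
>>> 	<fencedevices>
>>> 		<fencedevice agent="fence_ipmilan" ipaddr="192.168.1.11"
>>> login="admin" name="ipmi1" passwd="secret"/>
>>> 	</fencedevices>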
>>>
>>> Here are the log parts:
>>> Aug 10 17:21:21 rgmanager I am node #2
>>> Aug 10 17:21:22 rgmanager Resource Group Manager Starting
>>> Aug 10 17:21:22 rgmanager Loading Service Data
>>> Aug 10 17:21:29 rgmanager Initializing Services
>>> Aug 10 17:21:31 rgmanager /dev/dm-2 is not mounted
>>> Aug 10 17:21:31 rgmanager Services Initialized
>>> Aug 10 17:21:31 rgmanager State change: Local UP
>>> Aug 10 17:21:31 rgmanager State change: rhel1.local UP
>>> Aug 10 17:23:23 rgmanager Starting stopped service service:teszt
>>> Aug 10 17:23:25 rgmanager Failed to activate logical volume,
>>> teszt/teszt-lv
>>> Aug 10 17:23:25 rgmanager Attempting cleanup of teszt/teszt-lv
>>> Aug 10 17:23:29 rgmanager Failed second attempt to activate
>>> teszt/teszt-lv
>>> Aug 10 17:23:29 rgmanager start on lvm "teszt-lv" returned 1 (generic
>>> error)
>>> Aug 10 17:23:29 rgmanager #68: Failed to start service:teszt; return
>>> value: 1
>>> Aug 10 17:23:29 rgmanager Stopping service service:teszt
>>> Aug 10 17:23:30 rgmanager stop: Could not match /dev/teszt/teszt-lv with
>>> a real device
>>> Aug 10 17:23:30 rgmanager stop on fs "teszt-fs" returned 2 (invalid
>>> argument(s))
>>> Aug 10 17:23:31 rgmanager #12: RG service:teszt failed to stop;
>>> intervention required
>>> Aug 10 17:23:31 rgmanager Service service:teszt is failed
>>> Aug 10 17:24:09 rgmanager #43: Service service:teszt has failed; can not
>>> start.
>>> Aug 10 17:24:09 rgmanager #13: Service service:teszt failed to stop
>>> cleanly
>>> Aug 10 17:25:12 rgmanager Starting stopped service service:teszt
>>> Aug 10 17:25:14 rgmanager Failed to activate logical volume,
>>> teszt/teszt-lv
>>> Aug 10 17:25:15 rgmanager Attempting cleanup of teszt/teszt-lv
>>> Aug 10 17:25:17 rgmanager Failed second attempt to activate
>>> teszt/teszt-lv
>>> Aug 10 17:25:18 rgmanager start on lvm "teszt-lv" returned 1 (generic
>>> error)
>>> Aug 10 17:25:18 rgmanager #68: Failed to start service:teszt; return
>>> value: 1
>>> Aug 10 17:25:18 rgmanager Stopping service service:teszt
>>> Aug 10 17:25:19 rgmanager stop: Could not match /dev/teszt/teszt-lv with
>>> a real device
>>> Aug 10 17:25:19 rgmanager stop on fs "teszt-fs" returned 2 (invalid
>>> argument(s))
>>>
>>>
>>> After I manually started the LV on node1 and then tried to switch the
>>> service to node2, it was not able to start it there.
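>>>
>>> (In case it helps narrow things down, this is roughly what I compare on
>>> both nodes; a sketch only, and the assumption that the lvm.sh agent looks
>>> at the volume_list/tag setup when the VG is not handled by clvmd is mine:)
>>>
>>>     # activation state and tags of the LV, run on both nodes
>>>     lvs -o lv_name,vg_name,lv_attr,lv_tags teszt
>>>     # the HA-LVM related settings in lvm.conf
>>>     grep -E 'locking_type|volume_list' /etc/lvm/lvm.conf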
>>>
>>> Regards,
>>> Krisztian
>>>
>>>
>>> On 08/10/2012 05:15 PM, Digimer wrote:
>>>> On 08/10/2012 11:07 AM, Poós Krisztián wrote:
>>>>> Dear all,
>>>>>
>>>>> I hope someone has run into this problem in the past and can maybe
>>>>> help me resolve this issue.
>>>>>
>>>>> There is a two-node RHEL cluster with a quorum disk (qdisk) as well.
>>>>> There are clustered LVs with the clustered (-c-) attribute flag set.
>>>>> If I start clvmd, all the clustered LVs come online.
>>>>>
>>>>> After this, if I start rgmanager, it deactivates all the volumes and is
>>>>> then not able to activate them again; since the device nodes no longer
>>>>> exist during the startup of the service, the service fails.
>>>>> All LVs remain without the active flag.
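>>>>>
>>>>> (Concretely, the flags can be checked like this; just a sketch, where a
>>>>> 'c' in vg_attr marks a clustered VG and an 'a' in lv_attr marks an
>>>>> active LV:)
>>>>>
>>>>>     vgs -o vg_name,vg_attr
>>>>>     lvs -o lv_name,vg_name,lv_attr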
>>>>>
>>>>> I can bring it up manually, but only if, after clvmd has started, I set
>>>>> the LVs offline by hand with lvchange -an <lv>.
>>>>> After that, when I start rgmanager, it can take the service online
>>>>> without problems. However, I think this step should be done by rgmanager
>>>>> itself. The logs are full of the following:
>>>>> rgmanager Making resilient: lvchange -an ....
>>>>> rgmanager lv_exec_resilient failed
>>>>> rgmanager lv_activate_resilient stop failed on ....
>>>>>
>>>>> Also, the lvs/clvmd commands themselves sometimes hang. I have to
>>>>> restart clvmd to make them work again (and sometimes kill it).
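>>>>>
>>>>> (When lvs hangs like this it appears to be waiting on clvmd/DLM; what I
>>>>> look at in that case, as a rough sketch and assuming the cman tools are
>>>>> installed:)
>>>>>
>>>>>     # DLM lockspaces currently held (clvmd uses a lockspace named clvmd)
>>>>>     dlm_tool ls
>>>>>     # fence/dlm group membership as cman sees it
>>>>>     group_tool ls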
>>>>>
>>>>> Does anyone have any idea what to check?
>>>>>
>>>>> Thanks and regards,
>>>>> Krisztian
>>>>
>>>> Please paste your cluster.conf file with minimal edits.
>>
>>
>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.com
