[Linux-cluster] Split-brain with DRBD active-active + RHCS

Rajagopal Swaminathan raju.rajsand at gmail.com
Fri Mar 18 11:19:11 UTC 2011


Greetings,

On 3/15/11, jayesh.shinde <jayesh.shinde at netcore.co.in> wrote:
> Hi All ,
>
> I don't have a SAN with me, so I want to build a 2-node DRBD
> active-active setup for mysql & http resources (i.e. /dev/drbd2 &
> /dev/drbd3 in my case) with RHCS.
>

Ok, now you have the block device ready, say /dev/drbdX.
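For active-active you will need allow-two-primaries and sane
after-split-brain policies in the DRBD resource. A minimal sketch,
assuming DRBD 8.x; the resource name r2 is just a placeholder, check
drbd.conf(5) for your version:

    resource r2 {
      startup {
        become-primary-on both;        # promote on both nodes at startup
      }
      net {
        allow-two-primaries;           # mandatory for active-active
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;      # diverged dual-primary: stop and fix by hand
      }
    }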

Wait, RHCS has not yet kicked in.

(If your storage is an iSCSI target, I am not covering it here.)

Simply speaking, let us take it from here:

1. Get RHCS and clvmd working.
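On RHEL 5 that boils down to roughly the following (a sketch, assuming
the stock cman/rgmanager/lvm2-cluster packages):

    yum install -y cman rgmanager lvm2-cluster
    lvmconf --enable-cluster      # sets locking_type = 3 in /etc/lvm/lvm.conf
    service cman start            # needs a valid /etc/cluster/cluster.conf
    service clvmd start
    service rgmanager start
    chkconfig cman on; chkconfig clvmd on; chkconfig rgmanager on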

Now, let us carve out the slices of the juicy CLVM VG kulfi.
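Something along these lines; vg_kulfi is an invented name, and both
nodes must be DRBD primary with clvmd running before you start:

    # run on one node only -- clvmd propagates the metadata to the other
    pvcreate /dev/drbd2 /dev/drbd3
    vgcreate -c y vg_kulfi /dev/drbd2 /dev/drbd3   # -c y marks the VG clustered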

>
>  For the past week I have been testing the same scenario in 2 Xen
> VMs with kernel 2.6.18-128.el5xen.
>

2. Allocate one LUN (an LV out of that VG) per VM, with enough storage
for the guest (see the sketch after step 3).

3. Test whether your VMs live-migrate successfully.
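For 2. and 3., something like this; LV sizes, guest names and the
destination host are made up for illustration, and live migration needs
xend-relocation-server enabled in /etc/xen/xend-config.sxp on both nodes:

    # one clustered LV per guest, used as that guest's virtual disk
    lvcreate -L 20G -n lv_vm1 vg_kulfi
    lvcreate -L 20G -n lv_vm2 vg_kulfi

    # with vm1's disk pointing at /dev/vg_kulfi/lv_vm1,
    # push the running guest to the other node and back again
    xm migrate --live vm1 node2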

> Everything is working fine, like mysql and http services moving from
> one server to the other, etc. But it does not work correctly when a
> node gets fenced (i.e. when the network fails on one of the nodes).
>


==== Big IF: the following assumes step three passes OK, i.e. your VMs are HA now ====
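In RHCS terms that means each guest is defined as a vm resource, so
rgmanager restarts or relocates it after a node failure. A bare-bones
cluster.conf fragment, with the name invented:

    <rm>
      <vm name="vm1" path="/etc/xen" autostart="1" recovery="relocate"/>
    </rm>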

4. Let httpd run from within the VM, which can use a plain,
blisteringly fast, natively supported filesystem: if you are on RHEL 6,
that would be ext4. Of course you also have the option of a true 64-bit
filesystem like XFS or btrfs (technology previews), etc.
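Inside the guest nothing cluster-aware is needed; the device name below
is hypothetical:

    # inside the VM, on its dedicated virtual disk
    mkfs.ext4 /dev/xvdb
    mkdir -p /var/www
    mount /dev/xvdb /var/www
    echo "/dev/xvdb /var/www ext4 defaults 1 2" >> /etc/fstab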

5. Prepare an SLA and availability guarantee statement from IT, and
distribute it to everybody and their dogs and cats and cows too. Of
course, the CYA rule applies.

6. Run it and go home and sleep happily.

And oh, if you are using HA at all, you must seriously consider having
a DR plan and a monitoring system like Zabbix in place.

I have heard quite a few sob stories, and have counselled many through
the risk-withdrawal syndrome that sets in on learning how vulnerable
they were; fortunately for them, the incidents did not cause any
business downtime.

Regards,

Rajagopal



