
Re: [Linux-cluster] fencing issue - with attach logs&conf



Hi.

I agree with you about giving the admin the choice of whether or not to
adopt the paranoid approach of not failing over the services.

In the past I supported Tru64 clusters, and nowadays I support HP Serviceguard (HP-UX and Linux).

HP decided not to develop Serviceguard on Linux anymore, so we are now starting to use Red Hat Cluster.

It seems that for very critical customers you need at least 2 fencing methods!
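
As a rough illustration of why a second method helps (a sketch under assumptions, not how the fence daemon is actually implemented): the methods are tried in order, and the node counts as fenced as soon as one of them succeeds. The agent functions `fence_via_drac` and `fence_via_pdu` are hypothetical stand-ins, hard-coded here to model a management card that cannot be reached and a switched power strip that can.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fence agents. Each returns True only when it can confirm
# that the target node has really been powered off.
def fence_via_drac(node: str) -> bool:
    # Stand-in for a management-card agent; modeled as unreachable.
    return False

def fence_via_pdu(node: str) -> bool:
    # Stand-in for a switched power strip that cuts the node's outlets.
    return True

FenceMethod = Tuple[str, Callable[[str], bool]]

def fence_node(node: str, methods: Sequence[FenceMethod]) -> Optional[str]:
    """Try each method in order; return the name of the first that succeeds."""
    for name, agent in methods:
        if agent(node):
            return name
    # No method could confirm the fence, so failover must not proceed.
    return None

methods = [("drac", fence_via_drac), ("pdu", fence_via_pdu)]
print(fence_node("node1", methods))
```

With only the first method configured, `fence_node` returns `None` and the services stay where they are.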

There is another thing to be fixed ASAP: when using HA-LVM, the need to compare
which file is newer, lvm.conf or initrd.img.
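
The comparison itself amounts to checking modification times. A minimal sketch, assuming the check is simply "lvm.conf newer than the initrd means the initrd should be rebuilt"; the function name `initrd_is_stale` is made up for illustration, and the initrd path below is an assumption because it varies by distribution and kernel version.

```python
import os
import platform

def initrd_is_stale(lvm_conf: str, initrd: str) -> bool:
    """Return True if lvm_conf was modified more recently than initrd."""
    return os.path.getmtime(lvm_conf) > os.path.getmtime(initrd)

if __name__ == "__main__":
    lvm_conf = "/etc/lvm/lvm.conf"
    # Assumed naming scheme; adjust to the initrd your boot loader really uses.
    initrd = "/boot/initrd-%s.img" % platform.release()
    if os.path.exists(lvm_conf) and os.path.exists(initrd):
        if initrd_is_stale(lvm_conf, initrd):
            print("lvm.conf is newer than the initrd; rebuild the initrd")
        else:
            print("initrd is newer than lvm.conf")
    else:
        print("could not find both files; check the paths")
```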

Regards.

Shalom.





On Fri, Mar 5, 2010 at 10:01 PM, brem belguebli <brem belguebli gmail com> wrote:
Hi Corey,

I was talking about a watchdog, not a kernel panic (sysrq...). On
common (x86) hardware, most server vendors implement embedded hardware
chips that could be used for this.

Indeed, SCSI-3 reservation/registration could be combined with all of
this to be sure about the node's sanity.

I think the choice should be given to the admin to adopt or not the
paranoid approach of not failing over the services.



2010/3/4 Corey Kovacs <corey kovacs gmail com>:
> Brem,
>
> It's been my understanding that the kernel panic technique you are
> describing is undesirable because the kernel is in an
> unknown state. Basically anything can happen. The OS doesn't have to do a
> sync for an HBA to flush, etc. Since Red Hat isn't in the business of building
> their own hardware like HP (DEC), Sun, and IBM, they take the next best route to
> ensure that nothing from that problematic machine can affect the storage, and
> the only way to guarantee that is to remove power from the whole machine.
>
> VMS and Tru64 use the panic method, but the other nodes will issue a
> reservation on the SCSI bus against that node to protect the storage. They
> can do that because they know exactly how their hardware and implementation
> of reservations work.
>
> Corey
>
> On Thu, Mar 4, 2010 at 5:32 AM, שלום קלמר <sklemer gmail com> wrote:
>>
>> Thanks to all !!!!
>>
>> Shalom klemer hp com
>>
>> On Thu, Mar 4, 2010 at 12:00 AM, Lon Hohberger <lhh redhat com> wrote:
>>>
>>> On Wed, 2010-03-03 at 13:10 +0200, שלום קלמר wrote:
>>> > Hi.
>>> >
>>> > I have 2 power supplies. But if someone pulls the power cables by
>>> > mistake, does that mean that the services will not fail over?
>>>
>>> The problem is:
>>>
>>> no power   = no ping + no DRAC access
>>> no network = no ping + no DRAC access
>>>
>>> If there's no power, then it is safe to fail over.
>>>
>>> If there is no network (and power is OK), then it is not safe to fail
>>> over.  Failover in this case is very likely to produce data corruption!
>>>
>>> Because we can not tell which case happened, we do not fail over.
>>>
>>> -- Lon
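
The reasoning above can be sketched in a few lines. This is only a toy model under stated assumptions: the names `Observation`, `observe` and `safe_to_fail_over` are invented for illustration, and the model assumes, as the message describes, that ping and DRAC access both need the node to have power and network.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """What the surviving node can see of its peer."""
    ping_ok: bool
    drac_ok: bool

def observe(power_ok: bool, network_ok: bool) -> Observation:
    # Both ping and DRAC access need the peer to have power AND network.
    reachable = power_ok and network_ok
    return Observation(ping_ok=reachable, drac_ok=reachable)

def safe_to_fail_over(fence_confirmed: bool) -> bool:
    # Fail over only once fencing has been positively confirmed.
    return fence_confirmed

power_lost = observe(power_ok=False, network_ok=True)
network_lost = observe(power_ok=True, network_ok=False)

# The two failures look identical from the outside...
print(power_lost == network_lost)

# ...and with the DRAC unreachable the fence cannot be confirmed,
# so the conservative answer is the same in both cases.
print(safe_to_fail_over(fence_confirmed=power_lost.drac_ok))
```

Because the two observations compare equal, no decision based on them alone can treat the safe case differently from the dangerous one.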
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster redhat com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster

