
Re: [Linux-cluster] Cluster Crashes



isplist logicore net wrote:
> First of all, is there a way to test whether my Brocade switch is actually doing any fencing? I get the sense it's doing nothing.
>
> I think this because my cluster is terribly unstable. If I reboot a node, that's fine: it works and the cluster stays up. However, if one of the nodes crashes in any manner, it takes down everything, to the point where I have to shut down every machine and start them back up one at a time.
>
> If a drive gets moved on my FC storage, the cluster crashes. If the storage is rebooted, the cluster crashes. If I change pretty much anything on the storage, the cluster crashes; it's nuts. It seems to start with one node having a kernel panic, which sets off the rest.
>
> I know this is limited information, but I need somewhere to start. I can't even begin to think of using this in a production environment; no one would get any sleep watching over it to make sure it's all up :).
>
> Mike

This almost sounds like the RSCN problem I tried to chase down a while back. In a nutshell, when something changes on the SAN, an RSCN (Registered State Change Notification) event occurs, and it is seen by all nodes on the SAN. The RSCN event should be completely harmless, but I have seen it kill all the FC I/O paths, and that would be bad. I would expect the cluster itself to stay up, but nodes would withdraw from the filesystem as soon as they lost the I/O path.

Are you using QLogic HBAs? If so, check /var/log/messages for any "SCSI errors".
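If the log is large, a short script can pull out the matching lines and show whether they arrive in bursts, which is what you would expect if a single SAN change is hitting every I/O path at once. This is only a minimal sketch: it assumes the kernel writes the phrase "SCSI error" somewhere in the line (the exact wording varies by kernel and HBA driver version) and that the log uses the traditional syslog timestamp format ("Jan  5 12:34:56 host ..."). The script and function names are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List

# Assumption: failures are logged with the phrase "SCSI error";
# the exact wording depends on the kernel and HBA driver version.
SCSI_ERROR = re.compile(r"scsi error", re.IGNORECASE)


def find_scsi_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a SCSI error, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if SCSI_ERROR.search(line)]


def bursts_by_minute(hits: Iterable[str]) -> Counter:
    """Count hits per minute.

    Assumption: classic syslog timestamps, where the first 12 characters
    ("Jan  5 12:34") identify the minute.
    """
    return Counter(line[:12] for line in hits)


def main(path: str = "/var/log/messages") -> None:
    # Reading /var/log/messages usually requires root privileges.
    with open(path, errors="replace") as log:
        hits = find_scsi_errors(log)
    for hit in hits:
        print(hit)
    # A large count in a single minute suggests one event affected many paths.
    for minute, count in sorted(bursts_by_minute(hits).items()):
        print(f"{minute}: {count} SCSI error line(s)")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

Run it on each node (for example `python3 scan_scsi.py /var/log/messages`) and compare the minutes with the times you moved a drive or rebooted the storage. Errors showing up on every node at the same moment would point toward the RSCN problem rather than a fencing issue.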

What you are seeing could be unrelated, but the symptoms sound roughly the same.

Ryan

