
Re: [Linux-cluster] Cluster of different Architectures

> The cluster framework /should/ work with different architectures. I've tested it
> with intel,sparc & alpha boxes - but not much!
I tried it today and managed to kill the cluster quite spectacularly. The Sun node joins and gets counted as a vote, but the "Quorum" line of /proc/cluster/status shows a disagreement between the Sun node and the x86 nodes. The x86 nodes are running with a vote count of 3, while the Sun box is "frozen" or something, with a quorum count of 0. Attempting to leave the cluster (from any node) returned a failure code, and the load average on the first node skyrocketed (it reached 12 before the machine became unusable). dlm was taking 100% CPU, and I got a kernel oops too.

To compile it, I had to edit the source in one place (two casts) and fiddle with Debian's version of gcc to get it to produce 64-bit code that worked. Since most of the libraries are 32-bit only, I ended up compiling libxml2 myself.

If you've had it working in the past, I'll play some more - but at the moment the 2 x86 boxes are live, so I can't break them too often *grin*

> I don't know about GFS itself though, I don't have shared storage at home :)
GNBD means you don't need it! :)

I've been trying to work out whether GNBD on top of DRBD could be used to make HA "virtual" shared storage. I've not used GNBD (I have an unjustified dislike of "fake" shared storage), so I don't know how well it would work. Some sort of failover would also be required, and again I don't know how well that would work with GNBD. And of course there's the problem of bringing the systems back up in the right order if multiple failures cause a full collapse of the system.

If I get stuck getting this SPARC box to join the x86 pair, I'll see if I can get the Ultra 10 working again and look at GNBD then.

I probably should write down what I'm doing, for my own benefit if no-one else's: I've already had to puzzle out how to set up and configure things several times after rebuilding...


PS. In case anyone is interested, I've attached the dump from the primary node that I got after trying to join the SPARC box (wirenth) to the existing cluster (ramoth & mnementh) and then make it leave again. Apologies for the lack of imagination in the names, but I like them. :)

Attachment: ramoth.log.gz
Description: Binary data
