[Linux-cluster] Probably some silly mistake setting up a cluster ?

Petr Tuma petr.tuma at nenya.ms.mff.cuni.cz
Wed Feb 27 15:12:43 UTC 2008


Greetings,

I am trying to set up a cluster with (for now) two nodes. My only reason
is the semantic guarantees GFS provides when accessing shared files (that
is, I am not interested in fault tolerance, performance or anything else).
Unfortunately, I keep running into all sorts of problems, for
example:

    - After a few hours of intensive workload, the cluster sometimes
simply stops. All file system calls block, but things like cman_tool
status or group_tool status insist everything is all right. A soft reboot
is not possible because various services wait indefinitely, and after
power cycling, fsck finds inconsistencies on the file system.

    - Sometimes, when trying to execute a binary on the file system,
execvp fails with permission denied (EACCES) when it should not, but when
I try again, everything is all right. I sometimes even observe this when
trying to start a script on the file system, as if the interpreter of
the script (which is on a different file system altogether) had wrong
permissions. Again, simply trying one more time makes everything work.
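
One way to put a number on the intermittent exec failure is a small probe
like the Python sketch below. It launches the same command repeatedly and
tallies how often the launch itself is refused. The function name
count_exec_failures and the default of 100 attempts are illustrative
assumptions, not part of the setup described here, and the sketch assumes
a POSIX system where a failed exec surfaces in Python as PermissionError.

```python
import subprocess
import sys


def count_exec_failures(argv, attempts=100):
    """Launch argv repeatedly and tally how each launch attempt ended.

    "ok" means the exec itself succeeded (whatever the exit status),
    "eacces" means the exec was refused with a permission error,
    "other" covers any other OSError, such as a missing file.
    """
    counts = {"ok": 0, "eacces": 0, "other": 0}
    for _ in range(attempts):
        try:
            # A failed exec in the child is reported back to the parent
            # and raised here as an OSError subclass.
            subprocess.run(argv, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=False)
            counts["ok"] += 1
        except PermissionError:
            counts["eacces"] += 1
        except OSError:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python exec_probe.py /path/on/gfs2/binary args...
    if len(sys.argv) > 1:
        print(count_exec_failures(sys.argv[1:]))
```

Running the probe on each node against the same binary would show whether
the failures are tied to one node or appear on both.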

The config of the cluster seems relatively simple:

    - i686 single CPU node
       - file system device accessible over iSCSI
       - cluster subnet (unfortunately) connected over OpenVPN
    - x86_64 eight CPU virtual node
       - file system device provided by the host, which uses iSCSI
    - both node names resolve into the same subnet using /etc/hosts
    - nothing except a single GFS2 file system is mounted
    - fencing uses fence_manual
    - both nodes run Fedora 8

Config attached, not that there is anything unusual in it.

As an absolute novice, I am probably making some glaringly obvious silly
mistake which results in the very weird behavior described above, but
try as I might, I do not see anything that could cause this.

Thanks for any advice, Petr


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 711 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20080227/9ccfa91e/attachment.xml>

