We are talking about application servers.
One of the toughest things about clustering in general, and GFS in particular, is handling failure scenarios.
When you have any sort of cluster issue and your root is on a shared GFS, that GFS freezes in various ways until fencing happens. The problem is that binaries needed for recovery may live on that same GFS. How do you execute fence_apc to fence a failed node when it sits on a GFS that is hung waiting on that very fencing operation?
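To make the trap concrete: a fence agent is only usable during a hang if it lives on local storage. A minimal sketch of that sanity check follows; the /gfs mount point and both fence_apc paths are assumptions for illustration, not paths from the original setup:

```shell
#!/bin/sh
# Sketch: flag a fencing setup whose agent lives on the clustered FS.
# The /gfs mount point and the fence_apc paths below are assumptions.
on_clustered_fs() {
    # True when the given path sits under the (assumed) GFS mount point
    case "$1" in
        /gfs/*) return 0 ;;
        *)      return 1 ;;
    esac
}

check_agent() {
    if on_clustered_fs "$1"; then
        echo "UNSAFE: $1 is on GFS and may hang with it"
    else
        echo "ok: $1 is on local storage"
    fi
}

check_agent /gfs/sbin/fence_apc   # prints "UNSAFE: ..."
check_agent /sbin/fence_apc       # prints "ok: ..."
```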
There are ways around this involving RAM disks and the like, but eventually we just settled on having a minimal flash disk that would get us onto our SAN (but not clustered). Only after we were on a non-clustered FS on our SAN would we start up our clustered filesystem.

This gave us the ability to move our nodes around easily, an often-overlooked benefit usually credited to shared root; putting your root FS on the SAN gives it to you as well. There's nothing like booting up a dead node on spare hardware. It also gives you a solid way to debug a damaged root filesystem: with shared root it's all or nothing, but not with this configuration. You also get separate syslog files and other things that would each be one more special case on a shared root, and it's easy to set up nodes with slightly different configurations (shared root makes this yet another special case).

As for the danger of drive failure, a read-only IDE flash disk (Google for Transcend) is simple, easy, and dead solid.
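In fstab terms, the boot order described above might look something like the sketch below. The device names and mount points are assumptions, and in practice the GFS mount would be deferred to an init script that runs only after the node has joined the cluster:

```
# /etc/fstab sketch -- devices and mount points are assumptions
/dev/hda1  /        ext2  ro        1 1   # read-only IDE flash disk
/dev/sda1  /local   ext3  defaults  1 2   # non-clustered FS on the SAN
/dev/sdb1  /gfs     gfs   noauto    0 0   # clustered FS; mounted later, after cluster join
```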
After consolidating your shared configuration files into /etc/shared and replacing the originals with symlinks into that directory, it is a simple matter of rsync / csync / tsync / cron+scp to keep them synchronized.
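The consolidation step can be sketched as follows. This runs under a temp directory so it is safe to try anywhere; on a real node you would operate on /etc itself, and ntp.conf here is just a stand-in for whichever files you choose to share:

```shell
#!/bin/sh
# Sketch of the /etc/shared consolidation, played out under a temp dir.
root=$(mktemp -d)
mkdir -p "$root/etc/shared"

# Pretend ntp.conf is one of the shared config files.
echo "server 10.0.0.1" > "$root/etc/ntp.conf"

# Move it into the shared directory and leave a symlink behind.
mv "$root/etc/ntp.conf" "$root/etc/shared/ntp.conf"
ln -s shared/ntp.conf "$root/etc/ntp.conf"

# The original path still works, reading through the symlink.
cat "$root/etc/ntp.conf"   # prints "server 10.0.0.1"
```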
It is tempting to want to have a shared root to minimize management requirements. It is tempting to want to play games with ramfs and the like to provide a support system that will function when that shared root is hung due to clustering issues. It is tempting to think that having a shared GFS root is really useful.
However, if you value reliability and practicality, it's much easier to script up an occasional rsync than to do so many acrobatics for such little gain. For a cluster (and its apps) to be reliable at all, it needs to be able to function, recover, and generally have a stable operating environment. Putting GFS under the userspace that drives it is asking for trouble.
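The "occasional rsync" could be as small as a single crontab entry on whichever node holds the master copy. The node names and the hourly schedule below are assumptions, purely for illustration:

```
# Hypothetical crontab entry on an admin node (node names app1..app3 assumed):
# push /etc/shared to every member once an hour.
0 * * * *  for n in app1 app2 app3; do rsync -a --delete /etc/shared/ "$n:/etc/shared/"; done
```

The --delete flag keeps the members from accumulating stale files that were removed from the master copy.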
On Jan 31, 2007, at 1:34 PM, isplist logicore net wrote: