a couple questions from a cluster newbie

Colin van Niekerk Colin.vanNiekerk at mimecast.co.za
Mon Mar 23 06:47:37 UTC 2009


Hi there,

Apologies if anyone has answered this already and I have missed it. This post has been out for a while now.

I would configure three VMs on the failover box so that each server can fail over separately. This would involve having three load-balanced clusters, as in the attachment (again, view it with a fixed-width font).

To replicate data between the virtual server and the physical server within each cluster I would use DRBD (RAID 1 at the network level). You can configure it so that the kernel only confirms a write once the data has been committed to disk on both sides. DRBD presents the system with a new block device, and data must only be read and written via that device. As long as your systems are powerful enough and the link between the servers is fast enough (this depends on the rate of change of the data, i.e. how much data needs to be written to the block device at the other end of the network), it will behave just like reading and writing to any other block device.
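For illustration, a minimal DRBD 8-style resource stanza for that write-both-sides behaviour (protocol C) might look roughly like this -- hostnames, devices and addresses below are placeholders, not from your setup:

```
resource r0 {
  protocol C;                 # write is acknowledged only after it reaches disk on BOTH nodes
  on node1 {
    device    /dev/drbd0;     # the new block device the system must use
    disk      /dev/sdb1;      # backing device on this node
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

The filesystem then gets created on /dev/drbd0, never on /dev/sdb1 directly.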

For the backend you could use Conga, with luci and ricci, to manage the cluster (thinking about ways to avoid pain going forward), but I have not done this in a production environment so I'm not sure about the details.
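For what it's worth, the basic Conga setup on RHEL 5 goes roughly like this, if I remember the docs correctly (a sketch only, untested by me in production):

```
# on the management station
yum install luci
luci_admin init          # sets the luci admin password
service luci start       # web UI comes up on https://<host>:8084

# on every cluster node
yum install ricci
service ricci start
chkconfig ricci on
```

You then add the nodes from the luci web interface, which talks to ricci on each node.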

I'm afraid I have worked very little with GFS as well, so I can't answer you on that side of things. Maybe GNBD would be better for the load-balanced server replication as well, but as far as I know the main reason you would use GNBD is that it exports a block device to many clients over the network (with GFS handling the locking between them), which wouldn't help in the pg/ds/ap clusters. Can anyone confirm?
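For reference, exporting and importing a device with GNBD looks roughly like this, going by the Red Hat GNBD docs (the device and export names below are made up):

```
# on the server exporting the device
gnbd_serv                               # start the GNBD server daemon
gnbd_export -e sat1_export -d /dev/sdb1

# on each client
modprobe gnbd
gnbd_import -i server1                  # devices appear under /dev/gnbd/
```

The imported device would then typically carry a GFS filesystem mounted by all the clients.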

Just so I'm clear on the backend side. It sounds like there is a level of interaction between users and the actual data on the backend servers. Do the users query a process on the storage/processing servers and then that process works on the data and gives the user a result? Or do the users interact with the data directly?
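On the pg side, the WAL shipping you mention should indeed do the trick; on 8.3 a warm standby is roughly the following (paths and the standby hostname are placeholders, and pg_standby comes from contrib):

```
# primary: postgresql.conf
archive_mode = on
archive_command = 'rsync %p standby:/var/lib/pgsql/wal_archive/%f'

# standby: recovery.conf (keeps replaying shipped WAL until failover)
restore_command = 'pg_standby /var/lib/pgsql/wal_archive %f %p'
```

On failover you stop pg_standby from waiting (e.g. via its trigger-file option) and the standby finishes recovery and comes up read-write.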

Regards,
Colin van Niekerk
RHCE: 805008755334920


________________________________________
From: redhat-sysadmin-list-bounces at redhat.com [redhat-sysadmin-list-bounces at redhat.com] On Behalf Of Laurent Wandrebeck [lw at hygeos.com]
Sent: 20 March 2009 06:12 PM
To: redhat-sysadmin-list at redhat.com
Subject: a couple questions from a cluster newbie

Hi list,

our server park is about to gain three new boxes, pushing storage size to 70TB.
I think it's time to get rid of the nfs /net automounts, and to go for some kind
of cluster.
Long story short:
each typical server has local storage (1 to 8TB, up to 15 soon) on SATA discs
connected to a 3ware card, using hardware RAID 10 or 5.
Each of these machines is dedicated to processing data from a given satellite.
There are also one pgsql server, one apache server, and one nis/home (via nfs)
server, each with its own 3ware card and discs. Btw, the nis/nfs server is soon
to be turned into a directory server.
Gbps network, unmanaged switches, /24 subnet.

Now, I'd like to transform that mess into:
1) one GFS volume for sat1...N data, so that, if needed, you can process
whatever you want from whatever machine.
2) a failover machine that could automagically take over the load for pg, apache
and nfs/nis (the soon-to-be directory server) if the dedicated box fails.
That means efficient replication, so that data stay identical on the original
pg/apache/etc machines and on the failover one.
3) some kind of load balancing on sat1...N, that would put processes on the
box where the data to be processed are local, without the user having to decide
where to launch them. Resulting data from processes would have to be written
to the local storage of that box, so that sat1 data and sat1 processed data
stay on the same physical volume. That way, if a box crashes really badly, we
know which data were lost (we can't afford to back up 70TB).


Now, the questions (thanks for reading this far :) :

1) From what I've read in the docs, I should use gnbd. Am I on the right track?
It's unclear to me whether it is safe to use a machine both for serving and for
processing data.
2) Failover should be possible, if I understood the docs correctly. Where I'm a
bit stuck is the replication part. WAL shipping should do the trick for pg, and
the directory server has some kind of failover mechanism afaik. About apache,
I'm a bit in the dark. Could someone enlighten me?
3) Is such a thing possible with Cluster Suite at all? Would there be any
better way to organize these boxes so that our DC can continue to grow without
becoming a nightmare for me and the users?
4) Right now, user homes follow them to whatever box they log on to. Should
/home be another GFS volume, so that every server (potentially hidden behind
load balancing, if I understood correctly) can continue to access these data
(processing code often lives in /home)? Any other solution?

You'll find attached some ASCII art trying to describe what I'd like to get :)
(open it with a fixed-width font).
Thanks a lot for helping.
Best Regards,
--
Laurent
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fo.txt
URL: <http://listman.redhat.com/archives/redhat-sysadmin-list/attachments/20090323/6d862b29/attachment.txt>
