[Linux-cluster] Questions about GFS

Greg Perry gregp at liveammo.com
Wed Apr 12 15:28:13 UTC 2006


Also, after reviewing the GFS architecture, it seems there are 
significant security issues to consider.  If one client/member of the 
GFS volume were compromised, that would lead to a full compromise of the 
filesystem across all nodes (including the ability to create special 
devices and modify the filesystem on any other GFS node member).  Are 
there any plans to include any form of discretionary or mandatory access 
controls for GFS in the upcoming v2 release?
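
To make the concern concrete, here is a toy Python model of it.  It is 
only a sketch: it assumes that permission checks are enforced by each 
node's own kernel while every node has direct access to the same shared 
storage.  SharedVolume, Node, write and raw_write are hypothetical names 
for illustration, not GFS interfaces.

```python
# Toy model only: every name here is hypothetical, none of this is a GFS API.

class SharedVolume:
    """Storage that every cluster node can reach directly."""

    def __init__(self):
        self.files = {}  # path -> (owner, data)


class Node:
    """A cluster member whose local kernel enforces file ownership."""

    def __init__(self, name, volume):
        self.name = name
        self.volume = volume

    def write(self, user, path, data):
        """Normal write: the local node checks ownership first."""
        owner, _ = self.volume.files.get(path, (user, b""))
        if user != owner:
            raise PermissionError(f"{user} may not write {path} on {self.name}")
        self.volume.files[path] = (owner, data)

    def raw_write(self, path, data):
        """Write by an attacker who controls this node: the check is skipped."""
        owner, _ = self.volume.files.get(path, ("root", b""))
        self.volume.files[path] = (owner, data)

    def read(self, path):
        return self.volume.files[path][1]


def demo():
    volume = SharedVolume()
    node_a = Node("node-a", volume)
    node_b = Node("node-b", volume)

    node_a.write("alice", "/data/report", b"original")

    # An unprivileged user on node-b is stopped by node-b's own check.
    try:
        node_b.write("mallory", "/data/report", b"tampered")
        blocked = False
    except PermissionError:
        blocked = True

    # Once node-b itself is compromised, nothing on node-a can intervene.
    node_b.raw_write("/data/report", b"tampered")
    return blocked, node_a.read("/data/report")


if __name__ == "__main__":
    print(demo())
```

In this model the only enforcement point is the node doing the write, so 
node-a ends up reading whatever the compromised node-b stored.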

Greg

Greg Perry wrote:
> Thanks Bowie, I understand more now.  So within this architecture, it 
> would make more sense to utilize a RAID-5/10 SAN, then add diskless 
> workstations as needed for performance...?
> 
> For said diskless workstations, does it make sense to run Stateless 
> Linux to keep the images the same across all of the workstations/client 
> machines?
> 
> Regards
> 
> Greg
> 
> Bowie Bailey wrote:
>> Greg Perry wrote:
>>> I have been researching GFS for a few days, and I have some questions
>>> that hopefully some seasoned users of GFS may be able to answer.
>>>
>>> I am working on the design of a Linux cluster that needs to be
>>> scalable.  It will be primarily an RDBMS-driven data warehouse used
>>> for data mining and content indexing.  In an ideal world, we would be
>>> able to start with a small (say 4-node) cluster, then add machines
>>> (and storage) as the various RDBMSs grow in size, as well as use
>>> virtual IPs for load balancing across multiple lighttpd instances.
>>> All machines in the cluster need to be able to talk to the same
>>> volume of information, and GFS (in theory at least) would be used to
>>> aggregate the drives from each machine into that huge shared logical
>>> volume.
>>> With that being said, here are some questions:
>>>
>>> 1) What is the preference on the RDBMS?  Will MySQL 5.x work, and are
>>> there any locking issues to consider?  What would the best open source
>>> RDBMS be (MySQL vs. PostgreSQL, etc.)?
>>
>> Someone more qualified than me will have to answer that question.
>>
>>> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
>>> can you use GFS to aggregate all 10 drives into one big logical 3000GB
>>> volume?  Would that scenario work similarly to a RAID array?  If one or
>>> two nodes fail, but the GFS quorum is maintained, can those nodes be
>>> replaced and repopulated just like a RAID-5 array?  If this scenario
>>> is possible, how difficult is it to "grow" the shared logical volume
>>> by adding additional nodes (say I had two more machines each with a
>>> 300GB SATA drive)?
>>
>> GFS doesn't work that way.  GFS is just a fancy filesystem.  It takes
>> an already shared volume and allows all of the nodes to access it at
>> the same time.
>>
>>> 3) How stable is GFS currently, and is it used in many production
>>> environments?
>>
>> It seems to be stable for me, but we are still in testing mode at the
>> moment.
>>
>>> 4) How stable is the FC5 version, and does it include all of the
>>> configuration utilities in the RH Enterprise Cluster version?  (the
>>> idea would be to prove the point on FC5, then migrate to RH
>>> Enterprise).
>>
>> Haven't used that one.
>>
>>> 5) Would CentOS be preferred over FC5 for the initial
>>> proof of concept and early adoption?
>>
>> If your eventual platform is RHEL, then CentOS would make more sense
>> for a testing platform since it is almost identical to RHEL.  Fedora
>> can be less stable and may introduce some issues that you wouldn't have
>> with RHEL.  On the other hand, RHEL may have some problems that don't
>> appear on Fedora because of updated packages.
>>
>> If you want bleeding edge, use Fedora.
>> If you want stability, use CentOS or RHEL.
>>
>>> 6) Are there any restrictions or performance advantages of using all
>>> drives with the same geometry, or can you mix and match different size
>>> drives and just add to the aggregate volume size?
>>
>> As I said earlier, GFS does not do the aggregation.
>>
>> What you get with GFS is the ability to share an already networked
>> storage volume.  You can use iSCSI, AoE, GNBD, or others to connect
>> the storage to all of the cluster nodes.  Then you format the volume
>> with GFS so that it can be used with all of the nodes.
>>
>> I believe there is a project for the aggregate filesystem that you are
>> looking for, but as far as I know, it is still beta.
>>
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
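
For the RAID-5/10 SAN sizing question earlier in this thread, a rough 
capacity comparison may help.  The sketch below is only arithmetic under 
stated assumptions: equal-size drives, single-parity RAID-5, RAID-10 
built from two-way mirrors, and no filesystem or metadata overhead.  The 
function name usable_gb and the layout labels are made up for this 
example.

```python
def usable_gb(drives, size_gb, layout):
    """Return usable capacity in GB for equal-size drives (assumed layouts)."""
    if layout == "concat":
        # Plain concatenation: all space usable, no redundancy.
        return drives * size_gb
    if layout == "raid5":
        # Single parity: one drive's worth of space holds parity.
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * size_gb
    if layout == "raid10":
        # Striped two-way mirrors: half the raw space is usable.
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, 4 or more")
        return (drives // 2) * size_gb
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # The 10 x 300GB example from the original question.
    for layout in ("concat", "raid5", "raid10"):
        print(layout, usable_gb(10, 300, layout), "GB")
```

For the ten 300GB drives in the question, this gives 3000GB with no 
redundancy, 2700GB as RAID-5, and 1500GB as RAID-10.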



