[Linux-cluster] Resource Structure (proposed, not complete)
Lon Hohberger
lhh at redhat.com
Tue Jun 29 15:41:04 UTC 2004
User Resource Manager Operational Specification, proposed
The User Resource Manager (formerly the Service Manager) is the part
of Red Hat Cluster Suite that manages the resources and resource
groups which implement a user's clustered services.
The User Resource Manager only allows resource groups to operate while
it is running on a quorate member of the cluster. This means that all
resource groups are stopped immediately when a member loses quorum.
Typically, the member is also fenced (note: it may not manage to stop
all resource groups prior to being fenced, though it certainly tries to).
(Incomplete.)
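A rough sketch of the quorum gate described above, in Python (all
names here are illustrative assumptions, not actual rgmanager code):

```python
# Resource groups run only while the local member is quorate; on
# quorum loss, every group is stopped (best effort -- the member may
# be fenced before all stops complete).

def on_quorum_change(quorate, running_groups):
    """Return the groups that must be stopped after a quorum change."""
    if quorate:
        return []                  # still quorate: nothing to stop
    return list(running_groups)    # quorum lost: stop everything
```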
Failover Domains, proposed
See http://people.redhat.com/lhh/fd.html for information on how
clumanager 1.2 failover domains operate. The configuration format will
have to change slightly, but the operational characteristics need not.
Additionally, we might want to add a "Relocate to most-preferred
member" option to prevent unwanted service transitions in ordered
failover domains. (Since failover domains handle multiple cluster
members, this is not actually the same as clumanager 1.0's "Relocate
on preferred node boot" option.)
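As a rough illustration of how an ordered failover domain picks a
target member (modeled on the clumanager 1.2 behavior referenced
above; the function name and data layout are assumptions, not actual
cluster code):

```python
def pick_member(domain, online, current=None):
    """Pick the member that should run a group for an ordered failover
    domain: the highest-priority (earliest-listed) online member wins.

    domain  -- members in preference order, most preferred first
    online  -- set of members currently online and quorate
    current -- member currently running the group, if any
    """
    for member in domain:          # walk in preference order
        if member in online:
            return member
    # No domain member is online.  An unrestricted domain could fall
    # back to any online member; a restricted one leaves the group
    # stopped.
    return None
```

With a "Relocate to most-preferred member" option, the group would
move whenever pick_member() returns someone other than `current`;
without it, the group stays put as long as `current` is still online.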
User Resource Structure, proposed
<resources>
    <script name="Oracle Script"            <-- Unique name across scripts
            file="/path/to/script"/>        <-- Path to script

    <script name="Apache Script"            <-- Unique name across scripts
            file="/etc/init.d/httpd"/>      <-- Path to script

    <mount name="Oracle Data"               <-- Unique name across mounts
           fstype="gfs"                     <-- Can share these. Can
                                                mount these multiple times
           options=""                       <-- Defaults ok
           device="/dev/sdc1"               <-- Could be LV, GNBD, etc.
           force_umount="n"
           mountpoint="/mnt/oracle_data"/>  <-- Mount point

    <mount name="Web Data"                  <-- Unique name across mounts
           fstype="nfs"                     <-- Can't share these!
           options=""                       <-- Mount options
           source="server:/webdata"         <-- Server/Path specification
           mountpoint="/mnt/web_data"/>     <-- Mount point

    <mount name="NFS Home"                  <-- Unique name across mounts
           fstype="ext3"                    <-- You can share these
           options="ro"                     <-- Mount options
           device="/dev/sdc2"               <-- Could be LV, GNBD, etc.
           force_fsck=""                    <-- Force fsck on journalled fs
           force_umount="y"
           mountpoint="/mnt/nfs"/>          <-- Mount point

    <client name="Joe's machine"            <-- Name unique across clients
            type="nfs"                      <-- Only NFS support for now
            target="joe.boston.redhat.com"  <-- wildcards & netgroups too!
            options="ro"/>

    <client name="Admin's machine"          <-- Name unique across clients
            type="nfs"                      <-- Only NFS support for now
            target="bob.boston.redhat.com"  <-- wildcards & netgroups too!
            options="rw"/>

    <ip address="172.31.31.2"/>             <-- Address is unique
    <ip address="172.31.31.4"/>             <-- Address is unique
    <ip address="::ffff:172.31.31.3"/>      <-- Address is unique (Watch
                                                for ip6/ip4 collisions)
                                                ip6 == new feature!

    <!-- Web & Oracle Service -->
    <group name="Oracle/Web">               <-- Unique name across groups
        <script ref="Oracle Script"/>       <-- Note, multiple scripts
        <script ref="Apache Script"/>           (New Feature)
        <ip ref="172.31.31.2"/>             <-- Not sure if feasible.
        <ip ref="::ffff:172.31.31.3"/>
        <mount ref="Oracle Data">
            <!-- Exports are service specific -->
            <export type="nfs" path="">     <-- If empty string, refers
                                                to parent's mountpoint
                <client ref="Joe's machine"/> <-- Joe can mount this.
            </export>
            <export type="samba"/>          <-- no change from 1.2 for
                                                now
        </mount>
    </group>

    <!-- Home directory service -->
    <group name="Homedirs">
        <ip ref="172.31.31.4"/>
        <mount ref="NFS Home">
            <export type="nfs" path="">
                <client ref="Joe's machine"/>
                <client ref="Admin's machine"/>
            </export>
        </mount>
    </group>
</resources>
===============================================
Rules concerning individual resource behavior
===============================================
<mount> resource:
- When fstype is 'gfs' or 'nfs', the mount may be defined as part of
  multiple resource groups. If this is the case, the force_umount
  option is ignored.
- When fstype is neither 'gfs' nor 'nfs', the mount may only be
  defined as part of one resource group.
- When fstype is 'nfs' or 'gfs', force_fsck is ignored.
- When fstype is 'nfs', force_umount is ignored.
- When fstype is 'ext3', 'jfs', 'reiserfs', or 'xfs', the file system
  is only fsck'd if the force_fsck option is turned on.
- When fstype is 'ext2', force_fsck is ignored and the file system
  is always checked on failover or relocation.
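The per-fstype rules above can be condensed into a small decision
helper (a sketch; the function name and return layout are assumptions,
not cluster code):

```python
def mount_policy(fstype, force_fsck=False, force_umount=False, shared=False):
    """Apply the per-fstype mount rules: whether fsck runs before
    mounting and whether a forced unmount is honored.  `shared` means
    the mount is defined in more than one resource group."""
    shareable = fstype in ("gfs", "nfs")
    if shared and not shareable:
        raise ValueError("only gfs/nfs mounts may span resource groups")
    if shareable:
        do_fsck = False            # force_fsck ignored for gfs/nfs
    elif fstype == "ext2":
        do_fsck = True             # ext2 is always checked
    else:                          # ext3, jfs, reiserfs, xfs
        do_fsck = force_fsck       # checked only when explicitly forced
    if fstype == "nfs" or shared:
        honor_umount = False       # force_umount ignored
    else:
        honor_umount = force_umount
    return {"fsck": do_fsck, "force_umount": honor_umount}
```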
<ip> resource:
- An IP resource may only be part of one resource group. If it is
defined in multiple resource groups, the first resource group to
start will have the IP address, and the second resource group will
fail to start.
- An IPv6 address corresponding to an IPv4 address specified in
  another <ip> resource is not allowed.
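The ip6/ip4 collision rule can be checked with Python's standard
ipaddress module: an IPv4-mapped IPv6 address (::ffff:a.b.c.d)
exposes its embedded IPv4 address, so the check is straightforward
(a sketch, not actual cluster code):

```python
import ipaddress

def ip_collides(new_addr, existing_addrs):
    """True if new_addr duplicates, or is the IPv4-mapped twin of,
    any address already configured in another <ip> resource."""
    def canonical(addr):
        ip = ipaddress.ip_address(addr)
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
            return ip.ipv4_mapped      # compare via the embedded IPv4
        return ip
    new = canonical(new_addr)
    return any(new == canonical(a) for a in existing_addrs)
```

In the example configuration, "::ffff:172.31.31.2" would collide with
the existing "172.31.31.2" resource, while "::ffff:172.31.31.3" does
not.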
<script> resource:
- These may be members of multiple resource groups, but beware the
  cost of doing so: the cluster makes no assumptions about what data
  is available to scripts. Because of this, scripts depend on all
  other pieces of a resource group being up and running.
<client> resource:
- These may be members of any number of <export> resources in any
  number of resource groups.
======================
Dependency Structure
======================
        group__________
       /   \           \
      /     \           \
     ip    mount    ...group...
      \    /  \
       \  /    \
        \/      \
     script    export
                  \
                   \
                 client
Wherever a leaf node exists, you may restart that leaf node without
affecting other nodes in the dependency tree. For instance, if you
change the export-client options for "Joe's Machine", then the
export is removed and replaced with the new options.
In our example above, the "Oracle/Web" service and the "Homedirs"
service both have exports with clients pointing to "Joe's Machine".
Changing the option on one changes the option on both exports.
You may detach leaves without affecting the rest of the resource
group. That is, you may detach "Joe's Machine" without stopping
the service. You may also attach leaves without affecting the rest
of the resource group; so you may add "Bill's Machine" to the export
without restarting the resource group. Similarly, you may add or
restart a script.
Whenever a node of the tree does not have anything depending on it,
it is by definition a leaf node. Thus, you may add, start, or stop
IP addresses provided that no scripts or exports are defined for a
given resource group.
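A sketch of the leaf test described above, over a toy encoding of the
dependency tree (the dict layout and names are illustrative
assumptions):

```python
# Each node maps to the nodes that depend on it (its children in the
# dependency tree above).
TREE = {
    "group":  ["ip", "mount"],
    "ip":     ["script"],
    "mount":  ["script", "export"],
    "export": ["client"],
    "script": [],
    "client": [],
}

def is_leaf(node, tree):
    """A node with nothing depending on it may be restarted, attached,
    or detached without affecting the rest of the resource group."""
    return not tree.get(node, [])
```

So "client" and "script" are always safe to restart, while "ip" is not
a leaf as long as a script depends on it.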
Rules:
<group> resource:
- A group may depend on any number of other groups. When a
  group depends on another group, the child group is started prior
  to any other resources of the parent group. Both are managed in a
  single start phase and are thus started on the same cluster member,
  so this is only a logical grouping: resource groups that are
  dependent children must start on the same cluster member as their
  parents.
- A group, if depended upon by another group, may not itself depend
  on another group. That is, A may depend on B, but then B may not
  depend on anything. This prevents both circular dependencies and
  arbitrarily complex services.
- A resource group fails to start if any one of its dependent
children fails to start.
<ip> resource:
- An IP resource may not be added, removed, or changed if an export
or user script is present in the resource group. If neither an
export nor a user script is present in the group, it may be
restarted without affecting other parts of the group.
- An IP resource fails to start if any one of its dependent children
  fails to start.
<script> resource:
- A script resource may be added, removed, or changed without
  affecting other members of the resource group.
- A script resource is not started unless all mount and ip resources
have started.
<mount> resource:
- A mount resource may not be added, removed, or changed if an
export or user script is present in the resource group. If neither
an export nor a user script is present in the group, it may be
restarted without affecting other parts of the group.
- A mount resource fails to start if any one of its dependent children
fails to start.
<export> resource:
- Export resources are not defined outside of a resource group; they
are properties of a given mount resource and defined only in the
context of a resource group.
- An export resource may not be added, removed, or changed if a
client exists and is depending upon it.
- An export resource does not fail to start unless all of its
dependent children fail to start.
- Export resources with type "samba" are not started unless all
mount and ip resources have been started.
- Export resources with type "nfs" are not started unless all mount
resources have been started.
<client> resource:
- Client resources may be added, removed, or changed at any time
without affecting the operation of any other part of the resource
group.
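The group-dependency rules above amount to limiting the dependency
graph to a depth of one. A validation sketch (the data layout is an
assumption, not actual config-parsing code):

```python
def validate_group_deps(deps):
    """deps maps each group name to the list of groups it depends on.
    Enforces the rule that a group which is depended upon may not
    itself depend on another group.  Returns the offending groups."""
    depended_upon = {child for children in deps.values() for child in children}
    return [g for g in depended_upon if deps.get(g)]
```

For example, {"A": ["B"], "B": []} is legal, while
{"A": ["B"], "B": ["C"]} is rejected because B both is depended upon
and depends on C.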
==================================
How it works - a High Level View
==================================
Note - this is the same way clumanager 1.0 and 1.2 do it; the main
difference is that we now have the ability to start individual
export clients. The intention is not to illustrate that here; I have
no idea how it's going to work yet ;) BTW, NFS exports are
intentionally started apart from Samba exports: Samba exports
generally bind to IP addresses in the resource group (ugly), but NFS
exports need no such thing, and in fact, exporting after the IP
address comes up causes problems with failover and/or service
relocation.
group_start () {
    for (each group) {
        if (group_start(group) != SUCCESS)
            return FAIL;
    }

    for (each mount) {
        if (start_mount() != SUCCESS)
            return FAIL;
        for (each export) {
            if (type != nfs)
                continue;
            for (each client)
                /* Log errors */
                start_client(export_directory);
        }
    }

    for (each ip) {
        if (start_ip() != SUCCESS)
            return FAIL;
    }

    for (each mount) {
        for (each export) {
            if (type != samba)
                continue;
            if (start_samba(export) != SUCCESS)
                return FAIL;
        }
    }

    for (each script) {
        if (start_script() != SUCCESS)
            return FAIL;
    }

    return SUCCESS;
}
group_stop () {
    for (each script) {
        if (stop_script() != SUCCESS)
            return FAIL;
    }

    for (each ip) {
        if (stop_ip() != SUCCESS)
            return FAIL;
    }

    for (each mount) {
        for (each export) {
            if (type == nfs) {
                for (each client)
                    /* Log errors */
                    stop_client(export_directory);
            } else
                stop_samba(export);
        }
        if (stop_mount() != SUCCESS)
            return FAIL;
    }

    for (each group) {
        if (group_stop(group) != SUCCESS)
            return FAIL;
    }

    return SUCCESS;
}