
[Linux-cluster] How to improve basic skills in Linux



I want to improve my basic Linux skills. For example, I want to learn how to configure OpenOffice and a media player on Red Hat 5.

Please help with these topics.



Harvinder Singh S/O Baldev Raj, VPO Barwa, Teh. Anandpur Sahib, Dist. Ropar, Punjab
E-Mail ID: jmd_singhsaini yahoo com


--- On Fri, 17/2/12, Jan Huijsmans <Jan Huijsmans interaccess nl> wrote:

> From: Jan Huijsmans <Jan Huijsmans interaccess nl>
> Subject: Re: [Linux-cluster] Cluster stability with missing qdisk
> To: "linux clustering" <linux-cluster redhat com>
> Date: Friday, 17 February, 2012, 2:24 AM
> Hi,
> 
> > Please stay on-list or call Red Hat Support.
> 
> Whoops, my bad, it's back on-list again. (Replying without
> checking the To: field didn't help.)
> 
> > On 02/16/2012 04:50 AM, Jan Huijsmans wrote:
> >>>> In our clusters we use a qdisk to determine which node has
> >>>> quorum in case of a split-brain situation.
> >>
> >>>> This works great... until the qdisk itself is hit by problems
> >>>> with the SAN. Is there a way to have a stable cluster with
> >>>> qdisks, where the absence of one qdisk won't kill the cluster
> >>>> altogether? At the moment, with a single-qdisk setup, the
> >>>> cluster depends entirely on the availability of the qdisk,
> >>>> while, IMHO, it should be expendable.
> >>
> >>> What kind of problems are you trying to avoid?
> >>
> >>> 1) I/O errors -> disk died:
> >>
> >>> Solution: set max_error_cycles to something nonzero (1? 2?),
> >>> and qdiskd will then exit on the host where the problems are
> >>> occurring when I/O errors are received.
> >>
> >> We now have the qdisk interval set to 3 and tko set to 50, so the
> >> status is updated every 3 seconds and is allowed to fail 50 times.
> >>
> >> Will max_error_cycles cause qdisk attempts to fail when the disk
> >> doesn't respond in time? If so, what is its relation to interval
> >> and tko?
> >>
> >> Is this an option that can be used with the clustering suite in
> >> the RHEL 5.6 software stack?
> >>
> >>> 2) Long I/O hangs (e.g. path fail-over)
> >>
> >>> Solution: current 3.1.x / 3.2.x differentiates between I/O hangs
> >>> and I/O errors, so hangs (e.g. due to path fail-over) no longer
> >>> cause reboots.
> >>
> >> We have seen I/O hangs of over 350 seconds at the worst times
> >> (it's now < 10 seconds). We see discarded frames on the SAN, so
> >> it's explainable. However, the system has 4 paths, 2 on one fabric
> >> and 2 on the other. The failure detection time is 60 seconds in
> >> the Red Hat default setup (which wasn't changed).
> >
> > You can hang forever with the new upstream feature as long as the
> > nodes can communicate.
> 
> This is useful. Is this available in the RHN channels for the RHEL 5.6
> release, or is an upgrade needed?
> 
> >> Our setup has 3 locations: datacenters A and B, and quorum location
> >> C. The last location is used by the SAN (IBM SVC/V7000 units) to
> >> determine which datacenter (A or B) has access to C when no
> >> communication is possible between the two datacenters.
> >
> > For starters, set master_wins to '1' and don't use heuristics.
> 
> I'll see when I can test this. There was one cluster where I had to
> add heuristics to ensure logging from the evicted node before it was
> reset. (It's very irritating when a node is evicted without a logged
> cause.)
> 
> >> I would like to migrate the qdisk to this location, so we have the
> >> same setup as with the SAN. The main problem is the failure of the
> >> quorum location C.
> 
> > Sure.
> 
> >> When we move the qdisk there and it fails, the cluster will fail on
> >> the qdisk, when it should be able to function properly, as both
> >> nodes are up and able to communicate with each other.
> >
> > Setting max_error_cycles to 1 will cause I/O errors to remove the
> > quorum disk on the host.
> 
> > The new upstream feature will prevent a hang from causing evictions.
> 
> > There is no method to 'ignore' eviction notices.
> 
> I don't want to ignore them; I don't want to get them when the nodes
> can reach each other and both could do the job they need to.
> 
> >> On the SAN, this is solved with 3 'qdisks', one at each location (A,
> >> B and C). When C fails, there are still 2 qdisks available, so the
> >> cluster keeps functioning.
> 
> > Qdiskd doesn't work deterministically in replicated environments.
> 
> I was thinking: is it possible to use an MD device with 3 mirror
> copies as the qdisk device? This would give the same functionality
> with only one qdisk device.
> 
> >> The problem I'm trying to solve is the complete failure of the
> >> qdisk taking down a perfectly operating cluster. We have to guard
> >> against a split-brain situation, but at the moment the costs of the
> >> qdisks are too high. (All clusters are now limited to 1 node to
> >> prevent failures due to the qdisk problems.)
> 
> > You might not need a quorum disk at all.
> 
> > A quorum disk doesn't obviate the need for fencing to complete in
> > environments where you have a stretched cluster. E.g. even when you
> > have sites A and C, when B dies, it will need to be fenced. This will
> > fail, because the site is not available.
> 
> That is what's bothering me about the design of the current cluster
> setup. However, when both nodes could reach the qdisk but not each
> other via the LAN, they evicted each other (which was executed as soon
> as the LAN was back up...).
> 
> > Why don't you take a look at these and file a ticket with Red Hat Support:
> 
> >   https://access.redhat.com/kb/docs/DOC-53348
> >   https://access.redhat.com/kb/docs/DOC-58412
> 
> I'll take a look at it.
> 
> -- Jan
> 
> --
> Linux-cluster mailing list
> Linux-cluster redhat com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
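The interval and tko values discussed in the thread can be turned into a rough time budget. The sketch below assumes a simplification: a node is evicted once it misses tko consecutive qdisk cycles of interval seconds each, i.e. after roughly interval * tko seconds. Real qdiskd behaviour depends on the release (the thread notes that 3.1.x / 3.2.x treat I/O hangs differently from I/O errors), so treat this as an illustration only. The helper names are hypothetical; the numbers (interval 3, tko 50, 60-second path failure detection, 350-second worst-case hang, under 10 seconds now) come from the messages above.

```python
# Rough qdisk timing check. ASSUMPTION: eviction happens after roughly
# interval * tko seconds without a successful qdisk update. Helper names
# are hypothetical, not part of any cluster tool.

QDISK_INTERVAL_S = 3  # seconds per qdiskd read/write cycle (from the thread)
QDISK_TKO = 50        # missed cycles tolerated before a node is declared dead


def qdisk_eviction_timeout(interval_s: int, tko: int) -> int:
    """Approximate seconds a node may go without updating the qdisk."""
    return interval_s * tko


def survives_hang(hang_s: float, interval_s: int, tko: int) -> bool:
    """True if an I/O hang of hang_s seconds stays below the approximate timeout."""
    return hang_s < qdisk_eviction_timeout(interval_s, tko)


if __name__ == "__main__":
    timeout = qdisk_eviction_timeout(QDISK_INTERVAL_S, QDISK_TKO)
    print(f"approximate qdisk eviction timeout: {timeout} s")
    cases = [
        ("default path failure detection", 60),
        ("worst observed SAN hang", 350),
        ("current observed hang (upper bound)", 10),
    ]
    for label, hang in cases:
        within = survives_hang(hang, QDISK_INTERVAL_S, QDISK_TKO)
        print(f"{label}: {hang} s {'within' if within else 'exceeds'} the timeout")
```

Under this assumption the window is about 150 seconds: a 60-second path fail-over fits inside it, while a 350-second hang would not, which is why the upstream change that stops hangs from causing evictions matters for this setup.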

