
Re: [dm-devel] Improving mirror fault handling.




On Jan 12, 2009, at 9:26 PM, malahal us ibm com wrote:

4) Transient fault handling
- Since we can't just assume "wait 5 seconds and then see if the failure
still exists", we are going to have to make this configurable.
Discussion should proceed on this in parallel with #2 and #3, since
it will take a long time for everyone to agree on this phase. We have to determine
where the user specifies the configuration - lvm.conf?  CLI?  We also
have to determine /what/ their configuration will be based on - time?
percentage of mirror out-of-sync?

Thank you Jonathan for the nice write-up. Transient failures are
generally recoverable after a period of time. The 'time' may vary from
device to device though. lvm.conf based configuration is a good place to
start. Do we really need LV or PV based configuration for this
'timeout'?

The recovery itself doesn't depend on the percentage of out-of-sync
regions, but that percentage is a good place to start when deciding
whether to re-allocate the regions, if configured for re-allocation.

Here are my thoughts:
	handle_mirror_transient_failure()
	{
		do {
			if (device-came-back-to-life()) {
				start-resynchronization();
				break;
			}

			if (reallocation-timeout exceeded or
			    re-allocation-too-much out-of-sync) {
				re-allocate();
				break;
			}
			if (some-other-timeout exceeded) {
				log a message and break;
			}
			sleep(for-few-seconds);
			timeout -= few-seconds;
		} while (1);
	}

If we put the configuration in lvm.conf, then it would globally apply to all volume groups and all logical volumes. I might be willing to accept that for a while, but others may want a plan for something better going forward. We don't want to pollute the conf file with new fields that will be useless shortly into the future. If you look in LVM2/doc/example.conf and search for _fault_policy, you can see that there are already some configuration options there. We might stick the new ones there as well. (Although this somewhat confuses me, because they apply only to our default DSO, and you can change the DSO you want to use in a completely different section of the config file... So now you have settings that are worthless because a custom DSO is being used.)

What I meant with regard to "/what/ their configuration will be based on" is that the user may not care about the time they wait for a device to come back, but about how far the mirror has gone out of sync while the device has been gone... If one of the legs fails and the mirror is 75% out of sync before the device comes back, the user may just want the device removed and stop waiting. If the user specifies "5 minutes" wait time, but there have been no writes to the mirror in that time, then we could probably wait longer. You see what I mean? A user may wish to use a combination of the two methods... "Wait 20 minutes for the device to come back, but only if the mirror stays > 95% in-sync".
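
To make the combined rule concrete, here is a minimal sketch in C of how such a check might look. Everything in it is hypothetical: the struct, its field names and keep_waiting() do not exist in LVM2, and a limit of 0 is assumed to mean "not configured".

```c
#include <stdbool.h>

/* Hypothetical policy; none of these names exist in LVM2. */
struct transient_policy {
	unsigned max_wait_secs;   /* give up after this long; 0 = no time limit */
	unsigned min_in_sync_pct; /* give up at or below this %; 0 = no sync limit */
};

/* Return true while it is still acceptable to wait for the device. */
static bool keep_waiting(const struct transient_policy *p,
			 unsigned waited_secs, unsigned in_sync_pct)
{
	/* Time-based limit: the device has been gone too long. */
	if (p->max_wait_secs && waited_secs >= p->max_wait_secs)
		return false;

	/* Sync-based limit: the mirror must stay strictly above the threshold. */
	if (p->min_in_sync_pct && in_sync_pct <= p->min_in_sync_pct)
		return false;

	return true;
}
```

With a policy of { 20 * 60, 95 } this expresses "wait 20 minutes, but only if the mirror stays > 95% in-sync": whichever limit is crossed first ends the wait.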

As for the pseudo-code... I wouldn't use a 'while(1)' there... leave the thread free to continue. We could use dmeventd's timer events to trigger the next check for the device coming back (I hope). Your code seems to suggest that you understand my point in the preceding paragraph, but I am a bit confused by the use of '[re-]allocation'. In this piece of code, we are only concerned about whether or not to take action. The action is user defined (see the example.conf mentioned above), so the space may or may not be reallocated.
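
A rough sketch of what one non-blocking check could look like, in C. This is only an illustration under assumptions: the names below are made up, and how the function would be hooked up to dmeventd's timer events is deliberately left out because I have not verified that interface. The function only decides whether to act; what the action is stays with the user-defined fault policy.

```c
#include <stdbool.h>

/* Outcome of a single periodic check (hypothetical names). */
enum check_result {
	CHECK_AGAIN,        /* keep waiting; schedule another check */
	START_RESYNC,       /* device came back; resynchronize */
	APPLY_FAULT_POLICY  /* stop waiting; run the configured policy */
};

/* State kept between checks instead of sitting in a loop. */
struct wait_state {
	unsigned waited_secs;
	unsigned max_wait_secs;
	unsigned min_in_sync_pct;
};

/* One check, meant to be called once per timer tick; it never sleeps. */
static enum check_result check_once(struct wait_state *s,
				    unsigned interval_secs,
				    bool device_back,
				    unsigned in_sync_pct)
{
	if (device_back)
		return START_RESYNC;

	/* Account for the interval that just elapsed. */
	s->waited_secs += interval_secs;

	if (s->waited_secs >= s->max_wait_secs ||
	    in_sync_pct <= s->min_in_sync_pct)
		return APPLY_FAULT_POLICY;

	return CHECK_AGAIN;
}
```

Because all state lives in struct wait_state, the calling thread returns immediately after each check and is free to service other events.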

brassow

