[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [dm-devel] [PATCH] 9/9: Reactivate paths that have come back to life.


I've recently realized that this patch introduced a potential deadlock.

In multipath_suspend(), we iterate through the groups and paths while holding 
the path_lock spinlock. For each path, we down that path's test_lock 
semaphore in order to stop all path-testing while the device is suspended.

However, if a path is currently being tested when multipath_suspend() is 
called, then multipath_suspend() has to wait on the test_lock semaphore, 
while holding the path_lock spinlock. When the test bio completes, 
test_endio() needs to take the path_lock spinlock in case it has to move a 
path from the valid list to the invalid list (or vice-versa). Deadlock.

So I thought for a long time about how to prevent this, and couldn't come to 
an easy answer. We need the test_lock to prevent the same test bio from being 
submitted multiple times and to be able to stop testing altogether during a 
suspend. But we also need the path_lock to protect places where we have to 
move paths between the lists.

So this got me thinking a bit more...why do we actually need to separate the 
valid and invalid paths into different lists? It would be just as simple to 
have an atomic_t in the path struct to indicate if a path was valid or 
invalid. Combining the valid and invalid lists would completely eliminate the 
need for the path_lock, since the group list and path list would never have 
to change once the mapping was constructed.

The only advantage I currently see for the separate lists is that 
__find_priority_group() can easily determine if a group has any valid paths 
to select. But this could also be accomplished with another atomic_t counter 
in the priority_group struct to keep track of the number of valid paths.

So, unless somebody is terribly adverse to this idea, I'm going to start 
working on a patch to combine the valid and invalid lists in the 
priority_group into a single list, which will hopefully simplify the locking 
quite a bit.


On Tuesday 13 January 2004 16:01, Kevin Corry wrote:
> If a test bio completes successfully and the path is currently invalid,
> reactivate the path.
> --- diff/drivers/md/dm-mpath.c	2004-01-13 15:21:51.000000000 -0600
> +++ source/drivers/md/dm-mpath.c	2004-01-13 15:44:45.000000000 -0600
> @@ -199,15 +199,32 @@
>  	path->pg->ps->type->set_path_state(path->pg->ps, path, 0);
>  }
> +static void __recover_path(struct path *path)
> +{
> +	if (!path->has_failed)
> +		return;
> +
> +	atomic_set(&path->fail_count, MPATH_FAIL_COUNT);
> +	list_move(&path->list, &path->pg->valid_paths);
> +	path->pg->ps->type->set_path_state(path->pg->ps, path, 1);
> +	path->has_failed = 0;
> +}
> +
>  static int test_endio(struct bio *bio, unsigned int done, int error)
>  {
>  	struct path *path = (struct path *) bio->bi_private;
> +	struct multipath *m = path->pg->m;
> +	unsigned long flags;
>  	if (bio->bi_size)
>  		return 1;
> +	spin_lock_irqsave(&m->path_lock, flags);
>  	if (error)
>  		__fail_path(path);
> +	else
> +		__recover_path(path);
> +	spin_unlock_irqrestore(&m->path_lock, flags);
>  	up(&path->test_lock);
>  	return 0;

Kevin Corry
kevcorry us ibm com

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]