
Re: [linux-lvm] pvmove hangs



-----Original Message-----
From: linux-lvm-bounces redhat com [mailto:linux-lvm-bounces redhat com]
On Behalf Of Thomas Hager
Sent: Tuesday, August 17, 2010 6:12 PM
To: linux-lvm redhat com
Subject: Re: [linux-lvm] pvmove hangs

On Tue, 2010-08-17 at 15:26 -0400, Allen, Jack wrote:
> I know pvmove is part of LVM2, but I am asking the questions here
> because it worked with PowerPath and does not work with Multipath, with
> everything else the same.
we had similar problems with Novell SLES every now and then; they were
not reproducible and occurred at random intervals.

among them:

- the pvmove simply stalled, outputting the same %-done message every 15
seconds until we reset the server.

- the server crashed and performed an automated reboot.

and worst:

- pvmove threw an I/O error immediately after starting, but still
committed all pending moves, so all data previously residing on the old
LUN was lost :(
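The first symptom above (pvmove repeating the same %-done line every 15 seconds) can at least be detected automatically. Below is a minimal Python sketch, assuming progress lines contain text like `Moved: 42.5%`. The exact output wording is an assumption and may differ between LVM2 versions. `detect_stall` and `max_repeats` are hypothetical names. The sketch only flags a stall and does nothing to recover from one.

```python
import re
from typing import Iterable, Optional

# Assumed progress-line format, e.g. "  /dev/sdb1: Moved: 42.5%".
# pvmove's exact wording may vary between LVM2 versions.
PROGRESS_RE = re.compile(r"Moved:\s*([0-9]+(?:\.[0-9]+)?)%")


def detect_stall(lines: Iterable[str], max_repeats: int = 8) -> Optional[float]:
    """Return the stuck percentage if the same value is reported
    max_repeats times in a row; otherwise return None."""
    last: Optional[float] = None
    repeats = 0
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match is None:
            continue  # skip non-progress output, e.g. -v messages
        pct = float(match.group(1))
        if pct == last:
            repeats += 1
            if repeats >= max_repeats:
                return pct
        else:
            # Progress moved on: start counting the new value.
            last = pct
            repeats = 1
    return None
```

With a report every 15 seconds, as described above, `max_repeats=8` corresponds to roughly two minutes without progress. The lines could be read from pvmove's stdout as they arrive.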

Novell provided several updates to the kernel, lvm2 and the
device-mapper, but there may still be bugs lurking that we haven't
triggered yet. one piece of advice they gave us was to migrate the PEs
of only one LV at a time, which we have followed since.
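The one-LV-at-a-time advice maps onto pvmove's `-n` (`--name`) option, which restricts a move to the extents belonging to a single LV. Below is a minimal Python sketch that builds one pvmove invocation per LV and runs them in sequence. The helper names and the PV/LV names in the usage are hypothetical. With `dry_run=True` (the default) the commands are only printed.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_pvmove_commands(source_pv: str, lv_names: Sequence[str],
                          dest_pv: Optional[str] = None) -> List[List[str]]:
    """Build one pvmove per LV; -n limits each move to that LV's extents."""
    commands = []
    for lv in lv_names:
        cmd = ["pvmove", "-v", "-n", lv, source_pv]
        if dest_pv is not None:
            cmd.append(dest_pv)  # omit to let LVM choose free space in the VG
        commands.append(cmd)
    return commands


def run_sequentially(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run each pvmove in turn; check=True stops at the first failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical device and LV names, for illustration only.
    cmds = build_pvmove_commands("/dev/sdb1", ["lv_data", "lv_logs"], "/dev/sdc1")
    run_sequentially(cmds)  # dry run: prints the commands only
```

Stopping at the first failing LV means a problem such as the I/O error described above affects at most one LV's migration.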

we've seen this behaviour only on SLES, which is the distribution we use
on most of our servers. the few Red Hat servers we have migrated fine
with pvmove, though we didn't migrate much on them (only a few hundred
GB, compared to the ~50TB we had to migrate on SLES). the problem was
also not related to the storage driver: we faced the same issues with
HP's adapted QLogic driver as well as with dm-mp.

anyway, you should definitely open an SR with Red Hat's CC so they can
investigate the issue more closely.

hth,
tom.
=============
Thanks for the info Tom.

After sending the post I tried the pvmove command several more times,
this time adding the -v option. One PV had 3 LVs on it. pvmove completed
the first LV with no problem, and as it completed the second LV it
displayed "suspending LV" (it did that for the first one too). This is
when everything related to the PV hung. Any LVM command on the VG hangs,
and if you abort it with ^C it reports that it was aborted while waiting
on flock /var/lock/lvm/X, where X is the VG name. If I remove that lock
file I can run LVM commands on the VG again, but the pvmove is still
hung. So it looks like a race condition or deadlock (a "deadly embrace")
in which two resources wait on each other: writes are waiting because
the LV is suspended, and the suspend is waiting because there are
outstanding writes to complete.

I plan to open a case with Red Hat, but was hoping the problem had
already been handled. I am still concerned that pvmove fails with
multipath but works fine with PowerPath, because I would rather use
multipath: it makes doing yum updates a lot easier.

------
Jack Allen

