[linux-lvm] [PATCH 10/10] man: document --node option to lvchange

Vladislav Bogdanov bubble at hoster-ok.com
Wed Mar 20 12:12:59 UTC 2013


20.03.2013 11:45, Zdenek Kabelac wrote:
> Dne 19.3.2013 18:36, Vladislav Bogdanov napsal(a):
>> 19.03.2013 20:16, David Teigland wrote:
>>> On Tue, Mar 19, 2013 at 07:52:14PM +0300, Vladislav Bogdanov wrote:
>>>> And, do you have any estimate of how long it may take to have your
>>>> ideas ready for production use?
>>>
>>> It'll be quite a while (and the new locking scheme I'm working on
>>> will not
>>> include remote command execution.)
>>>
>>>> Also, as you're not satisfied with this implementation, what
>>>> alternative
>>>> way do you see? (calling ssh from libvirt or LVM API is not a good idea
>>>> at all I think)
>>>
>>> Apart from using ovirt/rhev, I'd try one of the following behind the
>>> libvirt locking api: sanlock, dlm, file locks on nfs, file locks on
>>> gfs2.
>>
>> Unfortunately none of these solve the main thing I need: Allow LVM
>> snapshots without breaking live VM migration :(
>>
>> Cluster-wide snapshots (with shared lock) would solve this, but I do not
>> expect to see this implemented soon.
>>
> 
> Before I go any deeper into reviewing the patches myself, I'd like to
> get a clear picture of this 'snapshot' issue.
> 
> (BTW there is already one thing which will surely not pass - the 'node'
> option for the lvm command - this would have to be done differently).
> 
> But back to snapshots -
> 
> What would be the point of having (old, non-thinp) snapshots active at
> the same time on more than one node?

There is no need for this.

I need the source volume itself to be active on two nodes to perform live
VM migration. libvirt/qemu controls which instance has its CPUs turned on,
but the qemu processes on the two nodes need to have the LV open
simultaneously.

I am able to take a snapshot only when the volume is activated exclusively,
and I can open that snapshot (to take a backup) only on the node where the
source volume is held exclusively.

And I ultimately do not want to take the VM down just to lock the LV
exclusively for a snapshot (if it runs on a shared-locked LV), and I do not
want to do offline migration (with an exclusive lock on the LV). To satisfy
both, lock conversion is needed.
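
To make that concrete, here is a rough sketch of what I mean, assuming
clvmd and placeholder names vg0/vm-disk (the middle steps are exactly the
in-place conversion that is missing today):

  # snapshot requires exclusive activation of the origin (clvm)
  lvchange -aey vg0/vm-disk
  lvcreate -s -n vm-disk-snap -L 2G vg0/vm-disk

  # live migration needs shared activation on both nodes; currently the
  # only way to get there is deactivate + reactivate, i.e. stop the VM:
  lvchange -an vg0/vm-disk
  lvchange -ay vg0/vm-disk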

I'm still new to thinp, because it was introduced relatively recently, and
I have not had a chance to look at it more closely (I tried to allocate a
pool once on a clustered VG and the whole test cluster got stuck because of
it).

Does it work on clustered VGs now?
And is it now possible to take/activate/open a thinp snapshot on a node
different from the one where the source volume is open?
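
For reference, what I tried back then was roughly the following (names and
sizes are placeholders, and I may well have used it incorrectly):

  vgchange -cy vg0                           # clustered VG
  lvcreate -L 100G -T vg0/pool               # thin pool
  lvcreate -V 20G -T vg0/pool -n thinvol     # thin volume
  lvcreate -s vg0/thinvol -n thinvol-snap    # thin snapshot, no -L needed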

> 
> That would simply not work - since you would have to ensure that no one
> will write to the snapshot & origin on either of those nodes?
> 
> Is your code doing some transition which needs the device active on both
> nodes, treating them in a read-only way?

Yes, but it is not my code, it is libvirt.
It opens the block device (LV) on both the source and destination nodes (it
runs a qemu process in a paused state on the destination node, and that
process opens the block device).
After that, the memory state is transferred to the destination node, then
the qemu process on the source node is paused (turning off the virtual
CPUs), then the qemu process on the destination node is resumed (turning on
the virtual CPUs), and finally the qemu process on the source node is
killed, thus releasing the LV.
Adding one more migration phase ("confirm confirmation"), and thus
introducing one more migration protocol version, seems like overkill to me.
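
(For completeness, the migration itself is just a standard libvirt live
migration, something like the following; the domain and host names are
placeholders:

  virsh migrate --live --p2p vm1 qemu+ssh://dst-node/system
)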

When the qemu process is paused on a node, the LV is effectively read-only
(well, almost read-only: libvirt still tries to set DAC permissions and the
SELinux label on it, but no data is written).

There is only a short window when both the source and destination processes
are paused (less than a millisecond).

When qemu is running, it writes to the device.

As for my code in libvirt:
I added one more "logical" pool subtype, clvm, which starts with all LVs
deactivated.
I also wrote a locking driver (which works similarly to the sanlock and
virtlockd ones) and which does roughly the sequence sketched below:
* activates the volume exclusively on start
* converts the lock to shared on the source node before migration
* activates the volume in shared mode on the migration target
* deactivates the volume on the source node after migration is finished
* converts the lock from shared to exclusive on the destination node,
  remotely from the source node
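
In terms of LVM commands the driver effectively needs this sequence (names
are placeholders; in stock clvm the two conversion steps cannot be done
without deactivating the LV, which is what this patch set is about):

  lvchange -aey vg0/vm-disk   # VM start: exclusive on the source node
  lvchange -ay  vg0/vm-disk   # before migration: convert to shared (source)
  lvchange -ay  vg0/vm-disk   # migration target: shared activation
  lvchange -an  vg0/vm-disk   # after migration: deactivate on the source
  lvchange -aey vg0/vm-disk   # destination: convert back to exclusive
                              # (triggered remotely, hence the --node option)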

It also has a local locking concept, to prevent the LV from being opened
more than once on the node where it is activated exclusively.

As I wrote above, there is no event like "you can now convert the lock to
exclusive" available on the destination node.

> 
> Since the metadata for a snapshot are only parsed during the first
> activation of the snapshot, there is no way the second node could resync
> if you had written to the snapshot/origin on the first node.
> 
> So could you please describe in more details how it's supposed to work?

It is OK for me to lose the snapshot during migration. I just need to be
able to back up VM data while it runs continuously on one node. If
pacemaker decides to migrate the VM, then the backup simply fails and will
be restarted (together with creation of a new snapshot) from the beginning
after the migration is finished.
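
The backup cycle itself is nothing special, roughly the following (names
and sizes are placeholders):

  lvcreate -s -n vm-disk-bck -L 4G vg0/vm-disk      # on the exclusive node
  dd if=/dev/vg0/vm-disk-bck of=/backup/vm-disk.img bs=1M
  lvremove -f vg0/vm-disk-bck
  # if pacemaker migrates the VM meanwhile, the backup fails and the whole
  # cycle is simply restarted after the migration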

Vladislav



