[libvirt] [PATCH for-4.2?] block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap

John Snow jsnow at redhat.com
Fri Dec 6 19:02:29 UTC 2019



On 12/6/19 9:36 AM, Eric Blake wrote:
> [adding in Peter Maydell, as there is now potential talk of other
> 4.2-worthy patches]
> 
> On 12/6/19 4:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 05.12.2019 23:16, John Snow wrote:
>>>
>>>
>>> On 12/5/19 3:09 PM, Eric Blake wrote:
>>>> On 12/5/19 1:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Here is double bug:
>>>>>
>>>>> First, return error but not set errp. This may lead to:
>>>>> qmp block-dirty-bitmap-remove may report success when actually failed
>>>>>
>>>>> block-dirty-bitmap-remove used in a transaction will crash, as
>>>>> qmp_transaction will think that it returned success and will cal
>>>>
>>>> call
>>>>
>>>>> block_dirty_bitmap_remove_commit which will crash, as state->bitmap is
>>>>> NULL
>>>>>
>>>>> Second (like in anecdote), this case is not an error at all. As it is
>>>>> documented in the comment above bdrv_co_remove_persistent_dirty_bitmap
>>>>> definition, absence of bitmap is not an error, and similar case
>>>>> handled
>>>>> at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when
>>>>> there is no bitmaps at all..
>>>>
>>>> double .
>>>>
>>>>>
>>>>> But when there are some bitmaps, but not the requested one, it return
>>>>> error with errp unset.
>>>>>
>>>>> Fix that.
>>>>>
>>>>> Fixes: b56a1e31759b750
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov at virtuozzo.com>
>>>>> ---
>>>>>
>>>>> Hi all!
>>>>>
>>>>> Ohm, suddenly we faced this bug. It's a regression in 4.2. I'm very
>>>>> sorry for introducing it, and it sad that it's found so late..
>>>>>
>>>>> Personally, I think that this one worth rc5, as it makes new bitmap
>>>>> interfaces unusable. But the decision is yours.
>>>>>
>>>>> Last minute edit: hmm, actually, transaction action introduced in
>>>>> 4.2, so crash is not a regression, only broken
>>>>> block-dirty-bitmap-remove
>>>>> command is a regression... Maybe it's OK for stable.
>>>>
>>>> Libvirt REALLY wants to use transaction bitmap management (and require
>>>> qemu 4.2) for its incremental backup patches that Peter is almost ready
>>>> to merge in.  I'm trying to ascertain:
>>>>
>>>> When exactly can this bug hit?  Can you give a sequence of QMP commands
>>>> that will trigger it for easy reproduction?  Are there workarounds
>>>> (such
>>>> as checking that a bitmap exists prior to attempting to remove it)?  If
>>>> it does NOT get fixed in rc5, how will libvirt be able to probe whether
>>>> the fix is in place?
>>>>
>>>
>>> It looks like:
>>>
>>> - You need to have at least one bitmap
>>> - You need to use transactional remove
>>> - you need to misspell the bitmap name
>>> - The remove will fail (return -EINVAL) but doesn't set errp
>>> - The caller chokes on incomplete information, state->bitmap is NULL
>>
>>
>> No, that would be too simple, the thing is worse. Absolutele correct
>> removing is broken, without any misspelling
>>
>> Bug triggers when we are removing persistent bitmap that is not stored
>> yet in the image AND at least one (another) bitmap already stored in
>> the image. So, something like:
>>
>> 1. create persistent bitmap A
>> 2. shutdown vm  (bitmap A is synced)
>> 3. start vm
>> 4. create persistent bitmap B
>> 5. remove bitmap B - it fails (and crashes if in transaction)
>>
>> ====
>>
>> Hmm, workaround..
>>
>> I'm afraid that the only thing is not remove persistent bitmaps, which
>> were never synced to the image. So, instead the sequence above, we need
>>
>>
>> 1. create persistent bitmap A
>> 2. shutdown vm
>> 3. start vm
>> 4. create persistent bitmap B
>> 5. remember, that we want to remove bitmap B after vm shutdown
>> ...
>>    some other operations
>> ...
>> 6. vm shutdown
>> 7. start vm in stopped mode, and remove all bitmaps marked for removing
>> 8. stop vm
>>
>> But, I think that in real circumstances, vm shutdown is rare thing...
> 
> This is sounding a bit more serious. As I said earlier, it shouldn't
> delay 4.2 on its own, but if the fix is obvious (and other than
> comments, it is a single change from 'ret = -EINVAL' to 'ret = 0' which
> fixes a definite reproducible crash), I think it rises to the level of
> acceptable.
> 
> I've been so worried about the question of which release, that I don't
> know if I've previously offered:
> Reviewed-by: Eric Blake <eblake at redhat.com>
> 

Oh, that is quite a bit more serious than I thought too.

Yeah, I want this in 4.2 if at all possible.

Reviewed-by: John Snow <jsnow at redhat.com>




More information about the libvir-list mailing list