[dm-devel] A bug in dm-persistent-data module which leads to dm-thin metadata corruption

Teng-Feng Yang shinrairis at gmail.com
Fri Mar 7 04:00:07 UTC 2014


Dear all,

I experienced dm-thin metadata corruption a couple of days ago, and
found that someone had reported similar corruption to dm-devel recently:
http://www.redhat.com/archives/dm-devel/2014-February/msg00157.html

Since this issue leads to unrecoverable metadata corruption and can be
reproduced every time, we added some traces in the hope of finding the
root cause. After dumping the traces, I think we may have found a bug in
dm-persistent-data, which I will try my best to explain clearly below.

When decreasing the reference count of a metadata block whose reference
count equals 3, we call dm_btree_remove() to remove the entry from the
B+tree which keeps the reference count info on the metadata device.

The B+tree tries to rebalance the entries of the child nodes in each
node it traverses, and the rebalance process consists of the following
steps:

(1) Find the corresponding children in the current node (shadow_current(s))
(2) Shadow the child blocks (issuing BOP_INC)
(3) Redistribute keys among the children, freeing children if necessary
(issuing BOP_DEC)

Since the update of a metadata block's reference count can be
recursive, these reference count update operations are stashed in
smm->uncommitted and then processed in FILO (stack) order. The problem
is that step (3) can free a child that was just created in step (2), so
the BOP_DEC issued in step (3) is carried out before the BOP_INC issued
in step (2), because the BOPs are processed in FILO order. When the
BOP_DEC from step (3) tries to decrease the reference count of the newly
shadowed block, it fails because the reference count is still 0 before
the decrement. It looks like we could solve this issue by processing
these BOPs in FIFO order instead of FILO.

Any comments would be appreciated.

Thanks.
Dennis



