[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [lvm-devel] Internal error: Pool read_vg crc mismatch only when running in test environment

Dne 27.6.2013 00:49, Tony Asleson napsal(a):
On 06/26/2013 02:49 AM, Zdenek Kabelac wrote:
Dne 26.6.2013 01:28, Tony Asleson napsal(a):
On 06/21/2013 08:53 AM, Zdenek Kabelac wrote:
Dne 21.6.2013 01:18, Tony Asleson napsal(a):
Writing some new unit test cases for my latest liblvm patch set and at
the moment I am running into a case where I can run the unit test case
against real disk and it works, but if I run it in the test environment
with loop back devices I am getting an abort with:

"Internal error: Pool read_vg crc mismatch."

Any ideas why this error isn't occurring on both?

This happens - if you have requested 'read-only'  VG struct,
and you have modifed something in this vgmem pool
(either writing to struct, or just using vgmem pool for allocation)

My original inquiry was for ideas why I see this on loop back devices
and not actual devices.  This response doesn't seem to match what I am

It's hard to give advice if I do not see the actual code.

Patch set was posted a while back.  Specifically the issue I am running
into has to do with the functionality to list all PVs (including orphan)

Specific patch is:

The stack trace shows that we are getting this error during an
lvm_vg_open.  I can re-create the error regardless if I open the vg
struct as read-only or read-write.

  From my initial debug it appears that if the vginfo->vg_use_count > 1
(in this case 3) we pass 1 as the second parameter to dm_pool_unlock
which is a crc check of the pool which it finds to be different.  At
this point it would seem I am exacerbating some type of caching bug or
that somewhere along the path I am inadvertently changing the contents
of the vg struct pointer with my latest patches.

There is internal debug support for this kind of problems which mprotects
vg structure, so any write access to locked vg structure crashes the
application and you may look at stack trace.

But it needs some hand modification of make.tmpl file (no configure option)
Uncomment #DEFS += -DDEBUG_ENFORCE_POOL_LOCKING  and rebuild and retest
unlimited coredump size.

Thanks for the tip, this helped quite a bit!  I modified the code in the
above referenced patch to retain the vg pointer when retrieving the list
of PVs.  The liblvm library uses the vg->vgmem pool to allocate strings
to be returned to the user when getting information pertaining to a PV.
  The problem is that the vg gets cached with a crc.  We then allocate
memory from the vgmem pool and later when we go to clear out the cache
entry we fail the crc check.  When I enable the
DEBUG_ENFORCE_POOL_LOCKING we fail after we retrieve a vg from the cache
and then I try to allocate something from the vgmem pool.

Your comment about opening the vg "r" vs. "w" still seems incorrect to
me, but if it is indeed correct then we have a problem with the library
that precedes any of my changes.  We use the vg->vgmem for many things
and if this argument is indeed true then all those allocations will be
causing the vg struct to change, thus causing crc errors if the user is
allowed to open the vg struct as read-only.  In my code review and
testing it appears that we aren't hitting this because when we call into
vg_open we don't have a vgid, so we fail early in lvmcache_get_vg and
thus we never add them to the cache and thus we never call dm_pool_lock
on them.

If you could point out the bit of code that actually determines that we
mprotect memory based on the user doing an vg open read only vs. read
write that would be most helpful.  I'm not seeing it at the moment and I
would like to understand this better.

Currently I am thinking about just backing out the change to retrieve
the pv list and use the cmd->mem for allocations instead of the
vg->vgmem.  Previously I brought up just putting such things on the heap
and letting the users of the library free them, but that was dismissed
because existing library users would then have memory leaks.

yeah there could be some design fault which is using mem pool from read-only
volume group. vg->vgmem is only used for allocation of objects directly related to VG - return of some copied buffers is not something which is supposed touch this memory pool as well as creating PV lists
(cmd mempool or just temporarily created new pool needs to be used here.)

If you hold only read lock, there should be no need to modify vg structure.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]