[dm-devel] dm-lc code/design [was: Re: dm-lc.c: fix for a potential NULL pointer dereference]

Akira Hayakawa ruby.wktk at gmail.com
Thu Aug 1 12:45:50 UTC 2013


Thanks for your comment, Mike.

> DM device creation and deletion are done in terms of load (.ctr) and
> remove (.dtr).

In dm-lc, there are two actors.

One is the lc_device, which is a true DM device.
It is first created as a linear-like device,
backed only by a backing store,
and is later attached to an lc_cache so that caching can start.
I/Os from upper layers such as filesystems are submitted to the lc_device.

The other is the lc_cache, which is NOT a DM device;
it is just the context of a cache device.
The resume_cache routine, invoked by the lc-resume-cache command
(written in Python), reads the metadata regions on the cache device medium
and builds an in-memory structure. That structure is the lc_cache.
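As a rough sketch of that flow in shell terms: the message name resume_cache appears later in this thread, but the arguments and IDs below are assumptions for illustration only, the attach step itself is omitted, and the commands are guarded so they do nothing unless dm-lc is actually loaded.

```shell
CACHE_ID=""
if command -v dmsetup >/dev/null 2>&1 && [ -d /sys/module/dm_lc ]; then
    # Build the in-memory lc_cache (here ID 3) from the cache device's
    # metadata. Only the message name resume_cache is from this thread;
    # the arguments are hypothetical.
    dmsetup message lc-mgr 0 resume_cache 3
    # An lc_device (here ID 5) attached to it reports the association
    # through its sysfs entry (the attach step is not shown here):
    CACHE_ID=$(cat /sys/module/dm_lc/devices/5/cache_id)
fi
echo "attached cache: ${CACHE_ID:-none (dm-lc not loaded)}"
```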

I am sorry for the confusion.
I will make a slide explaining
how these structures are built and how they relate.
I have already made a slide explaining
how writes to an lc_device are processed, but
I don't think that is enough for people
who also want to know how dm-lc is initialized.

> And all these sysfs files are excessive too.  You'll notice that DM
> devices don't expose discrete sysfs files on a per device basis.  All
> per-device info is exposed via .status

dm-lc assigns ID numbers to both lc_devices and lc_caches
and then manages sysfs entries under /sys/module/dm_lc like this:

root@Hercules:/sys/module/dm_lc# tree devices/ caches/
devices/
└── 5
    ├── cache_id
    ├── dev
    ├── device -> ../../../../devices/virtual/block/dm-0
    ├── migrate_threshold
    └── nr_dirty_caches
caches/
└── 3
    ├── allow_migrate
    ├── barrier_deadline_ms
    ├── commit_super_block
    ├── commit_super_block_interval
    ├── device -> ../../../../devices/virtual/block/dm-1
    ├── flush_current_buffer
    ├── flush_current_buffer_interval
    ├── force_migrate
    ├── last_flushed_segment_id
    ├── last_migrated_segment_id
    ├── nr_max_batched_migration
    └── update_interval

In the case above, the lc_device with ID 5 and the lc_cache with ID 3
have been built in memory, and the lc_device uses the lc_cache:

root@Hercules:/sys/module/dm_lc# cat devices/5/cache_id
3

I know that device-mapper cannot establish a sysfs namespace
for a DM device, and that is why I devised this workaround.
Placing all the sysfs files in one subtree looks manageable to me.

.status is used as well.
The commands below show the status of the cache device,
including static information such as memory consumption
and the cache statistics.
In my architecture,
sysfs is used for variables that control the module's behavior, and
.status is used for everything else.
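As a concrete illustration of the control side of that split, a tuning session is just a series of sysfs writes. The attribute names come from the tree listing above; the values are only examples, and the writes are guarded so they are no-ops when the module is not loaded. The .status side is shown by the dmsetup commands just below.

```shell
SYSFS=/sys/module/dm_lc
if [ -d "$SYSFS" ]; then
    # Control knobs are plain sysfs writes (attribute names from the
    # tree above; the values here are illustrative, not recommendations).
    echo 70 > "$SYSFS/devices/5/migrate_threshold"
    echo 1  > "$SYSFS/caches/3/flush_current_buffer"
fi
```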

root@Hercules:/sys/module/dm_lc# dmsetup message lc-mgr 0 switch_to 3
root@Hercules:/sys/module/dm_lc# dmsetup status lc-mgr 0
lc-mgr: 0 1 lc-mgr
current cache_id_ptr: 3
static RAM(approx.): 37056 (byte)
allow_migrate: 1
nr_segments: 3
last_migrated_segment_id: 403
last_flushed_segment_id: 403
current segment id: 404
cursor: 255

write? hit? on_buffer? fullsize?
0 0 0 0 0
1 0 0 0 0
0 1 0 0 0
1 1 0 0 0
0 0 1 0 0
1 0 1 0 0
0 1 1 0 0
1 1 1 0 0
0 0 0 1 83
1 0 0 1 0
0 1 0 1 0
1 1 0 1 0
0 0 1 1 0
1 0 1 1 0
0 1 1 1 0
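To read the table: each row is one combination of the four I/O attributes (write?, hit?, on_buffer?, fullsize?), and the fifth column is the number of I/Os counted in that bucket; in the run above only full-sized read misses saw traffic (row 0 0 0 1, 83 I/Os). A small sketch of how such buckets could be indexed, assuming write is the least-significant bit as the row order suggests; this is my reading, not necessarily the module's actual code:

```shell
# Hypothetical bucket index: write is bit 0, hit is bit 1,
# on_buffer is bit 2, fullsize is bit 3 (matching the row order above).
stat_index() {
    local write=$1 hit=$2 on_buffer=$3 fullsize=$4
    echo $(( write | (hit << 1) | (on_buffer << 2) | (fullsize << 3) ))
}

stat_index 0 0 0 1    # -> 8, the bucket that counted 83 I/Os above
```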

> All said, I'll try to make time for a more formal review in the coming
> weeks.  Please feel free to ask more questions in the mean time.
Thanks,
I will do my best to support your code review.
I will make the slide mentioned above this weekend and upload it to my repo.
I am really looking forward to going through the code review.

Akira

On 8/1/13 12:57 AM, Mike Snitzer wrote:
> On Wed, Jul 31 2013 at  9:04am -0400,
> Akira Hayakawa <ruby.wktk at gmail.com> wrote:
> 
>> Thanks, Kumar.
>> Your patch is applied.
>>
>> resume_cache,
>> a routine to build in-memory data structures
>> by reading metadata on cache device,
>> is so complicated in the code and the logic
>> to thoroughly implement the error checks.
>>
>> I am wondering how I should face this problem.
>> Only caring about lines
>> that allocates large-sized memories
>> and forget about anything else
>> is what I am thinking now.
>> But it is clear that
>> it is not a way kernel module should be.
>>
>> Do you guys have some thoughts on this problem?
> 
> I had a quick look at "resume_cache".  I was thinking you were referring
> to the .resume method of the target.  The .resume method must not
> allocate _any_ memory (DM convention requires all memory allocations to
> be done in terms of preallocated memory or more commonly as part of the
> DM table load, via .ctr)... anyway ignoring that for now.
> 
> I was very surprised to see that you're managing devices in terms of DM
> messages like "resume_cache" (which I assume your dm-lc userspace tools
> initiate).  This seems wrong -- I'm also not seeing proper locking in
> the code either (which you get for free if with DM if you use the
> traditional DM hooks via target_type ops).  But I haven't been able to
> do a proper review.
> 
> DM device creation and deletion are done in terms of load (.ctr) and
> remove (.dtr).
> 
> And all these sysfs files are excessive too.  You'll notice that DM devices
> don't expose discrete sysfs files on a per device basis.  All per-device
> info is exposed via .status
> 
> We _could_ take steps to establish a per-DM-device sysfs namespace; but
> that would need to be something wired up to the DM core.  So I'd prefer
> dm-lc use traditional .status (info and table).
> 
> All said, I'll try to make time for a more formal review in the coming
> weeks.  Please feel free to ask more questions in the mean time.
> 
> Thanks,
> Mike
> 



