[dm-devel] [PATCH 0/7] request-based dm-multipath (v2)
- From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
- To: Alasdair Kergon <agk@redhat.com>
- Cc: Christof Schmitt <christof.schmitt@de.ibm.com>, device-mapper development <dm-devel@redhat.com>
- Subject: [dm-devel] [PATCH 0/7] request-based dm-multipath (v2)
- Date: Fri, 24 Apr 2009 17:10:49 +0900
This patch-set is the updated version of request-based dm-multipath.
It doesn't have barrier support yet; barrier support for
request-based dm will be posted as separate patches in the future.
The patch-set has some cosmetic dependencies on the previously posted
dynamic load balancer patches, but no functional dependency on them.
(Actually, the dynamic load balancers rely on this patch-set for
effectiveness.)
Some basic functional/performance testing was done with NEC iStorage
(active-active multipath), and no problems were found.
Heavy stress testing, with many path up/down events and table
switches, was also done to confirm that no panic, stall, or memory
leak occurs.
Please review and apply if there are no problems.
Summary of the patch-set
1/7: dm core: add core functions for request-based dm
2/7: dm core: add integrity feature to request-based dm
3/7: dm core: enable request-based dm
4/7: dm core: don't set QUEUE_ORDERED_DRAIN for request-based dm
5/7: dm core: reject I/O violating new queue limits
6/7: dm core: disable interrupt when taking map_lock
7/7: dm-mpath: convert to request-based
drivers/md/dm-ioctl.c | 13
drivers/md/dm-mpath.c | 193 +++++---
drivers/md/dm-table.c | 127 +++++
drivers/md/dm.c | 978 ++++++++++++++++++++++++++++++++++++++++--
drivers/md/dm.h | 22
include/linux/device-mapper.h | 10
6 files changed, 1242 insertions(+), 101 deletions(-)
- Rebased to 2.6.30-rc3
- Fixed memory leak when bio_integrity_clone() fails (PATCH 2)
* Added missing bio_free() when bio_integrity_clone() fails.
- Added a patch not to set QUEUE_ORDERED_DRAIN for request-based dm
* This version of request-based dm doesn't have barrier support
yet. So set QUEUE_ORDERED_DRAIN only for bio-based dm.
Since the device type is decided at the time of the first table
binding, setting the flag is deferred until then.
- Rebased to 2.6.29-rc8
- Fixed oops on table swapping caused by broken suspend (PATCH 1)
* dm_suspend() could decide that suspend had completed while some
clones were still in flight. So swapping/freeing the table
was possible against the in-use table, and it caused an oops
* The problem was a race condition in the in-flight check:
new I/O could become in-flight after the suspend code found
no I/O in flight and went on
* Fixed it by protecting the in-flight counting and queue
suspension with the queue lock. Also switched to q->in_flight
instead of md->pending, since it makes the meaning of the
in-flight clone count straightforward
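The locking idea behind this fix can be sketched in plain C. This is a userspace toy model, not the actual dm/block-layer code: the names (queue_lock, in_flight, suspended, start_io, end_io, try_suspend) are illustrative, and a pthread mutex stands in for the queue lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy model of the fix: the in-flight counter and the "suspended"
 * flag are protected by one lock, so suspend can never observe
 * in_flight == 0 while a new I/O is concurrently being started. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight;
static bool suspended;

/* Returns false if the queue is already suspended (I/O rejected). */
bool start_io(void)
{
    bool ok;
    pthread_mutex_lock(&queue_lock);
    ok = !suspended;
    if (ok)
        in_flight++;          /* counted under the same lock */
    pthread_mutex_unlock(&queue_lock);
    return ok;
}

void end_io(void)
{
    pthread_mutex_lock(&queue_lock);
    in_flight--;
    pthread_mutex_unlock(&queue_lock);
}

/* Returns true only when no clone can still be in flight. */
bool try_suspend(void)
{
    bool done;
    pthread_mutex_lock(&queue_lock);
    suspended = true;         /* blocks new I/O from this point on */
    done = (in_flight == 0);
    pthread_mutex_unlock(&queue_lock);
    return done;
}
```

Because the flag is raised under the same lock that guards the counter, the "I/O slips in after the check" window from the original race cannot open.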
- Fixed NULL pointer dereference when allocation for clone bio fails
(From Hannes Reinecke) (PATCH 1)
* free_bio_clone() used clone->end_io_data to get md
* But clone->end_io_data can be NULL when free_bio_clone()
is called from error handling path in clone_request_bios()
* Fixed it by passing md directly to free_bio_clone()
* Removed the workaround code that checked bio->bi_private
- Fixed unnecessary -EIO on table swapping by removing
the map-existence check in dm_make_request() (PATCH 1)
* The check was there to reject BIOs until a table is loaded
and the device and the queue are initialized correctly
* But the check is not needed because BIOs are rejected
in __generic_make_request() by device size checking
until the device and the queue are initialized
- Fixed freeing of an in-use md (PATCH 1)
* The upper layer can close the device just after all BIOs are
completed by blk_update_request() or blk_end_request(),
even if the request is still in the process of completion.
This creates a small window where dm_put() can free the md
which is still needed to finish completing the request
* Fixed it by incrementing md->holders while I/O is in flight
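The reference-counting idea can be sketched as a userspace toy model. The names mirror dm (md->holders, dm_get, dm_put) but the structure is illustrative, not the real implementation:

```c
#include <stdlib.h>

/* Toy model of the fix: holders is bumped once per in-flight request,
 * so a racing put from device close cannot free the md while request
 * completion still needs it. */
struct md {
    int holders;
};

struct md *md_alloc(void)
{
    struct md *m = calloc(1, sizeof(*m));
    m->holders = 1;           /* reference held by the open device */
    return m;
}

void dm_get(struct md *m)
{
    m->holders++;             /* taken while a clone is in flight */
}

/* Returns 1 if this put freed the md, 0 otherwise. */
int dm_put(struct md *m)
{
    if (--m->holders == 0) {
        free(m);
        return 1;
    }
    return 0;
}
```

With the in-flight reference held, the put issued by the close path only drops the count; the final free happens when completion releases its own reference.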
- Fixed memory leak (missing unmap) on failure of dispatching
a mapped clone (PATCH 1)
* dm_kill_request() was used in dm_dispatch_request() when
dispatching fails
* But dm_kill_request() is for not-mapped clones, so the target's
rq_end_io() function was not called. This caused a memory leak
* Fixed it by creating dm_complete_request() for mapped clones
and calling it from dm_dispatch_request()
- Changed exported function names to prevent misuse like above
(PATCH 1, PATCH 3, PATCH 5)
* dm_kill_request() => dm_kill_unmapped_request()
* dm_requeue_request() => dm_requeue_unmapped_request()
- Removed the REQ_QUIET flag in dm_kill_unmapped_request() so that
an "I/O error" message is printed for failed I/O (PATCH 1)
- Removed 'inline' from simple static functions, since the compiler
will inline them anyway (PATCH 1)
- Removed if-request-based check in dm_prep_fn(), since the device
is always request-based when dm_prep_fn() is called (PATCH 1)
* The check was for the case where the table is switched from
request-based to bio-based, but there is currently no plan to
support that
- Fixed the problem that switching device type is unexpectedly
available on table swapping (PATCH 2)
* table_load() checks md->map for the current table type
to prevent the device type switching
* But table_load() can run in parallel with dm_swap_table(),
which has a small no-map window.
So dm_init_md_mempool() in table_load() could switch
the mempool type for an in-use device in this window
* Fixed it by placing the pre-allocated mempools in the table
and binding them on resume. The resume is rejected if
the device is in use with a different type
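The bind-on-resume check can be sketched as a toy model in plain C. The enum values echo dm's device types, but the structs and bind_table() are illustrative stand-ins, not the patch's actual code:

```c
/* Toy model: mempools are pre-allocated into the table at load time
 * and only adopted by the device on resume; the bind is rejected if
 * it would change the type of an in-use device. */
enum dm_type { DM_TYPE_NONE, DM_TYPE_BIO_BASED, DM_TYPE_REQUEST_BASED };

struct table {
    enum dm_type type;
    int pools;                /* pre-allocated at table_load() time */
};

struct mapped_device {
    enum dm_type type;
    int in_use;
    int pools;
};

/* Returns 0 on success, -1 if the type would change on a live device. */
int bind_table(struct mapped_device *md, struct table *t)
{
    if (md->in_use && md->type != DM_TYPE_NONE && md->type != t->type)
        return -1;            /* reject the resume */
    md->type = t->type;
    md->pools = t->pools;     /* adopt the table's pre-allocated pools */
    return 0;
}
```

Since the pools travel with the table and are only bound here, a parallel table_load() can no longer switch the mempool type of a device that is in use.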
- Fixed the problem that BIO is submitted to a request-based device
in which some settings may not be done yet (PATCH 2)
* The QUEUE_FLAG_STACKABLE flag was set in the middle of
dm_table_set_restrictions()
* But the flag needs to be set after all queue settings are
done and they become visible to other CPUs, because:
o BIOs can be submitted during dm_table_set_restrictions()
o Once the QUEUE_FLAG_STACKABLE flag is set in the queue,
BIOs are passed to request-based dm and eventually to
__make_request() where the queue settings are used
* Fixed it by re-ordering the flag set and putting a barrier
before the flag set.
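The publish-after-init ordering can be sketched with C11 atomics standing in for the kernel memory barrier. This is a userspace toy model with illustrative names (queue_settings, stackable, finish_queue_init, can_submit), not the patch itself:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the ordering fix: all queue settings are made visible
 * before the "stackable" flag, using release/acquire ordering in place
 * of the explicit barrier used in the patch. */
static int queue_settings;            /* stands in for the real limits */
static atomic_bool stackable;         /* zero-initialized: not yet set */

void finish_queue_init(int settings)
{
    queue_settings = settings;
    /* release: the settings above become visible before the flag */
    atomic_store_explicit(&stackable, true, memory_order_release);
}

/* Submitter side: may use the settings only after seeing the flag. */
bool can_submit(int *settings_out)
{
    if (!atomic_load_explicit(&stackable, memory_order_acquire))
        return false;                 /* queue not ready for BIOs yet */
    *settings_out = queue_settings;   /* guaranteed fully initialized */
    return true;
}
```

Setting the flag last, behind a release barrier, means any CPU that observes the flag also observes every queue setting written before it.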
- Fixed lockdep warnings (From Christof Schmitt) (PATCH 4)
* lockdep warns some self-deadlock possibilities since:
o map_lock is taken without disabling irq
o request-based dm takes map_lock after taking
queue_lock with disabling irq in dm_request_fn()
o so logically a deadlock may happen
* dm_request_fn() is never called in interrupt context, so
such self-deadlocks don't happen actually
* But fixed the unneeded warnings by disabling irq when
taking map_lock
A summary of the design of request-based dm-multipath follows.
(No changes since the previous post.)
Currently, device-mapper (dm) is implemented as a stacking block device
at the bio level. This bio-based implementation has the following issue:
because the hook for I/O mapping sits above the block layer's
__make_request(), contiguous bios can be mapped to different underlying
devices, and then those bios are not merged into a request.
Dynamic load balancing could create this situation, though
it has not been implemented yet.
Therefore, I/O mapping after bio merging is needed for better
dynamic load balancing.
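The merging problem can be illustrated with a toy C model. The round-robin "balancer" and the function names are invented for illustration; the point is only where the mapping decision happens relative to merging:

```c
#include <stdbool.h>

#define NPATHS 2
static int rr;                    /* round-robin counter of a toy balancer */

static int choose_path(void)
{
    return rr++ % NPATHS;
}

/* bio-level mapping: each bio picks a path independently, so two
 * contiguous bios can land on different devices and never merge. */
bool bios_merge_with_bio_level_mapping(void)
{
    int p1 = choose_path();       /* first bio */
    int p2 = choose_path();       /* contiguous second bio */
    return p1 == p2;              /* different devices => no merge */
}

/* request-level mapping: the block layer merges the bios first,
 * then one path is chosen for the whole request. */
bool bios_merge_with_request_level_mapping(void)
{
    (void)choose_path();          /* one decision for the merged request */
    return true;                  /* merging already happened above dm */
}
```

With a round-robin balancer, bio-level mapping immediately splits contiguous bios across paths; mapping below the I/O scheduler keeps them merged.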
The basic idea for resolving the issue is to move the multipathing
layer down below the I/O scheduler. This was proposed by Mike Christie
as block layer (request-based) multipath.
Mike's patch added a new block layer device for multipath and didn't
have a dm interface, so I modified his patch to be usable from dm.
That is request-based dm-multipath.
While dm and md currently stack block devices at the bio level,
request-based dm stacks at the request level and submits/completes
struct request instead of struct bio.
Overview of the request-based dm patch:
- Mapping is done in a unit of struct request, instead of struct bio
- Hook for I/O mapping is at q->request_fn() after merging and
sorting by I/O scheduler, instead of q->make_request_fn().
- Hook for I/O completion is at bio->bi_end_io() and rq->end_io(),
instead of only bio->bi_end_io()
                 bio-based (current)      request-based (this patch)
    submission   q->make_request_fn()     q->request_fn()
    completion   bio->bi_end_io()         bio->bi_end_io(), rq->end_io()
- Whether the dm device is bio-based or request-based is determined
at table loading time
- Keep user interface same (table/message/status/ioctl)
- Any bio-based device (like dm/md) can be stacked on a request-based
dm device. A request-based dm device *cannot* be stacked on any
bio-based device.
- Better load balancing, since mapping is done after merging and
sorting by the I/O scheduler
Why does request-based dm use bio->bi_end_io(), too?
- dm needs to keep not only the request but also bios of the request,
if dm target drivers want to retry or do something on the request.
For example, dm-multipath has to check errors and retry with other
paths if necessary before returning the I/O result to the upper layer.
- But rq->end_io() is called at a very late stage of completion
handling, where all bios in the request have been completed and
the I/O results are already visible to the upper layer.
So request-based dm hooks bio->bi_end_io(), doesn't complete the bio
in error cases, and hands the error handling over to the rq->end_io()
hook.
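The two-stage completion can be sketched as a toy C model. The struct, the retry policy, and the function names are illustrative, not dm-multipath's actual code; they only show how bio completion is deferred to the request-level hook on error:

```c
#include <stdbool.h>

/* Toy model: the bio end_io hook records an error but does not
 * complete the bio toward the upper layer; the request-level hook
 * later decides whether to retry on another path or surface it. */
struct clone_bio {
    int error;
    bool completed;               /* completed toward the upper layer? */
};

/* bio->bi_end_io hook: on error, only record it. */
void clone_bio_end_io(struct clone_bio *b, int error)
{
    b->error = error;
    if (!error)
        b->completed = true;      /* success: complete immediately */
    /* on error, completion is deferred to the rq-level hook */
}

/* rq->end_io hook: returns true if the I/O should be retried on
 * another path, false if the result is final. */
bool clone_rq_end_io(struct clone_bio *b, bool have_other_path)
{
    if (b->error && have_other_path)
        return true;              /* requeue elsewhere; bio still open */
    b->completed = true;          /* surface success or the final error */
    return false;
}
```

Because the bio stays open across the retry decision, a failed path never leaks a premature error to the upper layer.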