[dm-devel] [RFC PATCH 00/20] dm-crypt: parallel processing
Mike Snitzer
snitzer at redhat.com
Tue Aug 21 13:32:39 UTC 2012
[cc'ing Chris Mason]
On Tue, Aug 21 2012 at 5:08am -0400,
Milan Broz <mbroz at redhat.com> wrote:
> Hello,
>
> I promised to Mikulas that I will post remaining of his dm-crypt parallel
> processing patchset (plus some related changes) with some comments.
>
> The problem:
>
> The current implementation (using per cpu struct) always use encryption
> on the same CPU which submitted IO.
>
> With very fast storage (usually SSD or MD RAID0) one CPU core can
> be limiting and the throughput of encrypted disk is worse in
> comparison with underlying storage.
>
> Idea here is to distribute encryption to other CPU cores/CPUs.
>
> (Side effect of patches is nice clean up dmcrypt code. :)
>
> Mikulas Patocka (20):
> dm-crypt: remove per-cpu structure
> dm-crypt: use unbound workqueue for request processing
> dm-crypt: remove completion restart
> dm-crypt: use encryption threads
> dm-crypt: Unify spinlock
> dm-crypt: Introduce an option that sets the number of threads.
> dm-crypt: don't use write queue
> dm-crypt: simplify io queue
> dm-crypt: unify io_queue and crypt_queue
> dm-crypt: don't allocate pages for a partial request.
> dm-crypt: avoid deadlock in mempools
> dm-crypt: simplify cc_pending
> dm-crypt merge convert_context and dm_crypt_io
> dm-crypt: move error handling to crypt_convert.
> dm-crypt: remove io_pending field
> dm-crypt: small changes
> dm-crypt: move temporary values to stack
> dm-crypt: offload writes to thread
> dm-crypt: retain write ordering
> dm-crypt: sort writes
>
> drivers/md/dm-crypt.c | 838 +++++++++++++++++++++++++++----------------------
> 1 file changed, 464 insertions(+), 374 deletions(-)
>
>
> My notes:
>
> I extensively tested this (on top of 3.6.0-rc2) and while I like
> simplification and the main logic (if we have enough power why
> not use other CPUs) I see several problems here.
>
> 1) The implementation is not much useful on modern CPUs with hw accelerated
> AES (with AES-NI even one core can saturate very fast storage).
> (Some optimized crypto behaves similar, like twofish optimized modules.)
>
> 2) The patchset targets linear access pattern mainly and one
> process generating IOs. (With more processes/CPUs generating IOs
> you get parallel processing even with current code.)
>
> I can see significant improvement (~30%) for linear read
> (if not using AES-NI) and if underlying storage is zero target
> (ideal situation removing scheduler from the stack).
>
> For random access pattern (and more IO producers) I cannot find
> reliable improvement except ideal zero target case (where improvement
> is always >30%). For more producers it doesn't help at all.
> I tested RAID0 over SATA disks and very fast SSD on quad core cpu,
> dmcrypt running with 1, 3 or 4 threads (and with cfq and noop scheduler)
> using fio threaded test with 1 or 4 threads.
>
> Notes to implementation:
>
> 1) Last two patches (19/20) provides sorting of IO requests, this
> logically should be done in IO scheduler.
>
> I don't think this should be in dmcrypt, if scheduler doesn't work
> properly, it should be fixed or tuned for crypt access pattern.
>
> 2) Could be kernel workqueue used/fixed here instead? Basically all it needs
> is to prefer submitting CPU, if it is busy just move work to another CPU.
>
> 3) It doesn't honour CPU hotplugging. Number of CPU is determined
> in crypt constructor. If hotplugging is used for CPU power saving,
> it will not work properly.
>
> 4) With debugging options on, everything is extremely slower
> (Perhaps bug in some debug option, I just noticed this on Fedora
> rawhide with all debug on.)
>
>
> I posted it mainly because I think this should move forward,
> whatever direction...
I had a conversation with Chris Mason, maybe 2 years ago, where he
expressed concern about dm-crypt growing too much intelligence about
parallel cpu usage. I seem to recall Chris' concern being relative to
btrfs and the requirement that IOs not get reordered (his point was any
dm-crypt change to improve cpu utilization shouldn't result in the
possibility to reorder IOs).
Could be that IO reordering isn't a concern dm-crypt (before or after
this patchset). But I just wanted to make this point so that it is kept
in mind (also allows Chris to raise any other new dm-crypt issue/concern
he might have?).
Mike
More information about the dm-devel
mailing list