
Re: [dm-devel] [PATCH] dm: verity target



Hi all!

I was just curious whether there is any interest in pulling this change
or, if not, whether there is a particular set of concerns or fixes I
should address.  I realize it's not a small amount of code to digest
(though it is smaller than the post from last year[1]).  Would
re-posting with an added blurb explaining the name be useful?  Or
perhaps a name change?  Is there anything else that would help with
consideration?  Jonathan Corbet was kind enough to wade through the
docs and code to write an article[2], which may help.  Additionally,
Mandeep and I presented[3] at the Security Summit and the Filesystems
track of Plumbers on the topic, which I hope helped show the value of
this patch (everything from layering with EVM to providing tboot users
with a fast, efficient way to verify their system images without
requiring immutable media).

As usual, any and all guidance/feedback/flames will be appreciated - thanks!
will


1 - http://thread.gmane.org/gmane.linux.kernel/989307
2 - http://lwn.net/Articles/459420/
3 - http://selinuxproject.org/~jmorris/lss2011_slides/LSS_11_Integrity_checked_block_devices.pdf
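
For anyone skimming the patch, the tree geometry that dm_bht_create()
derives can be reproduced in userspace.  The following is only a minimal
sketch that mirrors the node_count_shift and depth arithmetic from
dm-bht.c.  fls_u32(), struct bht_geometry, and bht_geometry_compute() are
hypothetical names made up for this illustration; they are not part of
the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's fls(): returns the 1-based index
 * of the highest set bit, or 0 when x == 0.
 */
static unsigned int fls_u32(unsigned int x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Hypothetical container for the values dm_bht_create() stores in the bht. */
struct bht_geometry {
	unsigned int node_count_shift;	/* log2 of hashes per entry */
	unsigned int node_count;	/* hashes per entry, a power of two */
	unsigned int depth;		/* tree levels, excluding the root */
};

/* Returns 0 on success, -1 if the parameters would be rejected. */
static int bht_geometry_compute(struct bht_geometry *g,
				unsigned int block_count,
				unsigned int block_size,
				unsigned int digest_size)
{
	assert(g != NULL);

	if (block_count == 0 || digest_size == 0)
		return -1;
	/* At least two hashes must fit into one block. */
	if (block_size / digest_size < 2)
		return -1;

	/* Round the hashes-per-block count down to a power of two. */
	g->node_count_shift = fls_u32(block_size / digest_size) - 1;
	g->node_count = 1u << g->node_count_shift;

	/* DIV_ROUND_UP(fls(block_count - 1), node_count_shift) */
	g->depth = (fls_u32(block_count - 1) + g->node_count_shift - 1) /
		   g->node_count_shift;

	/* The tree code shifts by depth * node_count_shift. */
	if (g->depth * g->node_count_shift >= sizeof(unsigned int) * 8)
		return -1;

	return 0;
}
```

With 4096-byte blocks and sha256 (32-byte digests), 128 hashes fit in
each entry, so 32768 data blocks work out to a depth of 3 by this formula.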

On Thu, Sep 15, 2011 at 1:45 PM, Mandeep Singh Baines <msb chromium org> wrote:
> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.
>
> dm-verity is meant to be set up as part of a verified boot path.  This
> may be anything ranging from a boot using tboot or trustedgrub to just
> booting from a known-good device (like a USB drive or CD).
>
> dm-verity is part of ChromeOS's verified boot path. It is used to verify
> the integrity of the root filesystem on boot. The root filesystem is
> mounted on a dm-verity partition which transparently verifies each block
> with a bootloader-verified hash passed into the kernel at boot.
>
> Signed-off-by: Will Drewry <wad chromium org>
> Signed-off-by: Elly Jones <ellyjones chromium org>
> Signed-off-by: Mandeep Singh Baines <msb chromium org>
> Cc: Alasdair G Kergon <agk redhat com>
> Cc: Milan Broz <mbroz redhat com>
> Cc: Olof Johansson <olofj chromium org>
> Cc: dm-devel redhat com
> Cc: linux-kernel vger kernel org
> ---
>  Documentation/device-mapper/dm-bht.txt    |   59 ++
>  Documentation/device-mapper/dm-verity.txt |   76 +++
>  drivers/md/Kconfig                        |   30 +
>  drivers/md/Makefile                       |    2 +
>  drivers/md/dm-bht.c                       |  541 +++++++++++++++
>  drivers/md/dm-verity.c                    | 1043 +++++++++++++++++++++++++++++
>  drivers/md/dm-verity.h                    |   45 ++
>  include/linux/dm-bht.h                    |  166 +++++
>  8 files changed, 1962 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/device-mapper/dm-bht.txt
>  create mode 100644 Documentation/device-mapper/dm-verity.txt
>  create mode 100644 drivers/md/dm-bht.c
>  create mode 100644 drivers/md/dm-verity.c
>  create mode 100644 drivers/md/dm-verity.h
>  create mode 100644 include/linux/dm-bht.h
>
> diff --git a/Documentation/device-mapper/dm-bht.txt b/Documentation/device-mapper/dm-bht.txt
> new file mode 100644
> index 0000000..21d929f
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-bht.txt
> @@ -0,0 +1,59 @@
> +dm-bht
> +======
> +
> +dm-bht provides a block hash tree implementation.  The use of dm-bht allows
> +for integrity checking of a given block device without reading the entire
> +set of blocks into memory before use.
> +
> +In particular, dm-bht supplies an interface for creating and verifying a tree
> +of cryptographic digests with any algorithm supported by the kernel crypto API.
> +
> +The `verity' target is the motivating example.
> +
> +
> +Theory of operation
> +===================
> +
> +dm-bht is logically comprised of multiple nodes organized in a tree-like
> +structure.  Each node in the tree is a cryptographic hash.  If it is a leaf
> +node, the hash is of some block data on disk.  If it is an intermediary node,
> +then the hash is of a number of child nodes.
> +
> +dm-bht has a given depth starting at 1 (ignoring the root node).  Each level in
> +the tree is concretely made up of dm_bht_entry structs.  Each entry in the tree
> +is a collection of neighboring nodes that fit in one page-sized block.  The
> +number is determined based on PAGE_SIZE and the size of the selected
> +cryptographic digest algorithm.  The hashes are linearly ordered in this entry
> +and any unaligned trailing space is ignored but included when calculating the
> +parent node.
> +
> +The tree looks something like:
> +
> +alg= sha256, num_blocks = 32767
> +                                 [   root    ]
> +                                /    . . .    \
> +                     [entry_0]                 [entry_1]
> +                    /  . . .  \                 . . .   \
> +         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
> +           / ... \             /   . . .  \             /           \
> +     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
> +
> +The root is treated independently from the depth, and the blocks are expected
> +to be hashed and supplied to dm-bht.  Hash blocks that make up the entry
> +contents are expected to be read from disk.
> +
> +dm-bht does not handle I/O directly but instead expects the consumer to
> +supply callbacks.  The read callback will always receive a page-aligned value
> +to pass to the block device layer to read in a hash value.
> +
> +Usage
> +=====
> +
> +The API provides mechanisms for reading and verifying a tree. When reading, all
> +required data for the hash tree should be populated for a block before
> +attempting a verify.  This can be done by calling dm_bht_populate().  When all
> +data is ready, a call to dm_bht_verify_block() with the expected hash value will
> +perform both the direct block hash check and the hashes of the parent and
> +neighboring nodes where needed to ensure validity up to the root hash.  Note,
> +dm_bht_set_root_hexdigest() should be called before any verification attempts
> +occur.
> diff --git a/Documentation/device-mapper/dm-verity.txt b/Documentation/device-mapper/dm-verity.txt
> new file mode 100644
> index 0000000..f33b984
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-verity.txt
> @@ -0,0 +1,76 @@
> +dm-verity
> +==========
> +
> +Device-Mapper's "verity" target provides transparent integrity checking of
> +block devices using a cryptographic digest provided by the kernel crypto API.
> +This target is read-only.
> +
> +Parameters: payload=<device path> hashtree=<hash device path> alg=<alg> \
> +            salt=<salt> root_hexdigest=<root hash> \
> +            [ hashstart=<hash start> error_behavior=<error behavior> ]
> +
> +<device path>
> +    This is the device that is going to be integrity checked.  It may be
> +    a subset of the full device as specified to dmsetup (start sector and count).
> +    It may be specified as a path, like /dev/sdaX, or a device number,
> +    <major>:<minor>.
> +
> +<hash device path>
> +    This is the device that supplies the dm-bht hash data.  It may be
> +    specified similarly to the device path and may be the same device.  If the
> +    same device is used, the hash offset should be outside of the dm-verity
> +    configured device size.
> +
> +<alg>
> +    The cryptographic hash algorithm used for this device.  This should
> +    be the name of the algorithm, like "sha1".
> +
> +<salt>
> +    Salt value (in hex).
> +
> +<root hash>
> +    The hexadecimal encoding of the cryptographic hash of all of the
> +    neighboring nodes at the first level of the tree.  This hash should be
> +    trusted as there is no other authenticity beyond this point.
> +
> +<hash start>
> +    Start address of hashes (default 0).
> +
> +<error behavior>
> +    0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier.
> +
> +Theory of operation
> +===================
> +
> +dm-verity is meant to be set up as part of a verified boot path.  This
> +may be anything ranging from a boot using tboot or trustedgrub to just
> +booting from a known-good device (like a USB drive or CD).
> +
> +When a dm-verity device is configured, it is expected that the caller
> +has been authenticated in some way (cryptographic signatures, etc).
> +After instantiation, all hashes will be verified on-demand during
> +disk access.  If they cannot be verified up to the root node of the
> +tree, the root hash, then the I/O will fail.  This should identify
> +tampering with any data on the device and the hash data.
> +
> +Cryptographic hashes are used to assert the integrity of the device on a
> +per-block basis.  This allows for a lightweight hash computation on first read
> +into the page cache.  Block hashes are stored linearly, aligned to the nearest
> +page-sized block.
> +
> +For more information on the hashing process, see dm-bht.txt.
> +
> +
> +Example
> +=======
> +
> +Set up a device:
> +[[
> +  dmsetup create vroot --table \
> +    "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\
> +    "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727"
> +]]
> +
> +A command line tool is available to compute the hash tree and return the
> +root hash value.
> +  http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index f75a66e..cb5f425 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -334,4 +334,34 @@ config DM_FLAKEY
>        ---help---
>          A target that intermittently fails I/O for debugging purposes.
>
> +config DM_BHT
> +        tristate "Block hash tree support"
> +        select CRYPTO
> +        select CRYPTO_HASH
> +        ---help---
> +          Include support for device-mapper devices to use a block hash
> +          tree for managing data integrity checks in a scalable way.
> +
> +          Targets that use this functionality should include it
> +          automatically.
> +
> +          If unsure, say N.
> +
> +config DM_VERITY
> +        tristate "Verity target support"
> +        depends on BLK_DEV_DM
> +        select DM_BHT
> +        select CRYPTO
> +        select CRYPTO_HASH
> +        ---help---
> +          This device-mapper target allows you to create a device that
> +          transparently integrity checks the data on it. You'll need to
> +          activate the digests you're going to use in the cryptoapi
> +          configuration.
> +
> +          To compile this code as a module, choose M here: the module will
> +          be called dm-verity.
> +
> +          If unsure, say N.
> +
>  endif # MD
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index 448838b..58eb088 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -36,6 +36,8 @@ obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o
>  obj-$(CONFIG_DM_SNAPSHOT)      += dm-snapshot.o
>  obj-$(CONFIG_DM_MIRROR)                += dm-mirror.o dm-log.o dm-region-hash.o
>  obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
> +obj-$(CONFIG_DM_BHT)            += dm-bht.o
> +obj-$(CONFIG_DM_VERITY)         += dm-verity.o
>  obj-$(CONFIG_DM_ZERO)          += dm-zero.o
>  obj-$(CONFIG_DM_RAID)  += dm-raid.o
>
> diff --git a/drivers/md/dm-bht.c b/drivers/md/dm-bht.c
> new file mode 100644
> index 0000000..32b8ccf
> --- /dev/null
> +++ b/drivers/md/dm-bht.c
> @@ -0,0 +1,541 @@
> + /*
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev chromium org>
> + *
> + * Device-Mapper block hash tree interface.
> + * See Documentation/device-mapper/dm-bht.txt for details.
> + *
> + * This file is released under the GPLv2.
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/cpumask.h>
> +#include <linux/device-mapper.h>
> +#include <linux/dm-bht.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/kernel.h>
> +#include <linux/mm_types.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +
> +#define DM_MSG_PREFIX "dm bht"
> +
> +
> +/*
> + * Utilities
> + */
> +
> +static u8 from_hex(u8 ch)
> +{
> +       if ((ch >= '0') && (ch <= '9'))
> +               return ch - '0';
> +       if ((ch >= 'a') && (ch <= 'f'))
> +               return ch - 'a' + 10;
> +       if ((ch >= 'A') && (ch <= 'F'))
> +               return ch - 'A' + 10;
> +       return -1;
> +}
> +
> +/**
> + * dm_bht_bin_to_hex - converts a binary stream to human-readable hex
> + * @binary:    a byte array of length @binary_len
> + * @hex:       a byte array of length @binary_len * 2 + 1
> + */
> +static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len)
> +{
> +       while (binary_len-- > 0) {
> +               sprintf((char *)hex, "%02hhx", (int)*binary);
> +               hex += 2;
> +               binary++;
> +       }
> +}
> +
> +/**
> + * dm_bht_hex_to_bin - converts a hex stream to binary
> + * @binary:    a byte array of length @binary_len
> + * @hex:       a byte array of length @binary_len * 2 + 1
> + */
> +static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex,
> +                             unsigned int binary_len)
> +{
> +       while (binary_len-- > 0) {
> +               *binary = from_hex(*(hex++));
> +               *binary *= 16;
> +               *binary += from_hex(*(hex++));
> +               binary++;
> +       }
> +}
> +
> +static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed)
> +{
> +       u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
> +       u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1];
> +
> +       dm_bht_bin_to_hex(given, given_hex, bht->digest_size);
> +       dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size);
> +       DMERR_LIMIT("%s != %s", given_hex, computed_hex);
> +}
> +
> +/**
> + * dm_bht_compute_hash: hashes a page of data
> + */
> +static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg,
> +                              unsigned int offset, u8 *digest)
> +{
> +       struct hash_desc *hash_desc = &bht->hash_desc[smp_processor_id()];
> +       struct scatterlist sg;
> +
> +       sg_init_table(&sg, 1);
> +       sg_set_page(&sg, pg, bht->block_size, offset);
> +       /* Note, this is synchronous. */
> +       if (crypto_hash_init(hash_desc)) {
> +               DMCRIT("failed to reinitialize crypto hash (proc:%d)",
> +                       smp_processor_id());
> +               return -EINVAL;
> +       }
> +       if (crypto_hash_update(hash_desc, &sg, bht->block_size)) {
> +               DMCRIT("crypto_hash_update failed");
> +               return -EINVAL;
> +       }
> +       sg_set_buf(&sg, bht->salt, sizeof(bht->salt));
> +       if (crypto_hash_update(hash_desc, &sg, sizeof(bht->salt))) {
> +               DMCRIT("crypto_hash_update failed");
> +               return -EINVAL;
> +       }
> +       if (crypto_hash_final(hash_desc, digest)) {
> +               DMCRIT("crypto_hash_final failed");
> +               return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +/*
> + * Implementation functions
> + */
> +
> +static int dm_bht_initialize_entries(struct dm_bht *bht)
> +{
> +       /* last represents the index of the last digest stored in the tree.
> +        * By walking the tree with that index, it is possible to compute the
> +        * total number of entries at each level.
> +        *
> +        * Since each entry will contain up to |node_count| nodes of the tree,
> +        * it is possible that the last index may not be at the end of a given
> +        * entry->nodes.  In that case, it is assumed the value is padded.
> +        *
> +        * Note, we treat both the tree root (1 hash) and the tree leaves
> +        * independently from the bht data structures.  Logically, the root is
> +        * depth=-1 and the block layer level is depth=bht->depth
> +        */
> +       unsigned int last = bht->block_count;
> +       int depth;
> +
> +       /* check that the largest level->count can't result in an int overflow
> +        * on allocation or sector calculation.
> +        */
> +       if (((last >> bht->node_count_shift) + 1) >
> +           UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry),
> +                          (unsigned int)to_sector(bht->block_size))) {
> +               DMCRIT("required entries %u is too large", last + 1);
> +               return -EINVAL;
> +       }
> +
> +       /* Track the current sector location for each level so we don't have to
> +        * compute it during traversals.
> +        */
> +       bht->sectors = 0;
> +       for (depth = 0; depth < bht->depth; ++depth) {
> +               struct dm_bht_level *level = &bht->levels[depth];
> +
> +               level->count = dm_bht_index_at_level(bht, depth, last) + 1;
> +               level->entries = (struct dm_bht_entry *)
> +                                kcalloc(level->count,
> +                                        sizeof(struct dm_bht_entry),
> +                                        GFP_KERNEL);
> +               if (!level->entries) {
> +                       DMERR("failed to allocate entries for depth %d", depth);
> +                       return -ENOMEM;
> +               }
> +               level->sector = bht->sectors;
> +               bht->sectors += level->count * to_sector(bht->block_size);
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * dm_bht_create - prepares @bht for use
> + * @bht:       pointer to the dm_bht to initialize
> + * @block_count:the number of block hashes / tree leaves
> + * @block_size:        size in bytes of each block; must evenly divide PAGE_SIZE
> + * @alg_name:  crypto hash algorithm name
> + *
> + * Returns 0 on success.
> + *
> + * Callers can offset into devices by storing the data in the io callbacks.
> + */
> +int dm_bht_create(struct dm_bht *bht, unsigned int block_count,
> +                 unsigned int block_size, const char *alg_name)
> +{
> +       int cpu, status;
> +
> +       bht->block_size = block_size;
> +       /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */
> +       if ((block_size > PAGE_SIZE) ||
> +           (PAGE_SIZE % block_size) ||
> +           (to_sector(block_size) == 0))
> +               return -EINVAL;
> +
> +       /* Setup the hash first. Its length determines much of the bht layout */
> +       for (cpu = 0; cpu < nr_cpu_ids; ++cpu) {
> +               bht->hash_desc[cpu].tfm = crypto_alloc_hash(alg_name, 0, 0);
> +               if (IS_ERR(bht->hash_desc[cpu].tfm)) {
> +                       DMERR("failed to allocate crypto hash '%s'", alg_name);
> +                       status = -ENOMEM;
> +                       bht->hash_desc[cpu].tfm = NULL;
> +                       goto bad_arg;
> +               }
> +       }
> +       bht->digest_size = crypto_hash_digestsize(bht->hash_desc[0].tfm);
> +       /* We expect to be able to pack >=2 hashes into a block */
> +       if (block_size / bht->digest_size < 2) {
> +               DMERR("too few hashes fit in a block");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) {
> +               DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       /* Configure the tree */
> +       bht->block_count = block_count;
> +       if (block_count == 0) {
> +               DMERR("block_count must be non-zero");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       /* Each dm_bht_entry->nodes is one block.  node_count tracks
> +        * how many nodes fit into one entry where a node is a single
> +        * hash (message digest).
> +        */
> +       bht->node_count_shift = fls(block_size / bht->digest_size) - 1;
> +       /* Round down to the nearest power of two.  This makes indexing
> +        * into the tree much less painful.
> +        */
> +       bht->node_count = 1 << bht->node_count_shift;
> +
> +       /* This is unlikely to happen, but with 64k pages, who knows. */
> +       if (bht->node_count > UINT_MAX / bht->digest_size) {
> +               DMERR("node_count * hash_len exceeds UINT_MAX!");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift);
> +
> +       /* Ensure that we can safely shift by this value. */
> +       if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) {
> +               DMERR("specified depth and node_count_shift is too large");
> +               status = -EINVAL;
> +               goto bad_arg;
> +       }
> +
> +       /* Allocate levels. Each level of the tree may have an arbitrary number
> +        * of dm_bht_entry structs.  Each entry contains node_count nodes.
> +        * Each node in the tree is a cryptographic digest of either node_count
> +        * nodes on the subsequent level or of a specific block on disk.
> +        */
> +       bht->levels = (struct dm_bht_level *)
> +                       kcalloc(bht->depth,
> +                               sizeof(struct dm_bht_level), GFP_KERNEL);
> +       if (!bht->levels) {
> +               DMERR("failed to allocate tree levels");
> +               status = -ENOMEM;
> +               goto bad_level_alloc;
> +       }
> +
> +       bht->read_cb = NULL;
> +
> +       status = dm_bht_initialize_entries(bht);
> +       if (status)
> +               goto bad_entries_alloc;
> +
> +       /* We compute depth such that there is only 1 block at level 0. */
> +       BUG_ON(bht->levels[0].count != 1);
> +
> +       return 0;
> +
> +bad_entries_alloc:
> +       while (bht->depth-- > 0)
> +               kfree(bht->levels[bht->depth].entries);
> +       kfree(bht->levels);
> +bad_level_alloc:
> +bad_arg:
> +       for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
> +               if (bht->hash_desc[cpu].tfm)
> +                       crypto_free_hash(bht->hash_desc[cpu].tfm);
> +       return status;
> +}
> +EXPORT_SYMBOL(dm_bht_create);
> +
> +/**
> + * dm_bht_read_completed
> + * @entry:     pointer to the entry that's been loaded
> + * @status:    I/O status. Non-zero is failure.
> + * MUST always be called after a read_cb completes.
> + */
> +void dm_bht_read_completed(struct dm_bht_entry *entry, int status)
> +{
> +       if (status) {
> +               /* TODO(wad) add retry support */
> +               DMCRIT("an I/O error occurred while reading entry");
> +               atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO);
> +               /* entry->nodes will be freed later */
> +               return;
> +       }
> +       BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING);
> +       atomic_set(&entry->state, DM_BHT_ENTRY_READY);
> +}
> +EXPORT_SYMBOL(dm_bht_read_completed);
> +
> +/**
> + * dm_bht_verify_block - checks that all nodes in the path for @block are valid
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @block:     specific block data is expected from
> + * @pg:                page holding the block data
> + * @offset:    offset into the page
> + *
> + * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error.
> + */
> +int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
> +                       struct page *pg, unsigned int offset)
> +{
> +       int state, depth = bht->depth;
> +       u8 digest[DM_BHT_MAX_DIGEST_SIZE];
> +       struct dm_bht_entry *entry;
> +       void *node;
> +
> +       do {
> +               /* Need to check that the hash of the current block is accurate
> +                * in its parent.
> +                */
> +               entry = dm_bht_get_entry(bht, depth - 1, block);
> +               state = atomic_read(&entry->state);
> +               /* This call is only safe if all nodes along the path
> +                * are already populated (i.e. READY) via dm_bht_populate.
> +                */
> +               BUG_ON(state < DM_BHT_ENTRY_READY);
> +               node = dm_bht_get_node(bht, entry, depth, block);
> +
> +               if (dm_bht_compute_hash(bht, pg, offset, digest) ||
> +                   memcmp(digest, node, bht->digest_size))
> +                       goto mismatch;
> +
> +               /* Keep the containing block of hashes to be verified in the
> +                * next pass.
> +                */
> +               pg = virt_to_page(entry->nodes);
> +               offset = offset_in_page(entry->nodes);
> +       } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED);
> +
> +       if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) {
> +               if (dm_bht_compute_hash(bht, pg, offset, digest) ||
> +                   memcmp(digest, bht->root_digest, bht->digest_size))
> +                       goto mismatch;
> +               atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
> +       }
> +
> +       /* Mark path to leaf as verified. */
> +       for (depth++; depth < bht->depth; depth++) {
> +               entry = dm_bht_get_entry(bht, depth, block);
> +               /* At this point, entry can only be in VERIFIED or READY state.
> +                * So it is safe to use atomic_set instead of atomic_cmpxchg.
> +                */
> +               atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED);
> +       }
> +
> +       return 0;
> +
> +mismatch:
> +       DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)",
> +                   depth, block);
> +       dm_bht_log_mismatch(bht, node, digest);
> +       return DM_BHT_ENTRY_ERROR_MISMATCH;
> +}
> +EXPORT_SYMBOL(dm_bht_verify_block);
> +
> +/**
> + * dm_bht_is_populated - check that entries from disk needed to verify a given
> + *                       block are all ready
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @block:     specific block data is expected from
> + *
> + * Callers may wish to call dm_bht_is_populated() when checking an io
> + * for which entries were already pending.
> + */
> +bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block)
> +{
> +       int depth;
> +
> +       for (depth = bht->depth - 1; depth >= 0; depth--) {
> +               struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
> +                                                             block);
> +               if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY)
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +EXPORT_SYMBOL(dm_bht_is_populated);
> +
> +/**
> + * dm_bht_populate - reads entries from disk needed to verify a given block
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @ctx:        context used for all read_cb calls on this request
> + * @block:     specific block data is expected from
> + *
> + * Returns negative value on error. Returns 0 on success.
> + */
> +int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block)
> +{
> +       int depth, state;
> +
> +       BUG_ON(block >= bht->block_count);
> +
> +       for (depth = bht->depth - 1; depth >= 0; --depth) {
> +               unsigned int index = dm_bht_index_at_level(bht, depth, block);
> +               struct dm_bht_level *level = &bht->levels[depth];
> +               struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth,
> +                                                             block);
> +               state = atomic_cmpxchg(&entry->state,
> +                                      DM_BHT_ENTRY_UNALLOCATED,
> +                                      DM_BHT_ENTRY_PENDING);
> +               if (state == DM_BHT_ENTRY_VERIFIED)
> +                       break;
> +               if (state <= DM_BHT_ENTRY_ERROR)
> +                       goto error_state;
> +               if (state != DM_BHT_ENTRY_UNALLOCATED)
> +                       continue;
> +
> +               /* Current entry is claimed for allocation and loading */
> +               entry->nodes = kmalloc(bht->block_size, GFP_NOIO);
> +               if (!entry->nodes)
> +                       goto nomem;
> +
> +               bht->read_cb(ctx,
> +                            level->sector + to_sector(index * bht->block_size),
> +                            entry->nodes, to_sector(bht->block_size), entry);
> +       }
> +
> +       return 0;
> +
> +error_state:
> +       DMCRIT("block %u at depth %d is in an error state", block, depth);
> +       return -EPERM;
> +
> +nomem:
> +       DMCRIT("failed to allocate memory for entry->nodes");
> +       return -ENOMEM;
> +}
> +EXPORT_SYMBOL(dm_bht_populate);
> +
> +/**
> + * dm_bht_destroy - cleans up all memory used by @bht
> + * @bht:       pointer to a dm_bht_create()d bht
> + */
> +void dm_bht_destroy(struct dm_bht *bht)
> +{
> +       int depth, cpu;
> +
> +       for (depth = 0; depth < bht->depth; depth++) {
> +               struct dm_bht_entry *entry = bht->levels[depth].entries;
> +               struct dm_bht_entry *entry_end = entry +
> +                                                bht->levels[depth].count;
> +               for (; entry < entry_end; ++entry)
> +                       kfree(entry->nodes);
> +               kfree(bht->levels[depth].entries);
> +       }
> +       kfree(bht->levels);
> +       for (cpu = 0; cpu < nr_cpu_ids; ++cpu)
> +               if (bht->hash_desc[cpu].tfm)
> +                       crypto_free_hash(bht->hash_desc[cpu].tfm);
> +}
> +EXPORT_SYMBOL(dm_bht_destroy);
> +
> +/*
> + * Accessors
> + */
> +
> +/**
> + * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @hexdigest: array of u8s containing the new digest in hex
> + * Returns non-zero on error.  hexdigest should be NUL terminated.
> + */
> +int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest)
> +{
> +       /* Make sure we have at least the bytes expected */
> +       if (strnlen((char *)hexdigest, bht->digest_size * 2) !=
> +           bht->digest_size * 2) {
> +               DMERR("root digest length does not match hash algorithm");
> +               return -1;
> +       }
> +       dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size);
> +       return 0;
> +}
> +EXPORT_SYMBOL(dm_bht_set_root_hexdigest);
> +
> +/**
> + * dm_bht_root_hexdigest - returns root digest in hex
> + * @bht:       pointer to a dm_bht_create()d bht
> + * @hexdigest: u8 array of size @available
> + * @available: must be bht->digest_size * 2 + 1
> + */
> +int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available)
> +{
> +       if (available < 0 ||
> +           ((unsigned int) available) < bht->digest_size * 2 + 1) {
> +               DMERR("hexdigest has too few bytes available");
> +               return -EINVAL;
> +       }
> +       dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size);
> +       return 0;
> +}
> +EXPORT_SYMBOL(dm_bht_root_hexdigest);
> +
> +/**
> + * dm_bht_set_salt - sets the salt used, in hex
> + * @bht:      pointer to a dm_bht_create()d bht
> + * @hexsalt:  salt string, as hex; will be zero-padded or truncated to
> + *            DM_BHT_SALT_SIZE * 2 hex digits.
> + */
> +void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt)
> +{
> +       size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt));
> +
> +       memset(bht->salt, 0, sizeof(bht->salt));
> +       dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen);
> +}
> +EXPORT_SYMBOL(dm_bht_set_salt);
> +
> +/**
> + * dm_bht_salt - returns the salt used, in hex
> + * @bht:      pointer to a dm_bht_create()d bht
> + * @hexsalt:  buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1.
> + */
> +int dm_bht_salt(struct dm_bht *bht, char *hexsalt)
> +{
> +       dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt));
> +       return 0;
> +}
> +EXPORT_SYMBOL(dm_bht_salt);
> +
> diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
> new file mode 100644
> index 0000000..a9bd0e8
> --- /dev/null
> +++ b/drivers/md/dm-verity.c
> @@ -0,0 +1,1043 @@
> +/*
> + * Originally based on dm-crypt.c,
> + * Copyright (C) 2003 Christophe Saout <christophe saout de>
> + * Copyright (C) 2004 Clemens Fruhwirth <clemens endorphin org>
> + * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved.
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev chromium org>
> + *                    All Rights Reserved.
> + *
> + * This file is released under the GPLv2.
> + *
> + * Implements a verifying transparent block device.
> + * See Documentation/device-mapper/dm-verity.txt
> + */
> +#include <linux/async.h>
> +#include <linux/atomic.h>
> +#include <linux/bio.h>
> +#include <linux/blkdev.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/genhd.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mempool.h>
> +#include <linux/mm_types.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +#include <linux/device-mapper.h>
> +#include <linux/dm-bht.h>
> +
> +#include "dm-verity.h"
> +
> +#define DM_MSG_PREFIX "verity"
> +
> +/* Supports up to 512-bit digests */
> +#define VERITY_MAX_DIGEST_SIZE 64
> +
> +/* TODO(wad) make both of these report the error line/file to a
> + *           verity_bug function.
> + */
> +#define VERITY_BUG(msg...) BUG()
> +#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond)
> +
> +/* Helper for printing sector_t */
> +#define ULL(x) ((unsigned long long)(x))
> +
> +#define MIN_IOS 32
> +#define MIN_BIOS (MIN_IOS * 2)
> +#define VERITY_DEFAULT_BLOCK_SIZE 4096
> +
> +/* Provide a lightweight means of specifying the global default for
> + * error behavior: eio, panic, none, or notify.
> + * Legacy support for 0 = eio, 1 = panic, 2 = none, 3 = notify.
> + * This is matched to the enum in dm-verity.h.
> + */
> +static const char * const allowed_error_behaviors[] = { "eio", "panic", "none",
> +                                                       "notify", NULL };
> +static char *error_behavior = "eio";
> +module_param(error_behavior, charp, 0644);
> +MODULE_PARM_DESC(error_behavior, "Behavior on error "
> +                                "(eio, panic, none, notify)");
> +
> +/* Controls whether verity_get_device will wait forever for a device. */
> +static bool dev_wait;
> +module_param(dev_wait, bool, 0444);
> +MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
> +
> +/* per-requested-bio private data */
> +enum verity_io_flags {
> +       VERITY_IOFLAGS_CLONED = 0x1,    /* original bio has been cloned */
> +};
> +
> +struct dm_verity_io {
> +       struct dm_target *target;
> +       struct bio *bio;
> +       struct delayed_work work;
> +       unsigned int flags;
> +
> +       int error;
> +       atomic_t pending;
> +
> +       u64 block;  /* aligned block index */
> +       u64 count;  /* aligned count in blocks */
> +};
> +
> +struct verity_config {
> +       struct dm_dev *dev;
> +       sector_t start;
> +       sector_t size;
> +
> +       struct dm_dev *hash_dev;
> +       sector_t hash_start;
> +
> +       struct dm_bht bht;
> +
> +       /* Pool required for io contexts */
> +       mempool_t *io_pool;
> +       /* Pool and bios required for making sure that backing device reads are
> +        * in PAGE_SIZE increments.
> +        */
> +       struct bio_set *bs;
> +
> +       char hash_alg[CRYPTO_MAX_ALG_NAME];
> +
> +       int error_behavior;
> +};
> +
> +static struct kmem_cache *_verity_io_pool;
> +static struct workqueue_struct *kveritydq, *kverityd_ioq;
> +
> +static void kverityd_verify(struct work_struct *work);
> +static void kverityd_io(struct work_struct *work);
> +static void kverityd_io_bht_populate(struct dm_verity_io *io);
> +static void kverityd_io_bht_populate_end(struct bio *, int error);
> +
> +static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
> +
> +/*
> + * Exported interfaces
> + */
> +
> +int dm_verity_register_error_notifier(struct notifier_block *nb)
> +{
> +       return blocking_notifier_chain_register(&verity_error_notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
> +
> +int dm_verity_unregister_error_notifier(struct notifier_block *nb)
> +{
> +       return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
> +}
> +EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
> +
> +/*
> + * Allocation and utility functions
> + */
> +
> +static void kverityd_src_io_read_end(struct bio *clone, int error);
> +
> +/* Shared destructor for all internal bios */
> +static void dm_verity_bio_destructor(struct bio *bio)
> +{
> +       struct dm_verity_io *io = bio->bi_private;
> +       struct verity_config *vc = io->target->private;
> +       bio_free(bio, vc->bs);
> +}
> +
> +static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask,
> +                                      int nr_iovecs)
> +{
> +       return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs);
> +}
> +
> +static struct dm_verity_io *verity_io_alloc(struct dm_target *ti,
> +                                           struct bio *bio)
> +{
> +       struct verity_config *vc = ti->private;
> +       sector_t sector = bio->bi_sector - ti->begin;
> +       struct dm_verity_io *io;
> +
> +       io = mempool_alloc(vc->io_pool, GFP_NOIO);
> +       if (unlikely(!io))
> +               return NULL;
> +       io->flags = 0;
> +       io->target = ti;
> +       io->bio = bio;
> +       io->error = 0;
> +
> +       /* Adjust the sector by the virtual starting sector */
> +       io->block = to_bytes(sector) / vc->bht.block_size;
> +       io->count = bio->bi_size / vc->bht.block_size;
> +
> +       atomic_set(&io->pending, 0);
> +
> +       return io;
> +}
> +
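
To make the sector-to-block arithmetic in verity_io_alloc() concrete, here is a small userspace sketch. It assumes 512-byte sectors (as to_bytes() does) and uses hypothetical helper names; it is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, matching to_bytes() */

/* Block index containing the given target-relative sector. */
static uint64_t sector_to_block(uint64_t sector, unsigned int block_size)
{
	return (sector << SECTOR_SHIFT) / block_size;
}

/* Number of whole blocks covered by a request of bio_bytes bytes. */
static uint64_t bytes_to_blocks(unsigned int bio_bytes, unsigned int block_size)
{
	return bio_bytes / block_size;
}

void block_math_example(void)
{
	/* sector 16 is byte offset 8192, i.e. block 2 of 4096-byte blocks */
	assert(sector_to_block(16, 4096) == 2);
	/* an 8 KiB read covers two blocks */
	assert(bytes_to_blocks(8192, 4096) == 2);
	/* integer division rounds down: a 512-byte read yields zero blocks */
	assert(bytes_to_blocks(512, 4096) == 0);
}
```

Both divisions round down, so the code appears to rely on every request being block-aligned and a multiple of the block size.
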
> +static struct bio *verity_bio_clone(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +       struct bio *bio = io->bio;
> +       struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs);
> +
> +       if (!clone)
> +               return NULL;
> +
> +       __bio_clone(clone, bio);
> +       clone->bi_private = io;
> +       clone->bi_end_io  = kverityd_src_io_read_end;
> +       clone->bi_bdev    = vc->dev->bdev;
> +       clone->bi_sector += vc->start - io->target->begin;
> +       clone->bi_destructor = dm_verity_bio_destructor;
> +
> +       return clone;
> +}
> +
> +/* If the request is not successful, this handler takes action.
> + * TODO make this call a registered handler.
> + */
> +static void verity_error(struct verity_config *vc, struct dm_verity_io *io,
> +                        int error)
> +{
> +       const char *message;
> +       int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
> +       dev_t devt = 0;
> +       u64 block = ~0;
> +       int transient = 1;
> +       struct dm_verity_error_state error_state;
> +
> +       if (vc) {
> +               devt = vc->dev->bdev->bd_dev;
> +               error_mode = vc->error_behavior;
> +       }
> +
> +       if (io) {
> +               io->error = -EIO;
> +               block = io->block;
> +       }
> +
> +       switch (error) {
> +       case -ENOMEM:
> +               message = "out of memory";
> +               break;
> +       case -EBUSY:
> +               message = "pending data seen during verify";
> +               break;
> +       case -EFAULT:
> +               message = "crypto operation failure";
> +               break;
> +       case -EACCES:
> +               message = "integrity failure";
> +               /* Image is bad. */
> +               transient = 0;
> +               break;
> +       case -EPERM:
> +               message = "hash tree population failure";
> +               /* Should be dm-bht specific errors */
> +               transient = 0;
> +               break;
> +       case -EINVAL:
> +               message = "unexpected missing/invalid data";
> +               /* The device was configured incorrectly - fallback. */
> +               transient = 0;
> +               break;
> +       default:
> +               /* Other errors can be passed through as IO errors */
> +               message = "unknown or I/O error";
> +               return;
> +       }
> +
> +       DMERR_LIMIT("verification failure occurred: %s", message);
> +
> +       if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
> +               error_state.code = error;
> +               error_state.transient = transient;
> +               error_state.block = block;
> +               error_state.message = message;
> +               error_state.dev_start = vc->start;
> +               error_state.dev_len = vc->size;
> +               error_state.dev = vc->dev->bdev;
> +               error_state.hash_dev_start = vc->hash_start;
> +               error_state.hash_dev_len = vc->bht.sectors;
> +               error_state.hash_dev = vc->hash_dev->bdev;
> +
> +               /* Set default fallthrough behavior. */
> +               error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
> +               error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC;
> +
> +               if (!blocking_notifier_call_chain(
> +                   &verity_error_notifier, transient, &error_state)) {
> +                       error_mode = error_state.behavior;
> +               }
> +       }
> +
> +       switch (error_mode) {
> +       case DM_VERITY_ERROR_BEHAVIOR_EIO:
> +               break;
> +       case DM_VERITY_ERROR_BEHAVIOR_NONE:
> +               if (error != -EIO && io)
> +                       io->error = 0;
> +               break;
> +       default:
> +               goto do_panic;
> +       }
> +       return;
> +
> +do_panic:
> +       panic("dm-verity failure: "
> +             "device:%u:%u error:%d block:%llu message:%s",
> +             MAJOR(devt), MINOR(devt), error, ULL(block), message);
> +}
> +
> +/**
> + * verity_parse_error_behavior - parse a behavior charp to the enum
> + * @behavior:  NUL-terminated char array
> + *
> + * Checks if the behavior is valid either as text or as an index digit
> + * and returns the proper enum value or -1 on error.
> + */
> +static int verity_parse_error_behavior(const char *behavior)
> +{
> +       const char * const *allowed = allowed_error_behaviors;
> +       char index = '0';
> +
> +       for (; *allowed; allowed++, index++)
> +               if (!strcmp(*allowed, behavior) || behavior[0] == index)
> +                       break;
> +
> +       if (!*allowed)
> +               return -1;
> +
> +       /* Convert to the integer index matching the enum. */
> +       return allowed - allowed_error_behaviors;
> +}
> +
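
For anyone checking what verity_parse_error_behavior() accepts, the same loop lifted into userspace is below (names shortened, otherwise the logic is copied from the quoted function).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char * const allowed[] = { "eio", "panic", "none", "notify",
					NULL };

/* Returns the index of the matching behavior, or -1 if none matches. */
static int parse_error_behavior(const char *behavior)
{
	const char * const *p = allowed;
	char index = '0';

	for (; *p; p++, index++)
		if (!strcmp(*p, behavior) || behavior[0] == index)
			break;

	if (!*p)
		return -1;

	return (int)(p - allowed);
}

void parse_example(void)
{
	assert(parse_error_behavior("panic") == 1);
	assert(parse_error_behavior("3") == 3);
	assert(parse_error_behavior("reboot") == -1);
	/* Only the first character is compared against the index digit. */
	assert(parse_error_behavior("2xyz") == 2);
}
```

Note the last case: because only behavior[0] is compared with the digit, any string starting with '0' through '3' is accepted.
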
> +/*
> + * Reverse flow of requests into the device.
> + *
> + * (Start at the bottom with verity_map and work your way upward).
> + */
> +
> +static void verity_inc_pending(struct dm_verity_io *io);
> +
> +static void verity_return_bio_to_caller(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +
> +       if (io->error)
> +               verity_error(vc, io, io->error);
> +
> +       bio_endio(io->bio, io->error);
> +       mempool_free(io, vc->io_pool);
> +}
> +
> +/* Check for any missing bht hashes. */
> +static bool verity_is_bht_populated(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +       u64 block;
> +
> +       for (block = io->block; block < io->block + io->count; ++block)
> +               if (!dm_bht_is_populated(&vc->bht, block))
> +                       return false;
> +
> +       return true;
> +}
> +
> +/* verity_dec_pending manages the lifetime of all dm_verity_io structs.
> + * Non-bug error handling is centralized through this interface and
> + * all passage from workqueue to workqueue.
> + */
> +static void verity_dec_pending(struct dm_verity_io *io)
> +{
> +       if (!atomic_dec_and_test(&io->pending))
> +               goto done;
> +
> +       if (unlikely(io->error))
> +               goto io_error;
> +
> +       /* I/Os that were pending may now be ready */
> +       if (verity_is_bht_populated(io)) {
> +               INIT_DELAYED_WORK(&io->work, kverityd_verify);
> +               queue_delayed_work(kveritydq, &io->work, 0);
> +       } else {
> +               INIT_DELAYED_WORK(&io->work, kverityd_io);
> +               queue_delayed_work(kverityd_ioq, &io->work, HZ/10);
> +       }
> +
> +done:
> +       return;
> +
> +io_error:
> +       verity_return_bio_to_caller(io);
> +}
> +
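
The pending counter is the core of the flow control here, so a minimal sketch of the pattern may help reviewers. It uses C11 atomics in userspace rather than the kernel's atomic_t, and struct io_ctx, io_get() and io_put() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>

struct io_ctx {
	atomic_int pending;	/* outstanding references */
	int completed;		/* set once the last reference is dropped */
};

/* Analogue of verity_inc_pending(): take one reference. */
static void io_get(struct io_ctx *io)
{
	atomic_fetch_add(&io->pending, 1);
}

/* Analogue of verity_dec_pending(): the last put moves the request on. */
static void io_put(struct io_ctx *io)
{
	/* fetch_sub returns the previous value; 1 means we were the last */
	if (atomic_fetch_sub(&io->pending, 1) == 1)
		io->completed = 1;
}

void pending_example(void)
{
	struct io_ctx io = { .completed = 0 };

	atomic_init(&io.pending, 0);

	io_get(&io);		/* submitter's own reference */
	io_get(&io);		/* async data read */
	io_get(&io);		/* async hash read */

	io_put(&io);		/* data read completes */
	io_put(&io);		/* hash read completes */
	assert(!io.completed);	/* submitter still holds its reference */

	io_put(&io);		/* submitter is done issuing requests */
	assert(io.completed);
}
```

The submitter's own reference keeps the request from advancing while requests are still being issued, which is the role verity_inc_pending()/verity_dec_pending() play around the two calls in kverityd_io().
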
> +/* Walks the data set and computes the hash of the data read from the
> + * untrusted source device.  The computed hash is then passed to dm-bht
> + * for verification.
> + */
> +static int verity_verify(struct verity_config *vc,
> +                        struct dm_verity_io *io)
> +{
> +       unsigned int block_size = vc->bht.block_size;
> +       struct bio *bio = io->bio;
> +       u64 block = io->block;
> +       unsigned int idx;
> +       int r;
> +
> +       for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) {
> +               struct bio_vec *bv = bio_iovec_idx(bio, idx);
> +               unsigned int offset = bv->bv_offset;
> +               unsigned int len = bv->bv_len;
> +
> +               VERITY_BUG_ON(offset % block_size);
> +               VERITY_BUG_ON(len % block_size);
> +
> +               while (len) {
> +                       r = dm_bht_verify_block(&vc->bht, block,
> +                                               bv->bv_page, offset);
> +                       if (r)
> +                               goto bad_return;
> +
> +                       offset += block_size;
> +                       len -= block_size;
> +                       block++;
> +                       cond_resched();
> +               }
> +       }
> +
> +       return 0;
> +
> +bad_return:
> +       /* dm_bht functions aren't expected to return errno friendly
> +        * values.  They are converted here for uniformity.
> +        */
> +       if (r > 0) {
> +               DMERR("Pending data for block %llu seen at verify", ULL(block));
> +               r = -EBUSY;
> +       } else {
> +               DMERR_LIMIT("Block hash does not match!");
> +               r = -EACCES;
> +       }
> +       return r;
> +}
> +
> +/* Services the verify workqueue */
> +static void kverityd_verify(struct work_struct *work)
> +{
> +       struct delayed_work *dwork = container_of(work, struct delayed_work,
> +                                                 work);
> +       struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
> +                                              work);
> +       struct verity_config *vc = io->target->private;
> +
> +       io->error = verity_verify(vc, io);
> +
> +       /* Free up the bio and tag with the return value */
> +       verity_return_bio_to_caller(io);
> +}
> +
> +/* Asynchronously called upon the completion of dm-bht I/O.  The status
> + * of the operation is passed back to dm-bht and the next steps are
> + * decided by verity_dec_pending.
> + */
> +static void kverityd_io_bht_populate_end(struct bio *bio, int error)
> +{
> +       struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private;
> +       struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context;
> +
> +       /* Tell the tree to atomically update now that we've populated
> +        * the given entry.
> +        */
> +       dm_bht_read_completed(entry, error);
> +
> +       /* Clean up for reuse when reading data to be checked */
> +       bio->bi_vcnt = 0;
> +       bio->bi_io_vec->bv_offset = 0;
> +       bio->bi_io_vec->bv_len = 0;
> +       bio->bi_io_vec->bv_page = NULL;
> +       /* Restore the private data to I/O so the destructor can be shared. */
> +       bio->bi_private = (void *) io;
> +       bio_put(bio);
> +
> +       /* We bail but assume the tree has been marked bad. */
> +       if (unlikely(error)) {
> +               DMERR("Failed to read for sector %llu (%u)",
> +                     ULL(io->bio->bi_sector), io->bio->bi_size);
> +               io->error = error;
> +               /* Pass through the error to verity_dec_pending below */
> +       }
> +       /* When pending = 0, it will transition to reading real data */
> +       verity_dec_pending(io);
> +}
> +
> +/* Called by dm-bht (via dm_bht_populate), this function provides
> + * the message digests to dm-bht that are stored on disk.
> + */
> +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst,
> +                                     sector_t count,
> +                                     struct dm_bht_entry *entry)
> +{
> +       struct dm_verity_io *io = ctx;  /* I/O for this batch */
> +       struct verity_config *vc;
> +       struct bio *bio;
> +
> +       vc = io->target->private;
> +
> +       /* The I/O context is nested inside the entry so that we don't need one
> +        * io context per page read.
> +        */
> +       entry->io_context = ctx;
> +
> +       /* We should only get page size requests at present. */
> +       bio = verity_alloc_bioset(vc, GFP_NOIO, 1);
> +       if (unlikely(!bio)) {
> +               DMCRIT("Out of memory at bio_alloc_bioset");
> +               dm_bht_read_completed(entry, -ENOMEM);
> +               return -ENOMEM;
> +       }
> +       /* Take the pending reference only once the bio exists, so the
> +        * failure path above cannot leave an unmatched increment.
> +        */
> +       verity_inc_pending(io);
> +       bio->bi_private = (void *) entry;
> +       bio->bi_idx = 0;
> +       bio->bi_size = vc->bht.block_size;
> +       bio->bi_sector = vc->hash_start + start;
> +       bio->bi_bdev = vc->hash_dev->bdev;
> +       bio->bi_end_io = kverityd_io_bht_populate_end;
> +       bio->bi_rw = REQ_META;
> +       /* Only need to free the bio since the page is managed by bht */
> +       bio->bi_destructor = dm_verity_bio_destructor;
> +       bio->bi_vcnt = 1;
> +       bio->bi_io_vec->bv_offset = offset_in_page(dst);
> +       bio->bi_io_vec->bv_len = to_bytes(count);
> +       /* dst is guaranteed to be a page_pool allocation */
> +       bio->bi_io_vec->bv_page = virt_to_page(dst);
> +       /* Track that this I/O is in use.  There should be no risk of the io
> +        * being removed prior since this is called synchronously.
> +        */
> +       generic_make_request(bio);
> +       return 0;
> +}
> +
> +/* Submits an io request for each missing block of block hashes.
> + * The last one to return will then enqueue this on the io workqueue.
> + */
> +static void kverityd_io_bht_populate(struct dm_verity_io *io)
> +{
> +       struct verity_config *vc = io->target->private;
> +       u64 block;
> +
> +       for (block = io->block; block < io->block + io->count; ++block) {
> +               int ret = dm_bht_populate(&vc->bht, io, block);
> +
> +               if (ret < 0) {
> +                       /* verity_dec_pending will handle the error case. */
> +                       io->error = ret;
> +                       break;
> +               }
> +       }
> +}
> +
> +/* Asynchronously called upon the completion of I/O issued
> + * from kverityd_src_io_read. verity_dec_pending() acts as
> + * the scheduler/flow manager.
> + */
> +static void kverityd_src_io_read_end(struct bio *clone, int error)
> +{
> +       struct dm_verity_io *io = clone->bi_private;
> +
> +       if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
> +               error = -EIO;
> +
> +       if (unlikely(error)) {
> +               DMERR("Error occurred: %d (%llu, %u)",
> +                       error, ULL(clone->bi_sector), clone->bi_size);
> +               io->error = error;
> +       }
> +
> +       /* Release the clone.  It exists only to keep the block layer from
> +        * leaving offsets, etc. of the original bio in unexpected states.
> +        */
> +       bio_put(clone);
> +
> +       verity_dec_pending(io);
> +}
> +
> +/* If not yet underway, an I/O request will be issued to the vc->dev
> + * device for the data needed. It is cloned to avoid unexpected changes
> + * to the original bio struct.
> + */
> +static void kverityd_src_io_read(struct dm_verity_io *io)
> +{
> +       struct bio *clone;
> +
> +       /* Check if the read is already issued. */
> +       if (io->flags & VERITY_IOFLAGS_CLONED)
> +               return;
> +
> +       io->flags |= VERITY_IOFLAGS_CLONED;
> +
> +       /* Clone the bio. The block layer may modify the bvec array. */
> +       clone = verity_bio_clone(io);
> +       if (unlikely(!clone)) {
> +               io->error = -ENOMEM;
> +               return;
> +       }
> +
> +       verity_inc_pending(io);
> +
> +       generic_make_request(clone);
> +}
> +
> +/* kverityd_io services the I/O workqueue. For each pass through
> + * the I/O workqueue, a call to populate both the origin drive
> + * data and the hash tree data is made.
> + */
> +static void kverityd_io(struct work_struct *work)
> +{
> +       struct delayed_work *dwork = container_of(work, struct delayed_work,
> +                                                 work);
> +       struct dm_verity_io *io = container_of(dwork, struct dm_verity_io,
> +                                              work);
> +
> +       /* Issue requests asynchronously. */
> +       verity_inc_pending(io);
> +       kverityd_src_io_read(io);
> +       kverityd_io_bht_populate(io);
> +       verity_dec_pending(io);
> +}
> +
> +/* Paired with verity_dec_pending, the pending value in the io dictates the
> + * lifetime of a request and when it is ready to be processed on the
> + * workqueues.
> + */
> +static void verity_inc_pending(struct dm_verity_io *io)
> +{
> +       atomic_inc(&io->pending);
> +}
> +
> +/* Block-level requests start here. */
> +static int verity_map(struct dm_target *ti, struct bio *bio,
> +                     union map_info *map_context)
> +{
> +       struct dm_verity_io *io;
> +       struct verity_config *vc;
> +
> +       if (unlikely(!ti)) {
> +               DMERR("dm_target was NULL");
> +               return -EIO;
> +       }
> +
> +       vc = ti->private;
> +
> +       if (bio_data_dir(bio) == WRITE) {
> +               /* If we silently drop writes, then the VFS layer will cache
> +                * the write and persist it in memory. While it doesn't change
> +                * the underlying storage, it still may be contrary to the
> +                * behavior expected by a verified, read-only device.
> +                */
> +               DMWARN_LIMIT("write request received. rejecting with -EIO.");
> +               verity_error(vc, NULL, -EIO);
> +               return -EIO;
> +       } else {
> +               /* Queue up the request to be verified */
> +               io = verity_io_alloc(ti, bio);
> +               if (!io) {
> +                       DMERR_LIMIT("Failed to allocate and init IO data");
> +                       return DM_MAPIO_REQUEUE;
> +               }
> +               INIT_DELAYED_WORK(&io->work, kverityd_io);
> +               queue_delayed_work(kverityd_ioq, &io->work, 0);
> +       }
> +
> +       return DM_MAPIO_SUBMITTED;
> +}
> +
> +static void splitarg(char *arg, char **key, char **val)
> +{
> +       *key = strsep(&arg, "=");
> +       *val = strsep(&arg, "");
> +}
> +
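
Since the constructor depends on exactly how splitarg() treats missing or empty values, here is a userspace sketch of the same split. It uses strchr() instead of strsep() purely for portability and is intended to behave the same way; split_arg() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "key=value" in place at the first '='.
 * *val is NULL when there is no '=' at all, and an empty string for "key=".
 */
static void split_arg(char *arg, char **key, char **val)
{
	char *eq = strchr(arg, '=');

	*key = arg;
	if (eq) {
		*eq = '\0';
		*val = eq + 1;
	} else {
		*val = NULL;
	}
}

void split_example(void)
{
	char a[] = "alg=sha256";
	char b[] = "payload";
	char c[] = "salt=";
	char *k, *v;

	split_arg(a, &k, &v);
	assert(!strcmp(k, "alg") && !strcmp(v, "sha256"));

	split_arg(b, &k, &v);
	assert(!strcmp(k, "payload") && v == NULL);

	split_arg(c, &k, &v);
	assert(!strcmp(k, "salt") && v[0] == '\0');
}
```

So "salt=" reaches the constructor as an empty value rather than a missing one, and anything after the first '=' stays in the value.
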
> +/*
> + * Non-block interfaces and device-mapper specific code
> + */
> +
> +/**
> + * verity_ctr - Construct a verified mapping
> + * @ti:   Target being created
> + * @argc: Number of elements in argv
> + * @argv: Vector of key-value pairs (see below).
> + *
> + * Accepts the following keys:
> + * @payload:        hashed device
> + * @hashtree:       device hashtree is stored on
> + * @hashstart:      start address of hashes (default 0)
> + * @block_size:     size of a hash block
> + * @alg:            hash algorithm
> + * @root_hexdigest: toplevel hash of the tree
> + * @error_behavior: what to do when verification fails [optional]
> + * @salt:           salt, in hex [optional]
> + *
> + * E.g.,
> + * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256
> + * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e
> + *
> + * TODO(wad):
> + * - Boot time addition
> + * - Track block verification to free block_hashes if memory use is a concern
> + * Testing needed:
> + * - Regular slub_debug tracing (on checkins)
> + * - Improper block hash padding
> + * - Improper bundle padding
> + * - Improper hash layout
> + * - Missing padding at end of device
> + * - Improperly sized underlying devices
> + * - Out of memory conditions (make sure this isn't too flaky under high load!)
> + * - Incorrect superhash
> + * - Incorrect block hashes
> + * - Incorrect bundle hashes
> + * - Boot-up read speed; sustained read speeds
> + */
> +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> +{
> +       struct verity_config *vc = NULL;
> +       int ret = 0;
> +       sector_t blocks;
> +       unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE;
> +       const char *payload = NULL;
> +       const char *hashtree = NULL;
> +       unsigned long hashstart = 0;
> +       const char *alg = NULL;
> +       const char *root_hexdigest = NULL;
> +       const char *dev_error_behavior = error_behavior;
> +       const char *hexsalt = "";
> +       int i;
> +
> +       for (i = 0; i < argc; ++i) {
> +               char *key, *val;
> +               DMWARN("Argument %d: '%s'", i, argv[i]);
> +               splitarg(argv[i], &key, &val);
> +               if (!key) {
> +                       DMWARN("Bad argument %d: missing key?", i);
> +                       break;
> +               }
> +               if (!val) {
> +                       DMWARN("Bad argument %d='%s': missing value", i, key);
> +                       break;
> +               }
> +
> +               if (!strcmp(key, "alg")) {
> +                       alg = val;
> +               } else if (!strcmp(key, "payload")) {
> +                       payload = val;
> +               } else if (!strcmp(key, "hashtree")) {
> +                       hashtree = val;
> +               } else if (!strcmp(key, "root_hexdigest")) {
> +                       root_hexdigest = val;
> +               } else if (!strcmp(key, "hashstart")) {
> +                       if (strict_strtoul(val, 10, &hashstart)) {
> +                               ti->error = "Invalid hashstart";
> +                               return -EINVAL;
> +                       }
> +               } else if (!strcmp(key, "block_size")) {
> +                       unsigned long tmp;
> +                       if (strict_strtoul(val, 10, &tmp) ||
> +                           tmp == 0 || (tmp > UINT_MAX)) {
> +                               ti->error = "Invalid block_size";
> +                               return -EINVAL;
> +                       }
> +                       block_size = (unsigned int)tmp;
> +               } else if (!strcmp(key, "error_behavior")) {
> +                       dev_error_behavior = val;
> +               } else if (!strcmp(key, "salt")) {
> +                       hexsalt = val;
> +               }
> +       }
> +
> +#define NEEDARG(n) \
> +       if (!(n)) { \
> +               ti->error = "Missing argument: " #n; \
> +               return -EINVAL; \
> +       }
> +
> +       NEEDARG(alg);
> +       NEEDARG(payload);
> +       NEEDARG(hashtree);
> +       NEEDARG(root_hexdigest);
> +
> +#undef NEEDARG
> +
> +       /* The device mapper device should be setup read-only */
> +       if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
> +               ti->error = "Must be created readonly.";
> +               return -EINVAL;
> +       }
> +
> +       vc = kzalloc(sizeof(*vc), GFP_KERNEL);
> +       if (!vc) {
> +               /* TODO(wad) if this is called from the setup helper, then we
> +                * catch these errors and do a CrOS specific thing. if not, we
> +                * need to have this call the error handler.
> +                */
> +               return -ENOMEM;
> +       }
> +
> +       /* Calculate the blocks from the given device size */
> +       vc->size = ti->len;
> +       blocks = to_bytes(vc->size) / block_size;
> +       if (dm_bht_create(&vc->bht, blocks, block_size, alg)) {
> +               DMERR("failed to create required bht");
> +               goto bad_bht;
> +       }
> +       if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) {
> +               DMERR("root hexdigest error");
> +               goto bad_root_hexdigest;
> +       }
> +       dm_bht_set_salt(&vc->bht, hexsalt);
> +       vc->bht.read_cb = kverityd_bht_read_callback;
> +
> +       /* payload: device to verify */
> +       vc->start = 0;  /* TODO: should this support a starting offset? */
> +       /* We only ever grab the device in read-only mode. */
> +       ret = dm_get_device(ti, payload,
> +                           dm_table_get_mode(ti->table), &vc->dev);
> +       if (ret) {
> +               DMERR("Failed to acquire device '%s': %d", payload, ret);
> +               ti->error = "Device lookup failed";
> +               goto bad_verity_dev;
> +       }
> +
> +       if ((to_bytes(vc->start) % block_size) ||
> +           (to_bytes(vc->size) % block_size)) {
> +               ti->error = "Device must be block_size divisible/aligned";
> +               goto bad_hash_start;
> +       }
> +
> +       vc->hash_start = (sector_t)hashstart;
> +
> +       /* hashtree: device with hashes.
> +        * Note, payload == hashtree is okay as long as the size of
> +        *       ti->len passed to device mapper does not include
> +        *       the hashes.
> +        */
> +       if (dm_get_device(ti, hashtree,
> +                         dm_table_get_mode(ti->table), &vc->hash_dev)) {
> +               ti->error = "Hash device lookup failed";
> +               goto bad_hash_dev;
> +       }
> +
> +       /* alg: cryptographic digest algorithm */
> +       if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >=
> +           CRYPTO_MAX_ALG_NAME) {
> +               ti->error = "Hash algorithm name is too long";
> +               goto bad_hash;
> +       }
> +
> +       /* override with optional device-specific error behavior */
> +       vc->error_behavior = verity_parse_error_behavior(dev_error_behavior);
> +       if (vc->error_behavior == -1) {
> +               ti->error = "Bad error_behavior supplied";
> +               goto bad_err_behavior;
> +       }
> +
> +       /* TODO: Maybe issue a request on the io queue for block 0? */
> +
> +       /* Argument processing is done, setup operational data */
> +       /* Pool for dm_verity_io objects */
> +       vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool);
> +       if (!vc->io_pool) {
> +               ti->error = "Cannot allocate verity io mempool";
> +               goto bad_slab_pool;
> +       }
> +
> +       /* Allocate the bioset used for request padding */
> +       /* TODO(wad) allocate a separate bioset for the first verify maybe */
> +       vc->bs = bioset_create(MIN_BIOS, 0);
> +       if (!vc->bs) {
> +               ti->error = "Cannot allocate verity bioset";
> +               goto bad_bs;
> +       }
> +
> +       ti->num_flush_requests = 1;
> +       ti->private = vc;
> +
> +       /* TODO(wad) add device and hash device names */
> +       {
> +               char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
> +               bdevname(vc->hash_dev->bdev, hashdev);
> +               bdevname(vc->dev->bdev, vdev);
> +               DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev,
> +                      hashdev, ULL(vc->bht.sectors), ULL(blocks));
> +       }
> +       return 0;
> +
> +bad_bs:
> +       mempool_destroy(vc->io_pool);
> +bad_slab_pool:
> +bad_err_behavior:
> +bad_hash:
> +       dm_put_device(ti, vc->hash_dev);
> +bad_hash_dev:
> +bad_hash_start:
> +       dm_put_device(ti, vc->dev);
> +bad_bht:
> +bad_root_hexdigest:
> +bad_verity_dev:
> +       kfree(vc);   /* hash is not secret so no need to zero */
> +       return -EINVAL;
> +}
> +
> +static void verity_dtr(struct dm_target *ti)
> +{
> +       struct verity_config *vc = (struct verity_config *) ti->private;
> +
> +       bioset_free(vc->bs);
> +       mempool_destroy(vc->io_pool);
> +       dm_bht_destroy(&vc->bht);
> +       dm_put_device(ti, vc->hash_dev);
> +       dm_put_device(ti, vc->dev);
> +       kfree(vc);
> +}
> +
> +static int verity_status(struct dm_target *ti, status_type_t type,
> +                       char *result, unsigned int maxlen)
> +{
> +       struct verity_config *vc = (struct verity_config *) ti->private;
> +       unsigned int sz = 0;
> +       char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE];
> +       u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 };
> +
> +       dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest));
> +
> +       switch (type) {
> +       case STATUSTYPE_INFO:
> +               break;
> +       case STATUSTYPE_TABLE:
> +               bdevname(vc->hash_dev->bdev, hashdev);
> +               bdevname(vc->dev->bdev, vdev);
> +               DMEMIT("/dev/%s /dev/%s %llu %u %s %s",
> +                       vdev,
> +                       hashdev,
> +                       ULL(vc->hash_start),
> +                       vc->bht.depth,
> +                       vc->hash_alg,
> +                       hexdigest);
> +               break;
> +       }
> +       return 0;
> +}
> +
> +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
> +                      struct bio_vec *biovec, int max_size)
> +{
> +       struct verity_config *vc = ti->private;
> +       struct request_queue *q = bdev_get_queue(vc->dev->bdev);
> +
> +       if (!q->merge_bvec_fn)
> +               return max_size;
> +
> +       bvm->bi_bdev = vc->dev->bdev;
> +       bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin;
> +
> +       /* Optionally, this could just return 0 to stick to single pages. */
> +       return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
> +}
> +
> +static int verity_iterate_devices(struct dm_target *ti,
> +                                iterate_devices_callout_fn fn, void *data)
> +{
> +       struct verity_config *vc = ti->private;
> +
> +       return fn(ti, vc->dev, vc->start, ti->len, data);
> +}
> +
> +static void verity_io_hints(struct dm_target *ti,
> +                           struct queue_limits *limits)
> +{
> +       struct verity_config *vc = ti->private;
> +       unsigned int block_size = vc->bht.block_size;
> +
> +       limits->logical_block_size = block_size;
> +       limits->physical_block_size = block_size;
> +       blk_limits_io_min(limits, block_size);
> +}
> +
> +static struct target_type verity_target = {
> +       .name   = "verity",
> +       .version = {0, 1, 0},
> +       .module = THIS_MODULE,
> +       .ctr    = verity_ctr,
> +       .dtr    = verity_dtr,
> +       .map    = verity_map,
> +       .merge  = verity_merge,
> +       .status = verity_status,
> +       .iterate_devices = verity_iterate_devices,
> +       .io_hints = verity_io_hints,
> +};
> +
> +#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI)
> +
> +static int __init dm_verity_init(void)
> +{
> +       int r = -ENOMEM;
> +
> +       _verity_io_pool = KMEM_CACHE(dm_verity_io, 0);
> +       if (!_verity_io_pool) {
> +               DMERR("failed to allocate pool dm_verity_io");
> +               goto bad_io_pool;
> +       }
> +
> +       kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1);
> +       if (!kverityd_ioq) {
> +               DMERR("failed to create workqueue kverityd_ioq");
> +               goto bad_io_queue;
> +       }
> +
> +       kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1);
> +       if (!kveritydq) {
> +               DMERR("failed to create workqueue kveritydq");
> +               goto bad_verify_queue;
> +       }
> +
> +       r = dm_register_target(&verity_target);
> +       if (r < 0) {
> +               DMERR("register failed %d", r);
> +               goto register_failed;
> +       }
> +
> +       DMINFO("version %u.%u.%u loaded", verity_target.version[0],
> +              verity_target.version[1], verity_target.version[2]);
> +
> +       return r;
> +
> +register_failed:
> +       destroy_workqueue(kveritydq);
> +bad_verify_queue:
> +       destroy_workqueue(kverityd_ioq);
> +bad_io_queue:
> +       kmem_cache_destroy(_verity_io_pool);
> +bad_io_pool:
> +       return r;
> +}
> +
> +static void __exit dm_verity_exit(void)
> +{
> +       dm_unregister_target(&verity_target);
> +
> +       destroy_workqueue(kveritydq);
> +       destroy_workqueue(kverityd_ioq);
> +       kmem_cache_destroy(_verity_io_pool);
> +}
> +
> +module_init(dm_verity_init);
> +module_exit(dm_verity_exit);
> +
> +MODULE_AUTHOR("The Chromium OS Authors <chromium-os-dev chromium org>");
> +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
> new file mode 100644
> index 0000000..e0664c9
> --- /dev/null
> +++ b/drivers/md/dm-verity.h
> @@ -0,0 +1,45 @@
> +/*
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev chromium org>
> + *                    All Rights Reserved.
> + *
> + * This file is released under the GPLv2.
> + *
> + * Provide error types for use when creating a custom error handler.
> + * See Documentation/device-mapper/dm-verity.txt
> + */
> +#ifndef DM_VERITY_H
> +#define DM_VERITY_H
> +
> +#include <linux/notifier.h>
> +
> +struct dm_verity_error_state {
> +       int code;
> +       int transient;  /* Likely to not happen after a reboot */
> +       u64 block;
> +       const char *message;
> +
> +       sector_t dev_start;
> +       sector_t dev_len;
> +       struct block_device *dev;
> +
> +       sector_t hash_dev_start;
> +       sector_t hash_dev_len;
> +       struct block_device *hash_dev;
> +
> +       /* Final behavior after all notifications are completed. */
> +       int behavior;
> +};
> +
> +/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
> +enum dm_verity_error_behavior {
> +       DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
> +       DM_VERITY_ERROR_BEHAVIOR_PANIC,
> +       DM_VERITY_ERROR_BEHAVIOR_NONE,
> +       DM_VERITY_ERROR_BEHAVIOR_NOTIFY
> +};
> +
> +
> +int dm_verity_register_error_notifier(struct notifier_block *nb);
> +int dm_verity_unregister_error_notifier(struct notifier_block *nb);
> +
> +#endif  /* DM_VERITY_H */
> diff --git a/include/linux/dm-bht.h b/include/linux/dm-bht.h
> new file mode 100644
> index 0000000..0595911
> --- /dev/null
> +++ b/include/linux/dm-bht.h
> @@ -0,0 +1,166 @@
> +/*
> + * Copyright (C) 2011 The Chromium OS Authors <chromium-os-dev chromium org>
> + *
> + * Device-Mapper block hash tree interface.
> + * See Documentation/device-mapper/dm-bht.txt for details.
> + *
> + * This file is released under the GPLv2.
> + */
> +#ifndef __LINUX_DM_BHT_H
> +#define __LINUX_DM_BHT_H
> +
> +#include <linux/compiler.h>
> +#include <linux/crypto.h>
> +#include <linux/types.h>
> +
> +/* To avoid allocating memory for digest tests, we just set up a
> + * max to use for now.
> + */
> +#define DM_BHT_MAX_DIGEST_SIZE 128  /* 1k-bit hashes are unlikely for now */
> +#define DM_BHT_SALT_SIZE       32   /* 256 bits of salt is a lot */
> +
> +/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other
> + * values are entry-related return codes.
> + */
> +#define DM_BHT_ENTRY_VERIFIED 8  /* 'nodes' has been checked against parent */
> +#define DM_BHT_ENTRY_READY 4  /* 'nodes' is loaded and available */
> +#define DM_BHT_ENTRY_PENDING 2  /* 'nodes' is being loaded */
> +#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */
> +#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */
> +#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */
> +
> +/* Additional possible return codes */
> +#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */
> +
> +/* dm_bht_entry
> + * Contains dm_bht->node_count tree nodes at a given tree depth.
> + * state is used to transactionally assure that data is paged in
> + * from disk.  Unless dm_bht kept running crypto contexts for each
> + * level, we need to load in the data for on-demand verification.
> + */
> +struct dm_bht_entry {
> +       atomic_t state; /* see defines */
> +       /* Keeping an extra pointer per entry wastes up to ~33k of
> +        * memory if 1m blocks are used (or ~66k on a 64-bit arch)
> +        */
> +       void *io_context;  /* Reserve a pointer for use during io */
> +       /* data should only be non-NULL if fully populated. */
> +       void *nodes;  /* The hash data used to verify the children.
> +                      * Guaranteed to be page-aligned.
> +                      */
> +};
> +
> +/* dm_bht_level
> + * Contains an array of entries which represent a page of hashes where
> + * each hash is a node in the tree at the given tree depth/level.
> + */
> +struct dm_bht_level {
> +       struct dm_bht_entry *entries;  /* array of entries of tree nodes */
> +       unsigned int count;  /* number of entries at this level */
> +       sector_t sector;  /* starting sector for this level */
> +};
> +
> +/* opaque context, start, databuf, sector_count, entry */
> +typedef int(*dm_bht_callback)(void *,  /* external context */
> +                             sector_t,  /* start sector */
> +                             u8 *,  /* destination page */
> +                             sector_t,  /* num sectors */
> +                             struct dm_bht_entry *);
> +/* dm_bht - Device mapper block hash tree
> + * dm_bht provides a fixed interface for comparing data blocks
> + * against cryptographic hashes stored in a hash tree. It
> + * optimizes the tree structure for storage on disk.
> + *
> + * The tree is built from the bottom up.  A collection of data,
> + * external to the tree, is hashed and these hashes are stored
> + * as the blocks in the tree.  For some number of these hashes,
> + * a parent node is created by hashing them.  These steps are
> + * repeated.
> + *
> + * TODO(wad): All hash storage memory is pre-allocated and freed once an
> + * entire branch has been verified.
> + */
> +struct dm_bht {
> +       /* Configured values */
> +       int depth;  /* Depth of the tree including the root */
> +       unsigned int block_count;  /* Number of blocks hashed */
> +       unsigned int block_size;  /* Size of a hash block */
> +       char hash_alg[CRYPTO_MAX_ALG_NAME];
> +       unsigned char salt[DM_BHT_SALT_SIZE];
> +
> +       /* Computed values */
> +       unsigned int node_count;  /* Data size (in hashes) for each entry */
> +       unsigned int node_count_shift;  /* first bit set - 1 */
> +       /* There is one per CPU so that verification can be simultaneous. */
> +       struct hash_desc hash_desc[NR_CPUS];  /* Container for the hash alg */
> +       unsigned int digest_size;
> +       sector_t sectors;  /* Number of disk sectors used */
> +
> +       /* bool verified;  Full tree is verified */
> +       u8 root_digest[DM_BHT_MAX_DIGEST_SIZE];
> +       struct dm_bht_level *levels;  /* in reverse order */
> +       /* Callback for reading from the hash device */
> +       dm_bht_callback read_cb;
> +};
> +
> +/* Constructor for struct dm_bht instances. */
> +int dm_bht_create(struct dm_bht *bht,
> +                 unsigned int block_count,
> +                 unsigned int block_size,
> +                 const char *alg_name);
> +/* Destructor for struct dm_bht instances.  Does not free @bht */
> +void dm_bht_destroy(struct dm_bht *bht);
> +
> +/* Basic accessors for struct dm_bht */
> +int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest);
> +int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available);
> +void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt);
> +int dm_bht_salt(struct dm_bht *bht, char *hexsalt);
> +
> +/* Functions for loading in data from disk for verification */
> +bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block);
> +int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx,
> +                   unsigned int block);
> +int dm_bht_verify_block(struct dm_bht *bht, unsigned int block,
> +                       struct page *pg, unsigned int offset);
> +void dm_bht_read_completed(struct dm_bht_entry *entry, int status);
> +
> +/* Functions for converting indices to nodes. */
> +
> +static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht,
> +                                                 int depth)
> +{
> +       return (bht->depth - depth) * bht->node_count_shift;
> +}
> +
> +/* For the given depth, this is the entry index.  At depth+1 it is the node
> + * index for depth.
> + */
> +static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht,
> +                                                       int depth,
> +                                                       unsigned int leaf)
> +{
> +       return leaf >> dm_bht_get_level_shift(bht, depth);
> +}
> +
> +static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht,
> +                                                   int depth,
> +                                                   unsigned int block)
> +{
> +       unsigned int index = dm_bht_index_at_level(bht, depth, block);
> +       struct dm_bht_level *level = &bht->levels[depth];
> +
> +       return &level->entries[index];
> +}
> +
> +static inline void *dm_bht_get_node(struct dm_bht *bht,
> +                                 struct dm_bht_entry *entry,
> +                                 int depth,
> +                                 unsigned int block)
> +{
> +       unsigned int index = dm_bht_index_at_level(bht, depth, block);
> +       unsigned int node_index = index % bht->node_count;
> +
> +       return entry->nodes + (node_index * bht->digest_size);
> +}
> +#endif  /* __LINUX_DM_BHT_H */
> --
> 1.7.3.1
>
>

