[dm-devel] Re: [PATCH] drop mutex in __set_size

Jun'ichi Nomura j-nomura at ce.jp.nec.com
Thu Jul 30 23:49:18 UTC 2009


Hi Mikulas,

Mikulas Patocka wrote:
> On Wed, 29 Jul 2009, Jun'ichi Nomura wrote:
>> Mikulas Patocka wrote:
>>> Drop the mutex.
>>>
>>> It doesn't make sense to lock it for a single assignment, this code can't
>>> be executed concurrently. The size should be read with i_size_read which
>>> is automatically protected against concurrent i_size_write.
>> But it seems there are codes which depend on i_mutex for
>> protected access to i_size: e.g. block_write_begin().
> 
> You are right, but I don't know what to do with it. Catch such uses and 
> convert them to i_size_read?

Or move the i_size_write outside of dm suspended state?
Not deeply examined, but I don't think it doesn't have
to be in the suspended code path.

>> And while my description only mentioned noflush suspend,
>> with 2.6.31-rc, the deadlock can occur under normal suspend
>> because the new barrier code may push bios to deferred queue.
> 
> It can deadlock with any code that takes i_mutex and submits or waits for 
> i/os, regardless of barriers.

Yes, you are right.

I attached a reproducer script for easy testing.

-- 
Jun'ichi Nomura, NEC Corporation

#!/bin/sh -x

# --------------------------------------------------------------------
#
# Reproducer for i_mutex deadlock
#
# Usage:
#   1. Edit variables in the config section (especially DEV and MAP)
#   2. Run this script and wait
#   3. The script should stall with the following output:
#        + dmsetup resume ${MAP}
#
# What this scripts do:
#   2 proceeses do the test.
#   One creates a dm dev and repeat resizing it.
#   The other writes to the dm dev and fsync, repeatedly.
#   If the dm resume is performed during the fsync() holding i_mutex,
#   the processes deadlock.
#
# --------------------------------------------------------------------

# --------------------------------------------------------------------
# Config section

# Device to use for testing. Contents will be destroyed.
DEV=/dev/sdX
# dm table name for testing. ${MAP} is overlaid on ${DEV}
MAP=map1
# SIZE1 and SIZE2 are the sizes (in sectors) of the dm dev ${MAP}
SIZE1=1000000
SIZE2=2000000
# Options for 'dmsetup suspend'
SUSPEND_OPTS="--nolockfs --noflush"

# Name of generated test program. Files ${TP}.c and ${TP} are generated.
TP=./fsync-io
# Total I/O size (in MB) that ${TP} should perform before fsync().
IOSIZE=400

## You don't need to edit codes below.
## But you might want to change the length of sleeps and/or
## print_table function for other target type.

# --------------------------------------------------------------------
# Create a test dev
#

function print_table() {
  local sz=$1
#  echo "0 ${sz} multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 ${DEV} 2"
  echo "0 ${sz} linear ${DEV} 0"
}

print_table ${SIZE1} | dmsetup create ${MAP} || exit 1

# --------------------------------------------------------------------
# Repeat write+fsync on dm dev
#

lineno=$(cat $0 | sed -n '/^==BEGIN C CODE==/=')
cat $0 | sed -n "$((lineno + 1)),\$p" > ${TP}.c
gcc -o ${TP} ${TP}.c || exit 1

# Wait a while for the resizing routine to come up and start testing
(while [ -b /dev/mapper/${MAP} ] && ${TP} /dev/mapper/${MAP} ${IOSIZE}; do :; done) &

# --------------------------------------------------------------------
# Repeat resizing dm dev
#

while sleep 1; do
  for sz in $SIZE1 $SIZE2; do
    print_table ${sz} | dmsetup load ${MAP} \
	&& dmsetup suspend ${SUSPEND_OPTS} ${MAP} \
	&& sleep 1 && dmsetup resume ${MAP}
  done
done

exit

# --------------------------------------------------------------------
#
#
==BEGIN C CODE========================================================
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

char buf[1024*1024];

int
main(int argc, char** argv)
{
	int fd, i, size, cnt = 0;
	char *fname;

	if (argc < 3) {
		printf("Usage: <this prog> <dev> <size in MB>\n");
		exit(1);
	}

	fname = argv[1];
	size = atoi(argv[2]);

	fd = open(fname, O_RDWR);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	for (i = 0; i < size; i++)
		cnt += write(fd, buf, sizeof(buf));

	printf("%d Mbytes written\n", cnt >> 20);

	fsync(fd);
	printf("fsync done.\n");

	close(fd);
	return 0;
}




More information about the dm-devel mailing list