[dm-devel] snapshot_ctr kcopy memory allocation problem and following kernel madness

Christophe Saout christophe at saout.de
Sun Jan 18 15:19:01 UTC 2004


Hi,

I think I finally found out what killed my webserver the last time.
I've installed a watchdog and now I have something in my log:

Jan 18 20:40:24 websrv lvcreate: page allocation failure. order:0, mode:0xd0
Jan 18 20:40:24 websrv Call Trace:
Jan 18 20:40:24 websrv [<c014082e>] __alloc_pages+0x2ee/0x350
Jan 18 20:40:24 websrv [<c02d5261>] client_alloc_pages+0x31/0x80
Jan 18 20:40:24 websrv [<c02d5c85>] kcopyd_client_create+0x55/0xb0
Jan 18 20:40:24 websrv [<c02d82a8>] dm_create_persistent+0xb8/0x130
Jan 18 20:40:24 websrv [<c02d691b>] snapshot_ctr+0x29b/0x380
Jan 18 20:40:24 websrv [<c02d10b6>] dm_table_add_target+0x116/0x190
Jan 18 20:40:24 websrv [<c02d3870>] populate_table+0x80/0xe0
Jan 18 20:40:24 websrv [<c02d392e>] table_load+0x5e/0x130
Jan 18 20:40:24 websrv [<c02d4274>] ctl_ioctl+0xe4/0x170
Jan 18 20:40:24 websrv [<c02d38d0>] table_load+0x0/0x130
Jan 18 20:40:24 websrv [<c016afc4>] sys_ioctl+0xf4/0x2b0
Jan 18 20:40:24 websrv [<c010b31d>] sysenter_past_esp+0x52/0x71
Jan 18 20:40:24 websrv
Jan 18 20:40:24 websrv device-mapper: Could not create kcopyd client
Jan 18 20:40:24 websrv device-mapper: error adding target to table
Jan 18 20:40:24 websrv found reiserfs format "3.6" with standard journal
Jan 18 20:40:25 websrv Reiserfs journal params: device dm-9, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Jan 18 20:40:25 websrv reiserfs: checking transaction log (dm-9) for (dm-9)
Jan 18 20:40:27 websrv reiserfs: replayed 31 transactions in 2 seconds
Jan 18 20:40:27 websrv Using r5 hash to sort names
Jan 18 20:40:27 websrv EXT3-fs: INFO: recovery required on readonly filesystem.
Jan 18 20:40:27 websrv EXT3-fs: write access will be enabled during recovery.
Jan 18 20:40:28 websrv kjournald starting.  Commit interval 5 seconds
Jan 18 20:40:28 websrv EXT3-fs: recovery complete.
Jan 18 20:40:28 websrv EXT3-fs: mounted filesystem with ordered data mode.
Jan 18 20:41:48 websrv watchdog[9580]: loadavg 25 7 2 is higher than the given threshold 24 18 12!
Jan 18 20:41:48 websrv SOFTDOG: WDT device closed unexpectedly.  WDT will not stop!
Jan 18 20:41:48 websrv watchdog[9580]: shutting down the system because of error -3
Jan 18 20:41:50 websrv sshd[11779]: Accepted password for achim from ::ffff:213.23.24.241 port 4290
Jan 18 20:41:58 websrv serio: kseriod exiting
Jan 18 20:41:58 websrv syslog-ng[7200]: syslog-ng version 1.6.0rc3 going down

So kcopyd fails to allocate its buffers during the lvcreate -s, and then the
machine load climbs until the watchdog kills the machine.
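
For what it's worth, "order:0, mode:0xd0" is just a single-page GFP_KERNEL
allocation (assuming the usual 2.6 flag values, __GFP_WAIT | __GFP_IO |
__GFP_FS = 0xd0), so the box must already have been under heavy memory
pressure when the snapshot was created. The sketch below is only how I read
the path in the backtrace, not the real kcopyd source; the kcopyd_client
layout and the free_client_pages()/push_page() helpers are made up for
illustration.

#include <linux/mm.h>
#include <linux/errno.h>

struct kcopyd_client;					/* opaque here */

void free_client_pages(struct kcopyd_client *kc);	/* assumed helper */
void push_page(struct kcopyd_client *kc, struct page *p); /* assumed helper */

static int client_alloc_pages(struct kcopyd_client *kc, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		/* the "order:0, mode:0xd0" (GFP_KERNEL) allocation that failed */
		struct page *p = alloc_page(GFP_KERNEL);
		if (!p) {
			free_client_pages(kc);	/* drop what we already got */
			return -ENOMEM;
		}
		push_page(kc, p);
	}
	return 0;
}

The -ENOMEM then bubbles up from kcopyd_client_create() through
dm_create_persistent() into snapshot_ctr(), which is why the log shows
"Could not create kcopyd client" and "error adding target to table", and why
lvcreate gets ENOMEM back from the table-load ioctl.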

The backup script tries to remove all snapshots when something fails.
The script logged this:

>>>>>>>> So Jan 18 20:40:20 CET 2004
-------------------------------
  Logical volume "snap-root" created
  Logical volume "snap-boot" created
  device-mapper ioctl cmd 9 failed: Cannot allocate memory (ENOMEM)
  Couldn't load device 'vg-snap--home'.
  Problem reactivating origin home
  device-mapper ioctl cmd 6 failed: Invalid argument (EINVAL)
  Couldn't resume device 'vg-snap--home'
  Aborting. Failed to activate snapshot exception store. Remove new LV and retry.
  Logical volume "snap-var" created
mount: device file /dev/vg/snap-home does not exist (ENOENT)
  Logical volume "snap-boot" successfully removed
  Logical volume "snap-root" successfully removed
  Logical volume "snap-portage" successfully removed
  Logical volume "snap-var" successfully removed


Last time, the script probably failed to reactivate the root volume, which is
why my log was empty and everything was dead.

I'm going to look into this later, but you might want to know.

I've also thought a lot about the snapshot/origin map functions and the
read path. I can't think of a case where mapping reads this way is wrong
either.

The filesystems will never try to read blocks that are currently being
flushed. Most elevators put reads in front of writes anyway, and since dm is
itself something like an I/O scheduler, we are allowed to do this. A lot of
stress testing with different filesystems (creating and removing snapshots
while doing heavy I/O on them) didn't show any problems.
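
To make explicit what I mean by the read path being safe, here is a
stripped-down sketch of the read-side decision as I picture it. This is not
the actual snapshot_map() from dm-snap.c; the exception type and the
lookup/remap helpers are just stand-ins. A read either hits an
already-completed exception and goes to the COW store, or it goes straight
to the origin, even if a copy for that chunk is still pending.

#include <linux/bio.h>

struct dm_snapshot;
struct exception;
typedef sector_t chunk_t;					/* assumed */

chunk_t sector_to_chunk(struct dm_snapshot *s, sector_t sector);	/* assumed */
struct exception *lookup_complete_exception(struct dm_snapshot *s,
					    chunk_t chunk);		/* assumed */
void remap_to_cow(struct dm_snapshot *s, struct bio *bio,
		  struct exception *e);				/* assumed */
void remap_to_origin(struct dm_snapshot *s, struct bio *bio);	/* assumed */

static int snapshot_read_map_sketch(struct dm_snapshot *s, struct bio *bio)
{
	chunk_t chunk = sector_to_chunk(s, bio->bi_sector);
	struct exception *e = lookup_complete_exception(s, chunk);

	if (e) {
		/* chunk already copied out: read the snapshot's own copy */
		remap_to_cow(s, bio, e);
	} else {
		/*
		 * No completed exception.  Even if a copy for this chunk is
		 * still pending, the origin still holds the old data, so the
		 * read can go straight to the origin without waiting.
		 */
		remap_to_origin(s, bio);
	}

	return 1;	/* remapped; dm core submits the bio */
}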

Barriers will have to be handled separately anyway: something like deferring
all incoming bios when a barrier is encountered while there are pending
exceptions.
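
Purely as a sketch of that deferral idea (nothing like this exists in the
driver; pending_exception_count(), defer_bio() and normal_snapshot_map()
below are invented for illustration): a barrier bio that arrives while
exception copies are in flight gets queued, and everything behind it waits
until the copies complete.

#include <linux/bio.h>

struct dm_snapshot;

int pending_exception_count(struct dm_snapshot *s);		/* assumed */
void defer_bio(struct dm_snapshot *s, struct bio *bio);		/* assumed */
int normal_snapshot_map(struct dm_snapshot *s, struct bio *bio);	/* assumed */

static int snapshot_map_barrier_sketch(struct dm_snapshot *s, struct bio *bio)
{
	if ((bio->bi_rw & (1 << BIO_RW_BARRIER)) &&
	    pending_exception_count(s)) {
		/*
		 * A barrier arrived while exception copies are in flight:
		 * hold it (and everything behind it) back until the pending
		 * copies have completed, then resubmit.
		 */
		defer_bio(s, bio);
		return 0;	/* bio queued, will be resubmitted later */
	}

	return normal_snapshot_map(s, bio);
}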



