
[linux-lvm] probably just another xfs and lvm snapshot problem?



Hi list readers,

the archive has various messages about this topic, but I am not sure
whether my problem is caused by xfs and the lvm snapshot or not. It took
me a while to get to this point, so let me elaborate a little.

First of all my system config:
I am using Gentoo Linux, kernel 2.6.16-gentoo-r7, device-mapper-1.02.07, 
lvm2-2.02.06, xfsprogs-2.7.11

and on the test box: 2.6.17-gentoo-r4, device-mapper-1.02.08, 
lvm2-2.02.07, xfsprogs-2.7.11

My intention is to make full system (just everything) backups of a
reasonably busy mailserver. I am using cyrus imap as pop/imap server,
postfix as smtpd and amavisd-new as malware scanner. Pretty common setup.

I wrote a backup script that uses the hard linking technique of rsync
(described e.g. here
http://www.mikerubel.org/computers/rsync_snapshots/). Before starting
rsync I made lvm snapshots of /var and /var/spool/imap. These are the
busiest partitions.
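
For readers unfamiliar with the hard linking scheme, here is a minimal
sketch of the idea. It is not the actual backup script: the function name
snapshot_backup and the daily.0/daily.1 directory names are made up for
illustration, and it assumes an rsync that supports --link-dest.

```shell
#!/bin/sh
# Illustrative only: snapshot_backup and the daily.N names are hypothetical.
# Copies SRC into DEST_ROOT/daily.0; files unchanged since the previous run
# (kept as DEST_ROOT/daily.1) become hard links instead of new copies.
snapshot_backup() {
    src=$1
    mkdir -p "$2" || return 1
    # --link-dest resolves relative paths against the destination directory,
    # so work with an absolute path.
    dest_root=$(cd "$2" && pwd) || return 1

    # This sketch keeps only one older generation.
    rm -rf "$dest_root/daily.1"
    if [ -d "$dest_root/daily.0" ]; then
        mv "$dest_root/daily.0" "$dest_root/daily.1" || return 1
    fi

    if [ -d "$dest_root/daily.1" ]; then
        # Unchanged files are hard-linked against the previous generation.
        rsync -a --delete --link-dest="$dest_root/daily.1" \
            "$src/" "$dest_root/daily.0/"
    else
        # First run: nothing to link against yet.
        rsync -a --delete "$src/" "$dest_root/daily.0/"
    fi
}
```

A call could look like "snapshot_backup /mnt/snapvar /backup/var", where
both paths are placeholders for a mounted snapshot and a backup target.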

The output of rsync showed errors and everybody wants his backup to be the 
same as the original data, right? Thus I wrote another script that 
verifies source against destination data. I can provide the code for it. 
The script fails when doing the md5sum check. Here is the output:

md5sum: WARNING: 8 of 70769 computed checksums did NOT match
md5sum check failed
./amavis/db/__db.001: FAILED
./amavis/db/__db.002: FAILED
./amavis/db/__db.003: FAILED
./imap/db/__db.001: FAILED
./imap/db/__db.002: FAILED
./imap/db/__db.003: FAILED
./imap/db/__db.004: FAILED
./imap/db/__db.005: FAILED

or like so:

md5sum: WARNING: 9 of 72522 computed checksums did NOT match
md5sum check failed
./amavis/db/__db.001: FAILED
./amavis/db/__db.002: FAILED
./amavis/db/__db.003: FAILED
./amavis/quarantine/spam-74v2QeUbeNRR.gz: FAILED
./imap/db/__db.001: FAILED
./imap/db/__db.002: FAILED
./imap/db/__db.003: FAILED
./imap/db/__db.004: FAILED
./imap/db/__db.005: FAILED
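
A verification of this kind can be sketched as follows. This is not the
actual verify script, only a minimal illustration of the md5sum check;
the function name verify_tree is hypothetical.

```shell
#!/bin/sh
# Illustrative only: verify_tree is a hypothetical name, not the real script.
# Checksums every regular file under SRC, then re-checks the same relative
# paths under DST. Prints only the lines that are not OK and returns
# non-zero if any file is missing or differs.
verify_tree() {
    src=$1
    dst=$2
    list=$(mktemp) || return 1
    # Record checksums with paths relative to SRC (e.g. ./imap/db/__db.001).
    if ! ( cd "$src" && find . -type f -exec md5sum {} + ) > "$list"; then
        rm -f "$list"
        return 1
    fi
    # Verify the recorded checksums against the files under DST.
    report=$(cd "$dst" && md5sum -c "$list" 2>&1)
    status=$?
    rm -f "$list"
    # Hide the per-file OK lines, keep FAILED lines and warnings.
    printf '%s\n' "$report" | grep -v ': OK$' || true
    return $status
}
```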

My first attempt to solve this was to stop the daemons that write these 
files. I stopped amavisd-new and cyrus, took the lvm snapshot and 
restarted the services. Then I did the backup. But the verify was still 
failing.

I asked myself why only those few files out of more than 70000 fail the
test. I took a closer look and compared the first file manually. The
sizes were equal, but "cmp" reported that the files differ in the very
first byte.
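
Such a manual comparison can be sketched like this. The helper name
first_diff is hypothetical; it only wraps wc and "cmp -l", which lists
each differing byte together with its offset.

```shell
#!/bin/sh
# Illustrative only: first_diff is a hypothetical helper name.
# Prints the sizes of both files and the 1-based offset of the first
# differing byte (left empty if the contents are identical).
first_diff() {
    a=$1
    b=$2
    size_a=$(wc -c < "$a" | tr -d ' ')
    size_b=$(wc -c < "$b" | tr -d ' ')
    echo "size: $size_a vs $size_b"
    # cmp -l prints one line per differing byte: "offset octal-a octal-b".
    # The first line therefore describes the first difference.
    offset=$(cmp -l "$a" "$b" 2>/dev/null | awk 'NR == 1 { print $1; exit }')
    echo "first differing byte: $offset"
}
```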

Then I thought it was some kind of file system caching issue, so I put
some "sync" commands after stopping the daemons and before taking the lvm
snapshot. The result was that only two files were failing the md5sum
check. However, two failing files are still two too many.
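
The order of operations described so far can be sketched as below. This
is an illustration, not the real script. Assumptions: the init script
names (amavisd, cyrus) and the DRY_RUN switch are hypothetical, and the
lvcreate arguments are the ones quoted further down in this mail. By
default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Illustrative only. The init script names and DRY_RUN are assumptions.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

snapshot_var() {
    # Stop the daemons that write the failing files.
    run /etc/init.d/amavisd stop
    run /etc/init.d/cyrus stop
    # Flush cached writes to disk before the snapshot is taken.
    run sync
    run lvcreate -s -L 1G -n snapvar /dev/vg0/var
    status=$?
    # Restart the services even if the snapshot could not be created.
    run /etc/init.d/cyrus start
    run /etc/init.d/amavisd start
    return $status
}
```

Running snapshot_var with DRY_RUN=1 lists the six commands in order, which
makes it easy to review the sequence before trying it on a live server.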

Somehow I found the xfs_freeze command and I thought that it must be the
solution to my problem. Unfortunately I locked up the mailserver ;) and
had to hard reset it, because lvcreate never returned to the prompt.
Hitting CTRL-C did not help either.

From the archive I have learned that xfs_freeze shouldn't be necessary
with lvm2 anymore. Also I have read something about versioning problems. 
I posted mine above, so maybe it's a version issue?

If not, I also read about the dmsetup command. I played around a little
with dmsetup and found a working combination of commands. But I am not
sure whether this is a correct way to do it.

dmsetup suspend vg0-var
xfs_freeze -f /var
dmsetup resume vg0-var
lvcreate -s -L 1G -n snapvar /dev/vg0/var
xfs_freeze -u /var
# do backup from the snapshot
lvremove -f /dev/vg0/snapvar

Issuing these commands doesn't feel right to me. Just a feeling ;).

Could there be any other cause of the md5sum errors above? I believe
not, because the other 70,000 files were checked and verified correctly.
Any hints?


TIA and kind regards,

Timo


