file-copy corruption

T. Horsnell tsh at mrc-lmb.cam.ac.uk
Thu Jun 29 13:41:04 UTC 2006


>Hi Terry,
>
>I tend to disagree with the others who have replied so far. I've found 
>NFS to be 100% reliable for many years, with large clusters of clients 
>using many flavors of Unix. Whenever things have failed I've always 
>been able to find the root cause.
>
>I'd suggest that you look at your messages file for indications of the 
>problem. Another tool you can use is nfsstat (man nfsstat); it should 
>report any NFS-related bad calls.
>
>On any recent Linux, it would be very rare for there to be "no 
>indication", so your log files are your friend.
>
>If you really cannot find any message or indication, it stands to reason 
>that the files in question may have been opened or updated by another user 
>or process during the gzip process. Is that possible?
>
>I would agree that rsync is a good choice for this task (you could run 
>"rsync --dry-run --stats" to show any differences that exist).

OK, I'm convinced about using rsync to build the filesystem copy.
gnu-tar over NFS seems to take about 50% longer.
However, as far as I can tell, rsync uses the file size and modification
time to determine whether a source and destination file might differ
(or only the file length if --size-only is selected), and only if these
indicate that the files may differ does it start to look at the
differences. Yes/no?
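Roughly yes. rsync's default "quick check" skips a file when size and modification time both match; --size-only looks at size alone, and --checksum (-c) compares full-file checksums of files whose sizes match. The decision can be sketched in Python as below. needs_transfer is a hypothetical helper that approximates this logic; it is not rsync's code, and it compares mtimes at whole-second resolution.

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src: str, dst: str, size_only: bool = False,
                   checksum: bool = False) -> bool:
    """Approximate whether rsync would update dst from src.

    size_only mimics --size-only; checksum mimics --checksum (-c).
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # a size difference always forces a transfer
    if checksum:
        # Read both files in full: slow, but catches same-size corruption.
        return file_digest(src) != file_digest(dst)
    if size_only:
        return False
    # Default quick check: equal size and equal mtime means "up to date".
    return s.st_mtime_ns // 10**9 != d.st_mtime_ns // 10**9
```

The consequence for verification: if a corrupted copy kept its original size and mtime (GNU tar restores modification times on extraction by default), a default rsync run would treat it as up to date. Only a --checksum run, which reads every byte on both sides, would catch that kind of silent corruption, at the cost of much more I/O.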


Cheers,
Terry


>
>Albert.
>
>T. Horsnell wrote:
>> I'm in the process of moving stuff from our Alpha fileserver
>> onto a Linux replacement. I've been using gnu-tar to copy filesystems
>> from the Alpha to the Linux NFS-exported disks over a 1Gbit LAN,
>> followed by diff -r to check that they have copied correctly (I wish
>> diff had an option not to follow symlinks). I've so far transferred
>> about 3 TiB of data (spread over several weeks) and am concerned
>> that during this process, 3 files were mis-copied without any
>> apparent hardware-errors being flagged. There was nothing unusual
>> about these files, and re-copying them (with cp) fixed the problem.
>>
>> Are occasional undetected errors like this to be expected?
>> I thought there were sufficient stages of checksumming/parity
>> (both boxes have ECC memory) etc. to render the probability
>> of this vanishingly small.
>>
>> On all 3 files, multiple retries of the diff still resulted
>> in a compare error, which was then fixed by a re-copy. This
>> suggests that the problem occurs during the 'gtar' phase, rather
>> than the 'diff -r' phase.
>>
>> Does anyone know of a network-exercise utility I can use
>> to check the LAN component of the data-path?
>>
>> Cheers,
>> Terry.
>>
>>   
>
>-- 
>fedora-list mailing list
>fedora-list at redhat.com
>To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
>
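
Regarding the wish in the original message for a diff -r that does not follow symlinks: a recursive comparison that inspects entries with os.lstat and compares symlink targets with os.readlink, instead of following the links, can be sketched in Python. compare_trees is a hypothetical helper, not an existing tool. It checks only file type, symlink targets and regular-file contents (SHA-256), ignoring permissions, ownership and timestamps, and it works in one direction: it reports what is missing or different in the copy, not extra files that exist only in the copy.

```python
import hashlib
import os
import stat


def _sha256(path: str) -> str:
    """SHA-256 of a regular file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(orig_root: str, copy_root: str) -> list[str]:
    """List paths under orig_root that are missing or different in copy_root.

    Entries are examined with os.lstat, so symlinks are never followed:
    a symlink is compared by its target string, not by what it points to.
    """
    problems = []
    # With the default followlinks=False, os.walk lists symlinked
    # directories in dirnames but does not descend into them.
    for root, dirnames, filenames in os.walk(orig_root):
        rel_root = os.path.relpath(root, orig_root)
        for name in dirnames + filenames:
            rel = os.path.normpath(os.path.join(rel_root, name))
            p_orig = os.path.join(orig_root, rel)
            p_copy = os.path.join(copy_root, rel)
            s_orig = os.lstat(p_orig)
            try:
                s_copy = os.lstat(p_copy)
            except FileNotFoundError:
                problems.append(f"missing: {rel}")
                continue
            if stat.S_IFMT(s_orig.st_mode) != stat.S_IFMT(s_copy.st_mode):
                problems.append(f"type differs: {rel}")
            elif stat.S_ISLNK(s_orig.st_mode):
                if os.readlink(p_orig) != os.readlink(p_copy):
                    problems.append(f"link target differs: {rel}")
            elif stat.S_ISREG(s_orig.st_mode):
                if (s_orig.st_size != s_copy.st_size
                        or _sha256(p_orig) != _sha256(p_copy)):
                    problems.append(f"content differs: {rel}")
    return sorted(problems)
```

Like diff -r, this reads every byte on both sides, so on terabytes of data it is no faster; its only advantages are the symlink handling and a compact list of problem paths that can be re-copied.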
