file-copy corruption

T. Horsnell tsh at mrc-lmb.cam.ac.uk
Wed Jun 28 17:21:28 UTC 2006


Thanks for this. I too am heavily into NFS, and have been using it
very happily on big fileservers for years. The Linux box (RHEL4) to which
I am moving files is becoming a replacement fileserver for the Alpha
on the fly (I'm relocating filesystems one by one as space/time permit),
hence the choice of doing stuff over NFS.

nfsstat does show that there have indeed been a few TX and RX errors and collisions
but I've always seen odd ones of these over the years and have generally
assumed that, like disk errors, these are retried and, if successful, cause no damage.
If unsuccessful, I would have thought they would generate some kind
of I/O error, with a suitable error message to the user (or stdout or syslogd).
Maybe this assumption is wrong. There's nothing untoward in /var/log/messages.

I guess the cause could lie anywhere between the source disk, the source disk controller,
the source memory, the network, the destination memory, the destination disk controller
and the destination disk, but my initial thought was that network is most likely.
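
One way to narrow down which stage is at fault, offered only as a rough
sketch: build a checksum manifest of each tree on the host that owns the
disk (so the files are read locally rather than over NFS), then compare the
two manifests. This assumes Python is available on both machines, which may
not hold for the Alpha. The names file_digest, build_manifest and
compare_manifests are illustrative, not part of any existing tool. Symlinks
are recorded by their target and never followed.

```python
import hashlib
import os
import stat


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one regular file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each relative path under root to a digest (hypothetical helper).

    Regular files get their SHA-256; symlinks get their link target.
    Directories and special files are skipped. Symlinks are not followed.
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            mode = os.lstat(full).st_mode
            if stat.S_ISLNK(mode):
                manifest[rel] = "symlink:" + os.readlink(full)
            elif stat.S_ISREG(mode):
                manifest[rel] = file_digest(full)
    return manifest


def compare_manifests(src, dst):
    """Return sorted lists: (missing on dst, extra on dst, differing)."""
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    differing = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, differing
```

If the manifests built locally on each host agree but a comparison made
through the NFS mount does not, the read path over the network is suspect;
if the local manifests already differ, the damage happened during the copy
or on one of the disks.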

Cheers,
Terry.


>Hi Terry,
>
>I tend to disagree with the others who have replied so far. I've found 
>NFS to be 100% reliable for many years, with large clusters of clients 
>using many flavors of Unix. Whenever things have failed I've always 
>been able to find the root cause.
>
>I'd suggest that you look at your messages file for indications of the 
>problem. Also, one tool you can use is nfsstat (man nfsstat); it should 
>indicate NFS-related bad calls.
>
>On any recent Linux, it would be very rare for there to be "no 
>indication", so your log files are your friend.
>
>If you really cannot find any message or indication, it stands to reason 
>that the files in question may have been open/updated by another user or 
>process during the gzip process. Is that possible?
>
>I would agree that rsync is a good choice for this task (you could run 
>"rsync --dry-run --stats" to show any differences that exist).
>
>Albert.
>
>T. Horsnell wrote:
>> I'm in the process of moving stuff from our Alpha fileserver
>> onto a Linux replacement. I've been using gnu-tar to copy filesystems
>> from the Alpha to the Linux NFS-exported disks over a 1Gbit LAN,
>> followed by diff -r to check that they have copied correctly (I wish
>> diff had an option to not follow symlinks..). I've so far transferred
>> about 3 TiB of data (spread over several weeks) and am concerned
>> that during this process, 3 files were mis-copied without any
>> apparent hardware-errors being flagged. There was nothing unusual
>> about these files, and re-copying them (with cp) fixed the problem.
>>
>> Are occasional undetected errors like this to be expected?
>> I thought there were sufficient stages of checksumming/parity 
>> (both boxes have ECC memory) etc to render the probability
>> of this to be vanishingly small.
>>
>> On all 3 files, multiple retries of the diff still resulted
>> in a compare error, which was then fixed by a re-copy. This
>> suggests that the problem occurs during the 'gtar' phase, rather
>> than the 'diff -r' phase.
>>
>> Does anyone know of a network-exercise utility I can use
>> to check the LAN component of the data-path?
>>
>> Cheers,
>> Terry.
>>
>>   
>
>-- 
>fedora-list mailing list
>fedora-list at redhat.com
>To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
>