Files corrupt on copy

Roberto Ragusa mail at robertoragusa.it
Sun Jun 28 17:22:03 UTC 2009


Andy Campbell wrote:

> Hmm always out by 1 ...
> 
> [trantor] ..mp2/tmp/src $while true
> while> do
> while> cp file1.zip new.zip ; cmp -l  file1.zip new.zip
> while> done
> 107757013 271 270
> 109383125 206 207
>  85093653 373 372
>  77726613 206 207
>  85093653 373 372
>  85899797 373 372
>  38258517 373 372
> 109383125 206 207
> 107757013 271 270
> 109383125 206 207
> 142459477 126 127
> 107757013 271 270
>  40550997 171 170
>  85093653 373 372
> 107757013 271 270
> 110261013 371 370
>  62581653 371 370
> 109383125 206 207
> 110526037  71  70
> 109383125 206 207
>  40550997 171 170
>  77726613 206 207
> 109383125 206 207
> 107757013 271 270

The error is always in the last (least significant) bit.
There is also a strong similarity in the low-order digits of
the error positions when they are expressed in hexadecimal.
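
The last-bit claim can be checked mechanically. This is only a minimal sketch, assuming a POSIX shell and input lines in the `cmp -l` format quoted above (position, then the two byte values in octal); the function name `xor_bytes` is made up for illustration:

```shell
# Print the XOR of the two octal byte values on each "position old new" line.
# A result of 1 means that only the least significant bit differs.
xor_bytes() {
  while read -r pos old new; do
    # A leading 0 makes shell arithmetic treat the value as octal,
    # which is how cmp -l prints the differing bytes.
    echo $(( 0$old ^ 0$new ))
  done
}
```

For example, `printf '107757013 271 270\n' | xor_bytes` prints 1, and so does every other line in the quoted output.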

If you paste your numbers into this command:

  $ while read a b; do printf '%08x\n' "$a"; done

you get this:

066c3dd5
06850dd5
05126d15
04a20395
05126d15
051eba15
0247c755
06850dd5
066c3dd5
06850dd5
087dc255
066c3dd5
026ac255
05126d15
066c3dd5
06927315
03baeb95
06850dd5
06967e55
06850dd5
026ac255
04a20395
06850dd5
066c3dd5

The error positions all have the form ......X5, where X=1,5,9,d.

So the errors appear only at positions that are a multiple of 16*4=64 bytes
apart (64 bytes is a common CPU cache line size).
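
The 64-byte spacing can also be confirmed directly from the decimal positions. Again just a sketch, assuming a POSIX shell; `mod64` is an illustrative name. `cmp -l` numbers bytes starting from 1, so one is subtracted to get the offset within the file:

```shell
# For each "position old new" line, print the zero-based offset of the
# differing byte modulo 64, i.e. where it falls within a 64-byte block.
mod64() {
  while read -r pos rest; do
    echo $(( ($pos - 1) % 64 ))
  done
}
```

Fed the positions above, this prints 20 for every line: the bad byte is always the same one within its 64-byte block.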

I strongly suspect your hardware: CPU (defective L2 cache line?), chipset
or memory (did you try one 4 GiB stick and then the other?)

Oh, hardware problems can also be caused by a defective power supply
or... a motherboard not screwed properly onto the chassis (this one drove me mad
some years ago).
You could also try changing the power/speed management settings.

The only alternative is a kernel bug, but since only one bit is affected
that is not likely.

Best regards.
-- 
   Roberto Ragusa    mail at robertoragusa.it




More information about the fedora-list mailing list