Finding Duplicate Files
Will Partain
will.partain at verilab.com
Fri Mar 14 20:21:13 UTC 2008
"Jonathan Roberts" <jonathan.roberts.uk at googlemail.com> writes:
> I have several folders each approx 10-20 Gb in size. Each has some
> unique material and some duplicate material, and it's even possible
> there's duplicate material in sub-folders too. How can I consolidate
> all of this into a single folder so that I can easily move the backup
> onto different mediums, and get back some disk space!?
An rsync-y solution not yet mentioned is to copy each dir 'd' to
'd.tidied' while giving a --compare-dest=... flag for each of the
_other_ dirs. 'd.tidied' will end up containing only the stuff unique
to 'd'. You can then combine them all with 'tar' or 'cp' or whatever.
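A minimal sketch of the idea, using made-up dirs d1 and d2 (note that a
relative --compare-dest path is interpreted relative to the destination
dir, so an absolute path is safest):

```shell
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p d1/foo d2/foo
echo same       > d1/foo/wibble   # duplicate: same relative path and content in d2
echo same       > d2/foo/wibble
echo only-in-d1 > d1/unique       # unique to d1

# Copy d1 to d1.tidied, skipping any file that already exists identically
# (at the same relative path) under d2.  With more dirs, give one
# --compare-dest flag per "other" dir.
rsync -a --compare-dest="$tmp/d2" d1/ d1.tidied/

# d1.tidied now holds only 'unique'; foo/wibble was skipped.  (rsync still
# creates the directory skeleton, so foo/ exists but is empty.)
ls -R d1.tidied
```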
You could use the 'sha1sum' tool to test that files in the
*.tidied dirs really are unique.
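One way to do that check (a sketch; the a.tidied/b.tidied names and
contents below are just for illustration, and -w/-D need GNU uniq):

```shell
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p a.tidied b.tidied
echo dup  > a.tidied/x
echo dup  > b.tidied/y     # same content under a different name
echo solo > a.tidied/z

# Hash every file, sort by checksum, and print the lines whose 40-char
# SHA-1 field (-w 40) appears more than once (-D = --all-repeated).
find . -type f -exec sha1sum {} + | sort | uniq -w 40 -D
```

An empty output means the *.tidied dirs hold no duplicate content, even
where file names differ.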
This technique will catch identical files at the same relative path, e.g.
d1/foo/bar/wibble
d2/foo/bar/wibble
but not
d1/foo/bar/wibble
d2/bar/wibble/foo/wobble
(if that makes sense). rsync --compare-dest and --link-dest : fantastic.
Will
More information about the fedora-list mailing list