[Fedora-packaging] Duplicate files

Richard W.M. Jones rjones at redhat.com
Sat Nov 14 15:40:24 UTC 2009


We should run a program that dedupes files in /usr/share/doc,
automatically replacing identical files with hard links[0].  There
are a number of these sorts of programs, "fdupes" being one that we
have in Fedora.
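
To make the idea concrete, here is a rough Python sketch of what such a
dedupe pass could look like.  It is not how fdupes works internally, and
the function names and the ".dedupe-tmp" suffix are made up for
illustration.  Assumptions: everything lives on one filesystem (hard
links cannot cross filesystems), a matching SHA-256 digest is trusted to
mean identical contents, and empty files are skipped.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Regular, non-empty files only; symlinks are left alone.
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[st.st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in sorted(paths):
            by_hash[file_digest(path)].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups


def hardlink_duplicates(root, dry_run=True):
    """Replace duplicates with hard links; return how many were (or would be) replaced."""
    replaced = 0
    for group in find_duplicates(root):
        keep = group[0]
        for dup in group[1:]:
            if os.path.samefile(keep, dup):
                continue  # already the same inode
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate, so the path never disappears.
                # (Hypothetical suffix; os.link fails if it already exists.)
                tmp = dup + ".dedupe-tmp"
                os.link(keep, tmp)
                os.replace(tmp, dup)
            replaced += 1
    return replaced
```

Calling hardlink_duplicates("/usr/share/doc") with the default
dry_run=True only counts candidates and changes nothing.  Note that hard
linked files share one inode, so ownership, permissions and timestamps
become shared as well.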

I ran "fdupes -r /usr/share/doc" on my Fedora 12 desktop machine and
did some quick analysis of the output:

 * 3484 files could be replaced by hard links (that count doesn't
   include the single remaining copy of the file).

 * Total files in /usr/share/doc is 31887, so that is 11% of all files
   in that directory.

 * Deduping would save 38896 (1K blocks) out of 473748 blocks used
   (about 8%).  [1]
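
For anyone who wants to check the percentages, the arithmetic is just
the following (the variable names are mine, the numbers are the ones
quoted above):

```python
# Figures taken from the list above.
duplicate_files = 3484   # files replaceable by hard links
total_files = 31887      # all files in /usr/share/doc
blocks_saved = 38896     # 1K blocks freed by deduping
blocks_used = 473748     # 1K blocks currently used

file_share = 100 * duplicate_files / total_files
block_share = 100 * blocks_saved / blocks_used

print(f"{file_share:.1f}% of files, {block_share:.1f}% of blocks")
```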

By no means all the duplicates in /usr/share/doc are just license
files.  Many other types of file are also duplicated, including many
images.

Rich.

[0] One day filesystems will do this for us automatically and
transparently ...

[1] Assumption: storing the directory entry and inode is free.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v



