Re: Duplicated files in the pristine FC4t2 installation
- From: Roland McGrath <roland redhat com>
- To: pjones redhat com, Development discussions related to Fedora Core <fedora-devel-list redhat com>
- Cc: os-devel-list redhat com, "Mike A. Harris" <mharris redhat com>, List for Fedora Package Maintainers <fedora-maintainers redhat com>
- Subject: Re: Duplicated files in the pristine FC4t2 installation
- Date: Mon, 2 May 2005 13:27:08 -0700
> But I think the whole problem is silly as well, FWIW.
When Warren brought this up on IRC a while back, I wrote the following
script and ran it on a rawhide everything install. This fails to take
into account files that are already hardlinked, so its results might
well be significantly inflated. (Someone who cares could hack it further
to check whether the installed names of a duplicate file are the same inode.)
Total 408578931 bytes in 43107 inodes
That's a max of < 400M on an install that is something like 8.5-9G.
So the issue is worth at most on the order of 5% of disk space,
and that is probably a very high estimate.
rpm -qa --qf '[%{FILEMD5S} %{FILENAMES} %{FILESIZES} %{SOURCERPM}\n]' |
awk '
NF < 4 { next } # no MD5 field, e.g. a directory
{
    info = $2 " " $4;   # file name and source RPM
    if ($1 in sizes) {
        # Same MD5 but a different size: report both entries.
        if ($3 != sizes[$1]) print "!!!", $1 ":", info, "VS", md5[$1]
    } else {
        sizes[$1] = $3;
    }
    if ($1 in md5) {
        if (info == md5[$1]) next;
        # Skip entries already recorded as duplicates of this MD5.
        for (i = 1; i <= dups[$1]; ++i)
            if (dupinfo[$1 "," i] == info)
                next;
        dups[$1]++;
        dupinfo[$1 "," dups[$1]] = info;
    } else {
        md5[$1] = info;   # first file seen with this MD5
    }
}
END {
    dupsize = dupcount = 0;
    for (sum in dups) {
        n = dups[sum];
        dupcount += n;
        dupsize += n * sizes[sum];
        print n, "dups:", sum, " ==> ", (n * sizes[sum]);
        print "\t" md5[sum];
        for (i = 1; i <= n; ++i)
            print "\t" dupinfo[sum "," i];
    }
    print "Total", dupsize, "bytes in", dupcount, "inodes";
}
'
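
For anyone who does want to account for existing hardlinks, here is a rough
sketch of the inode check mentioned above. It is not part of the script I ran.
The helper name distinct_inodes is made up for illustration, and it assumes
GNU coreutils stat, whose -c '%d:%i' format prints device:inode. It reads the
installed names of one duplicate group on stdin and prints how many distinct
inodes they occupy.

```shell
# Hypothetical helper: count distinct inodes among paths read from stdin.
# Assumes GNU coreutils stat(1); '%d:%i' prints device number and inode.
distinct_inodes() {
    while IFS= read -r path; do
        # Skip names that are not actually present on this system.
        [ -e "$path" ] || continue
        stat -c '%d:%i' -- "$path"
    done | sort -u | awk 'END { print NR + 0 }'
}
```

If a group of N names reports fewer than N distinct inodes, some of those
names are already hardlinked, and the wasted space for that group would be
closer to (distinct inodes - 1) * size than to the figure the script prints.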