
Re: All installs, and then all erases...

On Thu, Jun 05, 2003 at 01:34:06PM -0700, Barry K. Nathan wrote:
> On Thu, Jun 05, 2003 at 06:43:54AM -0400, Jeff Johnson wrote:
> [quote rewrapped to fit within 80 columns]
> > Having all erased packages at the end of the transaction can/will be a
> > performance win because there is a point (usually about 1/3 of the way
> > through) where all prerequisite conditions are known to be satisfied.
> > That means that there are no further constraints on install order, they
> > can be installed in any order whatsoever, even in parallel.
> Whether or not it's a potential performance win, it's a disk usage
> problem -- this causes RPM to need more temporary disk space, and
> that's often enough to make hosts with small disks fail to install
> updates. Manually splitting things up into multiple invocations of RPM
> reduces the temporary disk space usage to the point where the packages
> can be installed successfully.

Um, maybe, shrug.

Sure, there's an increase in space used because of all-install before
all-erase: adding-before-subtracting has a higher maximum value than
subtracting-before-adding, duh.

However, on upgrade, files are usually replaced, not installed, so
the delta is often surprisingly modest.
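
A toy illustration of the adding-before-subtracting point, with made-up
package sizes. This is only arithmetic on a running total, not anything
rpm actually does; peak_usage and all the numbers are hypothetical.

```python
def peak_usage(baseline, installs, erases, installs_first=True):
    """Return the highest disk usage seen while applying a transaction.

    baseline: units in use before the transaction starts
    installs: space added by each installed package
    erases:   space freed by each erased package
    """
    added = [+size for size in installs]
    freed = [-size for size in erases]
    steps = added + freed if installs_first else freed + added

    used = peak = baseline
    for delta in steps:
        used += delta
        peak = max(peak, used)
    return peak


# Hypothetical sizes: two new packages replace two old ones.
all_install_first = peak_usage(1000, [300, 200], [250, 150])
all_erase_first = peak_usage(1000, [300, 200], [250, 150], installs_first=False)
```

Both orders end at the same final usage; only the high-water mark differs.
When most files are replaced rather than added, the installs and erases
nearly cancel and the gap between the two peaks shrinks.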

OTOH, install-before-erase (and installing to a temp file which is
renamed into place, another waste of temp space) is safer with
shared libraries, which might be opened by path while an upgrade
is being performed.
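
For the temp-file-then-rename idea, a minimal sketch in Python rather
than rpm's own C code. install_file is a hypothetical helper, and the
sketch assumes a POSIX file system where rename within one directory is
atomic.

```python
import os
import tempfile


def install_file(path, data):
    """Write data to a temp file next to path, then rename it into place.

    A process that opens path sees either the complete old file or the
    complete new one, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays
    # on one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The cost is the one mentioned above: until the rename, both the old and
the new copy of the file occupy disk space.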

And I'm willing to bet that there will be as much as a 25% speedup
by installing packages in parallel. I've already seen a 7% speedup
using a whopping big buffer in zlib and 3 madvise calls that do
read ahead and write behind. That means that there is unused I/O
capacity available in modern Linux kernels on reasonably fast hardware.

And no, I'm not claiming that rpm on i486 processors will be any faster
(or slower) than already, but that isn't the typical cpu anymore.

> This is already filed in Bugzilla, too (bug 86401):
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=86401

#86401 is a whole different problem: glibc-common has 60-120 MB of
hard-linked (and %lang locale-sensitive) files, and rpm does not
compute space used for hard-linked files correctly.

Aside: Getting the disk accounting correct is harder because of the
%lang install policies. Chasing all members of a hard-link set
down to determine whether any one of them is going to be installed would
add Yet Another Loop underneath 4 nested loops, hardly worth the
effort for a single (albeit important) glibc-common package that
wishes to use lots and lots of hard links.
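
To make the hard-link problem concrete, a rough sketch that works on
files already on disk, not on rpm headers. disk_usage and naive_usage
are hypothetical names. The idea is that a hard-link set costs its size
once if any member is present, whereas a per-file sum counts it once per
name.

```python
import os


def naive_usage(paths):
    """Sum sizes per path, so every name in a hard-link set is counted."""
    return sum(os.lstat(p).st_size for p in paths)


def disk_usage(paths):
    """Sum sizes, counting each hard-link set (same device and inode) once."""
    seen = set()
    total = 0
    for p in paths:
        st = os.lstat(p)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another name for data that was already counted
        seen.add(key)
        total += st.st_size
    return total
```

Passing only the names that a given %lang policy would install gives the
"is any member going to be installed" answer for each set, which is the
extra loop described above.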

> Theoretically speaking, is this something that should ideally be fixed
> in RPM, or should users of RPM (such as up2date) manually split updates
> into multiple transactions if there's not enough disk space to do it in
> one go?

There's certainly nothing stopping you from using multiple transactions
if you wish or need to.

No matter what, rpm can never get the disk accounting more accurate
than ~1% because
	a) root reserved space is estimated by assuming a compiled-in 5%,
	and there's no general API to query for root reserved space because,
	well, only a few file systems have root reserved space.
	b) there is no effort whatsoever to calculate database and
	temp file needs.
	c) scriptlet side effects are entirely opaque to rpm.
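
Point a) amounts to arithmetic like the following. This is a
hypothetical sketch of a fixed 5% reserve estimate, not rpm's actual
code; estimated_available and its parameters are invented for
illustration.

```python
def estimated_available(total_blocks, free_blocks, reserve_percent=5):
    """Estimate blocks usable for installs, assuming a fixed root reserve.

    The reserve is a guess applied to every file system, so the result
    is wrong wherever the real reserve differs (or does not exist).
    """
    reserved = total_blocks * reserve_percent // 100
    return max(free_blocks - reserved, 0)
```

For example, a 1000-block file system with 200 blocks free is treated as
having 150 available. Database growth, temp files, and scriptlet side
effects (points b and c) are not reflected in that figure at all.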

And It Really Doesn't Matter much when disks are as big as they are.

73 de Jeff

Jeff Johnson	ARS N3NPQ
jbj@redhat.com (jbj@jbj.org)
Chapel Hill, NC
