
Re: Interesting anaconda/yum performance issue

On Wed, 2007-06-13 at 15:56 -0400, Jeremy Katz wrote:
> On Wed, 2007-06-13 at 13:40 -0600, Jeffrey Law wrote:
> > Using cProfile, it appears that we're calling buildPkgRefDict for each
> > explicitly listed package -- at a cost of nearly a half-second per call
> > (3 GHz P4).  Clearly this gets to be rather expensive when the package
> > list is long -- a typical install is over 600 packages.  We're burning
> > an absurd amount of time here.
> > 
> > Using @group syntax does not suffer from this problem.  So clearly
> > there's a path through anaconda which does not need to call 
> > buildPkgRefDict so often.
> The difference is that the comps file isn't allowed to do anything more
> than list an exact package name.  Entries listed in %packages are
> allowed to be globs, specify a version, specify an arch, etc.
Understood -- and I certainly need to fully specify versioning
information and the like (though I don't use globbing).
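
To make the cost pattern concrete, here is a minimal sketch of the
difference between rebuilding a lookup dictionary for every requested
pattern and building it once.  It is not yum's code: PACKAGES,
build_ref_dict, match_rebuilding and match_cached are hypothetical
names, and the key formats are only an assumption about what such a
dictionary might hold.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for a package sack: (name, arch, version) tuples.
PACKAGES = [
    ("bash", "i386", "3.2"),
    ("glibc", "i686", "2.6"),
    ("glibc-devel", "i386", "2.6"),
]

# Counts how many times the reference dictionary gets built.
stats = {"builds": 0}


def build_ref_dict(packages):
    """Map several spellings of each package to the package tuple."""
    stats["builds"] += 1
    refs = {}
    for name, arch, version in packages:
        pkg = (name, arch, version)
        for key in (name, f"{name}.{arch}", f"{name}-{version}"):
            refs.setdefault(key, []).append(pkg)
    return refs


def _lookup(refs, pattern):
    # Glob-match the pattern against every key in the dictionary.
    return [p for key, pkgs in refs.items() if fnmatchcase(key, pattern) for p in pkgs]


def match_rebuilding(patterns):
    """Slow shape: one full dictionary build per requested pattern."""
    found = []
    for pattern in patterns:
        found.extend(_lookup(build_ref_dict(PACKAGES), pattern))
    return sorted(set(found))


def match_cached(patterns):
    """Same result, but the dictionary is built a single time."""
    refs = build_ref_dict(PACKAGES)
    found = []
    for pattern in patterns:
        found.extend(_lookup(refs, pattern))
    return sorted(set(found))
```

With N patterns the first function does N builds and the second does
one, which is the shape of the per-package cost described in the
quoted text.
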

> Although I'm not quite sure why we're not using the matchPackageNames()
> bits in yum's install() method... it should provide the same sort of
> results but also be able to do some of the querying using sql queries
> against the sqlite db (and thus probably be faster)
Good question (took me a few minutes to find a user of matchPackageNames
but eventually I found it).  Yes, it looks like they provide the same
sort of information.  Something like this?

*** __init__.py	2007-06-13 14:28:52.000000000 -0600
--- __init__.py.NEW	2007-06-13 14:26:19.000000000 -0600
*************** class YumBase(depsolve.Depsolve):
*** 1756,1762 ****
              if kwargs.has_key('pattern'):
                  exactmatch, matched, unmatched = \
parsePackages(self.pkgSack.returnPackages(),[kwargs['pattern']] ,
--- 1756,1762 ----
              if kwargs.has_key('pattern'):
                  exactmatch, matched, unmatched = \
! 		    self.pkgSack.matchPackageNames([kwargs['pattern']])

I'm not sure what to do with the casematch argument, or whether the
other calls to parsePackages ought to be changed too.  And I'm certainly
not well-versed enough in Python, anaconda or yum to know whether there
are other issues I'm not dealing with.
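
For what it's worth, the idea of letting sqlite do the name matching
can be sketched in a self-contained way.  This is only an illustration
under stated assumptions, not how matchPackageNames is implemented: the
table layout, the index, and the make_db and match_names helpers are
all hypothetical.  It returns the same three-way split (exact matches,
glob matches, unmatched patterns) as the call in the patch above.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with a hypothetical packages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (name TEXT, arch TEXT, version TEXT)")
    conn.execute("CREATE INDEX pkg_name ON packages (name)")
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
    return conn


def match_names(conn, patterns):
    """Return (exact, globbed, unmatched) for a list of name patterns."""
    exact, globbed, unmatched = [], [], []
    for pattern in patterns:
        # Try an exact name lookup first; this can use the index.
        rows = conn.execute(
            "SELECT name, arch, version FROM packages "
            "WHERE name = ? ORDER BY name, arch",
            (pattern,),
        ).fetchall()
        if rows:
            exact.extend(rows)
            continue
        # Fall back to sqlite's case-sensitive GLOB operator.
        rows = conn.execute(
            "SELECT name, arch, version FROM packages "
            "WHERE name GLOB ? ORDER BY name, arch",
            (pattern,),
        ).fetchall()
        if rows:
            globbed.extend(rows)
        else:
            unmatched.append(pattern)
    return exact, globbed, unmatched
```

The point of the sketch is that no per-pattern dictionary over the
whole package set is needed; each pattern becomes one or two queries.
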

For a small package set (~350 packages) that little tweak takes us
from 9:20 to 7:34 (wall clock, start to finish).  For reference, using
@groups to install the exact same packages takes 6:52.  So the change
makes explicit packages almost competitive with @groups.
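
Putting those wall-clock numbers in seconds makes the comparison
easier to read.  The snippet below only does arithmetic on the three
timings quoted above; to_seconds is a helper name chosen for the
illustration.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' wall-clock string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


before = to_seconds("9:20")   # explicit packages, unpatched
after = to_seconds("7:34")    # explicit packages, patched
groups = to_seconds("6:52")   # same packages via @groups

saved = before - after        # time recovered by the patch
remaining = after - groups    # gap still left versus @groups

print(f"saved {saved}s, remaining gap {remaining}s")
# prints "saved 106s, remaining gap 42s"
```

So the patch recovers 106 of the 148 seconds that separated explicit
packages from @groups.
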

