[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Better repodata performance (was: redhat abe)



On Sat, Jan 29, 2005 at 02:23:25PM -0500, seth vidal wrote:
> On Sat, 2005-01-29 at 17:11 -0200, Alexandre Oliva wrote:
> > On Jan 29, 2005, Axel Thimm <Axel Thimm ATrpms net> wrote:
> > > How about having multiple repodatas, the base one and small
> > > incremental ones, the incremental ones containing also package
> > > cancelations? As a side effect this would also reduce download
> > > bandwidth and thus make even clients/users happy (not only repo
> > > maintainers).
> > 
> > +1.  Heck, make it +2.
> > 
> > I agree with Axel, for a change :-)
> > 
> 
> How would it reduce bandwidth - you'd have to download and parse
> multiple entries and you'd STILL have to do just a much work on the
> repo-side b/c you'd have to check all the packages for changes.

For N packages the ballanced load are log_2 N bins. Adding M packages
touches only log_2 M bins. And the bins have a max size of 2^i
packages where i goes from 0 to N-1. And the good news is you touch
the bins with i < M, e.g. the small ones.

The statistical net effect is that for M package additions to
arbitrary N you get log_2 M downloads of a total of 2M packages.

In relevant numbers:

o N~=4000, log_2 N~=12
  You have 12 bins.
o 10 security/bug fix updates, (statistically) only bins 0 to 4 are
  changed amounting to 32 packages.
  Clients download only 5 files worth of 32 packages in size.

Compare with the current situation, where you need to get the whole
lot of N packages for each update.

For this to work you need to

o introduce package cancelation (anti-packages ;)
o introduce multiple repodata components
o keep a manifest of the last state and feed the repo creation system
  with the differences (packages lost, packages gained).

It's rather common and very efficient in high performance statistics
of large sums.
-- 
Axel.Thimm at ATrpms.net

Attachment: pgp00212.pgp
Description: PGP signature


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]