Better repodata performance (was: redhat abe)

seth vidal skvidal at phy.duke.edu
Sat Jan 29 22:07:00 UTC 2005


> For N packages the balanced load is log_2 N bins. Adding M packages
> touches only log_2 M bins. And the bins have a max size of 2^i
> packages where i goes from 0 to log_2 N - 1. And the good news is you
> touch the bins with i up to about log_2 M, i.e. the small ones.
> 
> The statistical net effect is that for M package additions to
> arbitrary N you get log_2 M downloads of a total of 2M packages.
> 
> In relevant numbers:
> 
> o N~=4000, log_2 N~=12
>   You have 12 bins.
> o 10 security/bug fix updates, (statistically) only bins 0 to 4 are
>   changed amounting to 32 packages.
>   Clients download only 5 files worth of 32 packages in size.
> 
> Compare with the current situation, where you need to get the whole
> lot of N packages for each update.
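
(A rough sketch of the arithmetic quoted above, reading the scheme as a
binary counter: bin i holds up to 2^i packages, and adding a package
carries full bins upward. That reading, and the names touched_bins and
bin_capacity, are assumptions made for illustration; nothing like this
exists in createrepo or yum.)

```python
def touched_bins(n, m):
    """Return the bin indices rewritten while adding m packages to n.

    Assumed model: bin i is full (2**i packages) when bit i of the
    package count is set. Adding one package merges every full trailing
    bin into the first empty one, like a carry in binary addition.
    """
    touched = set()
    for count in range(n, n + m):
        i = 0
        # Each trailing 1-bit is a full bin that gets emptied by the carry.
        while (count >> i) & 1:
            touched.add(i)
            i += 1
        # The first 0-bit is the bin that receives the merged packages.
        touched.add(i)
    return touched


def bin_capacity(bins):
    """Upper bound on the packages described by the given bins."""
    return sum(2 ** i for i in bins)


if __name__ == "__main__":
    bins = touched_bins(4000, 10)
    print(sorted(bins), bin_capacity(bins))      # [0, 1, 2, 3] 15
    # Worst case: every bin is full, so one addition rewrites all of them.
    print(len(touched_bins(4095, 1)))            # 13
```

For N=4000 and M=10 this model rewrites bins 0 to 3 only. With N=4095 a
single addition carries through every bin, which is why the figures
quoted above hold only statistically.
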
> 
> For this to work you need to

let's be clear - for this to work YOU need to.


> o introduce package cancelation (anti-packages ;)

fat chance.

> o introduce multiple repodata components

which buys us not all that much other than complexity of debugging.


> o keep a manifest of the last state and feed the repo creation system
>   with the differences (packages lost, packages gained).

And how do you feed the repo creation system this data? Where do you get
it to begin with? The only way you know this information is if you
already have it - the only way you have it is if you checked all the
packages for what has changed. Are you beginning to see the loop here?
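
To make the loop concrete, here is a minimal sketch of what "keep a
manifest and feed in the differences" might look like. The function
names and the (size, mtime) manifest format are hypothetical, not
anything createrepo does. Note that scan_repo still has to visit every
package in the tree before any difference can be computed.

```python
import os


def scan_repo(top):
    """Walk the whole tree and return {relative path: (size, mtime)}.

    Hypothetical manifest format; this touches every .rpm under top.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith(".rpm"):
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                rel = os.path.relpath(full, top)
                manifest[rel] = (st.st_size, int(st.st_mtime))
    return manifest


def diff_manifests(old, new):
    """Return (lost, gained, changed) package paths between two manifests."""
    old_paths, new_paths = set(old), set(new)
    lost = sorted(old_paths - new_paths)
    gained = sorted(new_paths - old_paths)
    # Same path but different size or mtime: treat as a replaced package.
    changed = sorted(p for p in old_paths & new_paths if old[p] != new[p])
    return lost, gained, changed
```

The diff itself is cheap. Producing the current manifest is the part
that requires checking all the packages.
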

As Jeremy recently reminded me - incremental updates to metadata were
done a looooooooooong time ago in 'yup'. And it was a mess to keep up
with. Not to mention it just added cruft on the repo side.

But far be it from me to halt the steady march of progress - when you
get a chance to implement this stuff let me know.

Oh and once more - who is it that gets the benefit from all this work?
It sounds like it's mostly repo maintainers - not the users.

If someone wants to combine createrepo and yum-arch into one program so
it makes both at the same time that's fine - it's about an hour or two
worth of work. What you're describing above is considerably more, not to
mention redesigning the depsolvers to deal with the new repository
format.

-sv
