
Better repodata performance (was: redhat abe)

On Sat, Jan 29, 2005 at 02:36:09AM -0500, seth vidal wrote:
> > The exercise is to find a method that saves the computation of md5
> > or sha1 sums, as these are among the most time-consuming steps of
> > createrepo. In a 100k package repository the saving would be
> > (100,000 - N) * Time(sum_calc), where N is the number of packages
> > that actually *need* new sums. A parameterized list of package names
> > passed into createrepo would be sufficient to work out which packages
> > make up N.
> > An external process, such as a Manifest list, would then be used to 
> > mitigate a set of packages through the entire build process.  Apt uses 
> > an md5sum cache, but having fine-tuned control of the process would
> > be more stable and directed. This is how much saving you'd get for #2.
> Let me know when you've figured it out but as it stands I don't think
> incrementally updating the metadata is very feasible.
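
For illustration, the saving described in the quoted text can be sketched
as follows. This is only a rough sketch under stated assumptions, not how
createrepo works: refresh_checksums, its cache dictionary and the changed
set are hypothetical names, and the caller is assumed to supply the list
of package files that need new sums.

```python
import hashlib


def sha1_of(path):
    """Compute the sha1 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_checksums(paths, cache, changed):
    """Return (new_cache, recomputed_count).

    paths   -- every package file in the repository
    cache   -- hypothetical dict mapping path -> sha1 from the last run
    changed -- set of paths the caller says need new sums (the "N list")
    """
    fresh = {}
    recomputed = 0
    for path in paths:
        if path in cache and path not in changed:
            # Unchanged package: reuse the stored sum, no file read needed.
            fresh[path] = cache[path]
        else:
            # New or changed package: this is the only expensive step.
            fresh[path] = sha1_of(path)
            recomputed += 1
    return fresh, recomputed
```

Only the packages in the changed set (plus any not yet in the cache) are
read from disk, so the cost scales with N rather than with the size of
the repository.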

How about having multiple repodatas: a base one and small
incremental ones, with the incremental ones also containing package
cancellations? As a side effect this would also reduce download
bandwidth and thus make even clients/users happy (not only repo

The base repodata and the incremental ones would be merged from time
to time, ideally with a binary merging scheme like the one used when
summing large data sets in statistics (for 100K packages you would
need only 17 files, since 2^17 = 131,072 > 100,000).
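
A minimal sketch of one possible reading of this merge scheme, assuming
it behaves like a binary counter: every new increment starts at level 0,
and two files of the same level are merged into one file of the next
level. The names add_increment and current_packages, and the use of None
to mark a cancelation, are illustrative assumptions only.

```python
def add_increment(files, increment):
    """Add one incremental repodata and merge equal-level files.

    files     -- list of (level, entries) tuples, oldest first
    increment -- dict mapping package name -> metadata, or None
                 for a package cancelation
    """
    level, merged = 0, dict(increment)
    # Like carrying in a binary counter: while the newest file has the
    # same level, fold the new data into it and move up one level.
    while files and files[-1][0] == level:
        _, older = files.pop()
        older.update(merged)  # newer entries override older ones
        merged = older
        level += 1
    files.append((level, merged))


def current_packages(files):
    """Combine base and increments into the current package view."""
    view = {}
    for _, entries in files:  # oldest first, so newer files win
        view.update(entries)
    # Drop cancelled packages (marked with None) from the final view.
    return {name: meta for name, meta in view.items() if meta is not None}
```

With this scheme the levels of the files present after n increments
correspond to the set bits of n, so the number of files never exceeds
the bit length of n, which is 17 for n = 100,000.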
Axel.Thimm at ATrpms.net

