Re: Back Again



Sam Varshavchik wrote:
> I'd break it down as about 70% yum vs. 30% rpm.  Yum is really
> taking its sweet time figuring out what it needs to do. But even
> after it's done that, and downloaded everything, rpm still tends to
> spin its wheels, if it has a large list of packages to chew through.

Okay.  That sounds reasonable.  It was my (armchair) impression that
it was mostly yum.  So when I say it sounds reasonable, what I really
mean is that it agrees with my own bias. ;)

> You do /not/ need that much info in the first step. All you need is
> just a list of names of packages available on the remote
> repository. You reconcile that against the list of packages you
> already have downloaded the metadata for, and you then know what's
> new.
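
The reconciliation step described above can be sketched as a set
difference.  This is only an illustration, not how yum does it; the
function names and the package strings are made up for the example.

```python
def new_packages(remote_names, cached_names):
    """Names listed by the remote repository that have no cached metadata yet."""
    return sorted(set(remote_names) - set(cached_names))


def stale_packages(remote_names, cached_names):
    """Names with cached metadata that the remote repository no longer lists."""
    return sorted(set(cached_names) - set(remote_names))


# Hypothetical name-version-release strings, for illustration only.
remote = ["bash-3.2-20", "gtk2-2.12.8-2", "yum-3.2.8-2"]
cached = ["bash-3.2-19", "gtk2-2.12.8-2", "yum-3.2.8-2"]

to_fetch = new_packages(remote, cached)   # metadata still to download
to_drop = stale_packages(remote, cached)  # cached metadata to discard
```

Only the entries in to_fetch would need their full metadata pulled
down; everything else is already local.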

I agree that the amount of data downloaded could be less.  What I
meant was that at some point you have to chew on a large chunk of
data to do the depsolving.  I'm surprised that it was done in Python
for as long as it was.  That seems like a task much more suited to a
compiled language.  I didn't do any speed tests to compare yum before
and after the metadata parser was rewritten in C.

> Meanwhile, primary.xml.gz is actually a voluminous XML file that
> contains not just each package's name and version, but also all
> sorts of extra info.  And you have to download the whole thing every
> time. And, the current version of yum, sqlite-based, does not help.
> I see that primary.sqlite.bz2 is about twice as large as
> primary.xml.gz.
>
> So, all this talk of a database-based yum, and it turns out that you
> end up having to download /twice/ as much data as you used to
> before? Someone explain to me what we're supposed to be doing here.

Yeah, that's why there's interest in Presto.  I *think* that it uses
deltas for the metadata as well as the packages, but I'm really not
sure of that.  With the size of the metadata, it would be quite an
improvement if it did.

> From what I see yum is doing, it downloads the primary, the other
> file, and possibly filelists, /every/ time a single package gets
> added to the repository. Even though 99% of the content is the same
> as before.
>
> This, in my opinion, does not really seem like an optimum design to me.
> You should /not/ have to download /everything/ every time a single
> package changes.

Agreed.  I rsync things nightly, so it's always local for me and I
don't spend much time worrying about it.  But there is a lot of room
for improvement.  I'm sure there aren't enough people interested in
doing the hard work to make it happen though, so improvement will be
slow.

> Ditto for the epoch hack -- my solution fixes the original
> underlying reason for having an epoch in the first place.

Eh?  So how do you handle the sometimes retarded versioning schemes of
upstream sources?  Or the occasional need to push an older version of
something as an update?
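
To make those two cases concrete, here is a simplified sketch of why
an epoch ends up being needed.  The comparison below is NOT rpm's real
version comparison; it only looks at numeric segments, and the function
names and version strings are invented for the example.

```python
import re


def version_key(version):
    """Split '1.10.2' into [1, 10, 2]. Simplified: numeric segments only."""
    return [int(n) for n in re.findall(r"\d+", version)]


def is_newer(candidate, installed):
    """Each argument is an (epoch, version) pair; the epoch is compared first."""
    c_epoch, c_version = candidate
    i_epoch, i_version = installed
    return (c_epoch, version_key(c_version)) > (i_epoch, version_key(i_version))


# Upstream switches from date-based versions to 1.0: without an epoch
# bump the new release looks older than what is installed.
without_epoch = is_newer((0, "1.0"), (0, "20070101"))
with_epoch = is_newer((1, "1.0"), (0, "20070101"))

# Pushing an older version as an update has the same shape.
rollback = is_newer((1, "2.3"), (0, "2.4"))
```

Any replacement for the epoch would have to give a sensible answer in
both situations.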

> Well, I can point them to how HTTP 1.1 chunking works, and how to
> gracefully autodetect if the HTTP server supports HTTP 1.1 chunking,
> and the logic to gracefully fall back to "Plan B", if the
> repository's HTTP server is running old Apache without HTTP 1.1
> support, and what to do next. That's about all I can do. I won't
> write the code, I have plenty of other coding work that keeps me
> busy.
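
A rough sketch of the detect-and-fall-back logic follows.  It assumes
that "chunking" above means asking for part of a file with an HTTP
Range request, which is my reading and not something stated in the
quote.  The function names are invented, and no network code is shown;
the caller is assumed to send the request and pass in the status code.

```python
def range_header(offset, length):
    """Build a Range header value asking for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive at both ends.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def fetch_plan(status):
    """Decide what to do from the status code of a response to a Range request."""
    if status == 206:
        # Partial Content: the server honored the range.
        return "partial"
    if status == 200:
        # The server ignored the range and is sending the whole file;
        # this is the "Plan B" case.
        return "full"
    # Anything else (for example 416) is treated as a failure here.
    return "error"
```

A server that answers 200 to a ranged request still delivers a usable
file, so the fallback costs bandwidth but not correctness.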

If you have some time, writing up some of your ideas and how they
could be worked into the current infrastructure would seem like a nice
way to help out.  Any sort of volunteer effort usually suffers from
lack of manpower more than anything else.

> It's not that trademarked logos must be kept in one package. It's
> just that the package, for some reason that I still can't fathom,
> must depend on gtk2 code libraries. Why would a package that
> supposedly contains nothing more than a bunch of logo image files
> have a needed dependency on a package that contains system
> libraries? That just does not compute.

It was due to the package guideline of not having unowned directories.
The dep chain needed to pull in the packages that owned the
directories it was adding files to.  The fix in rawhide was to simply
have fedora-logos own those directories as well as the redhat-artwork
package.
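
The ownership check behind that guideline can be sketched like this.
It is a simplification that only looks at the immediate parent
directory of each file, and the function name and paths are made up
for the example rather than taken from the real packages.

```python
import posixpath


def unowned_dirs(package_files, owned_dirs):
    """Parent directories of the installed files that nothing in owned_dirs covers.

    package_files: paths the package installs.
    owned_dirs: directories owned by the package itself or by its dependencies.
    """
    needed = {posixpath.dirname(path) for path in package_files}
    return sorted(needed - set(owned_dirs))


# Hypothetical file list for a logo package.
files = ["/usr/share/icons/Example/48x48/apps/logo.png"]

# Nobody owns the directory: the guideline is violated.
violation = unowned_dirs(files, [])

# Either fix leaves nothing unowned: a dependency owns the directory,
# or the logo package owns it itself.
fixed = unowned_dirs(files, ["/usr/share/icons/Example/48x48/apps"])
```

The first fix drags in the dependency chain; the second, which is what
rawhide did, avoids it at the cost of two packages owning the same
directories.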

> Although this does not have any direct relevance to the overall
> issue of rpm's design, it is demonstrative, though, of the same kind
> of inefficient non-attention to details.

It wasn't that this wasn't known.  It was that there are different
goals and policies at work.  Sometimes that causes a conflict.  Is it
more important to not have unowned directories on the system or to
have a super small install?  Different people have different answers
to that question.

A lot of people moaned about this one, but not so many proposed an
acceptable solution.  Sadly, that's what happens far too often.

-- 
Todd        OpenPGP -> KeyID: 0xBEAF0CE3 | URL: www.pobox.com/~tmz/pgp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I got stopped by a cop the other day.  He said, "Why'd you run that
stop sign?"  I said, "Because I don't believe everything I read."
    -- Stephen Wright