Supporting EPEL Builds in Koji

Mike McLean mikem at redhat.com
Fri Jul 18 15:38:28 UTC 2008


Mike Bonnet wrote:
> On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
>> If the remote_repo_url data is going to be inherited (and I tend to
>> think it should be), then I think it should be in a separate table. I'd
>> like to reserve tag_config for data that is local to individual tags.
>> This will also make it easier to represent multiple remote repos.
> 
> I don't have any problem with this, though it does mean we'll need to
> duplicate quite a bit of the inheritance-walking code, or make it
> configurable as to which inheritance it's walking.  This new table would
> also have to be versioned, the same way the tag_config table is.

Walking inheritance is just a matter of determining the inheritance 
order and scanning data on the parent tags in sequence. Currently, 
nothing scans tag_config in this way because no data in tag_config is 
inherited. (Well, in a sense tag_changed_since_event() does walk 
tag_config, but that's a little different.)

We need to figure out how we'll deal with multiplicity for the external 
repos. If tag A uses repo X and inherits from tag B which uses repo Y, 
then does tag A use both X and Y, or does the X entry override it?

A (+repo X)
  +- B (+repo Y)

My inclination is that it should override, because I think we'll want
some way to do overrides, and that mechanism seems easiest.

Also, I think we'll probably want to allow multiple external repos per 
tag, something which will be much easier to represent in an external 
table. We can include an explicit priority field to make a sane 
uniqueness condition (and to provide a clear ordering for the repo merge).
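
To make the override option concrete, here is a rough sketch in Python. It
is not Koji code: the TagExternalRepo row, the resolve_external_repos()
helper, and the "lower priority number sorts first" rule are all
assumptions made for illustration. It only shows the shape of the idea:
scan the tags in inheritance order, let the nearest tag with any entries
win, and use the priority field to order repos within that tag.

```python
from collections import namedtuple

# Hypothetical row in a separate external-repo table (not existing Koji schema).
TagExternalRepo = namedtuple("TagExternalRepo", ["tag", "repo", "priority"])


def resolve_external_repos(inheritance_order, rows):
    """Return external repos under an override policy.

    inheritance_order: tag names, starting with the tag itself, followed by
    its parents in the order an inheritance walk would scan them.
    rows: iterable of TagExternalRepo entries.
    """
    by_tag = {}
    for row in rows:
        by_tag.setdefault(row.tag, []).append(row)
    for name in inheritance_order:
        entries = by_tag.get(name)
        if entries:
            # The nearest tag with any entries overrides its parents.
            # Assumption: a lower priority number means "merge first".
            return [r.repo for r in sorted(entries, key=lambda r: r.priority)]
    return []


rows = [
    TagExternalRepo("A", "X", 10),
    TagExternalRepo("B", "Y", 10),
    TagExternalRepo("B", "Z", 5),
]
print(resolve_external_repos(["A", "B"], rows))  # ['X']
print(resolve_external_repos(["B"], rows))       # ['Z', 'Y']
```

Under the other policy (A uses both X and Y), the loop would accumulate
entries from every tag instead of returning at the first match, and the
ordering across tags would need its own rule.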

> The big win here is that the methods and tools that query rpminfo for
> information about what was present in the buildroot at build time
-snip-

I see all that, and I'm almost convinced. The flipside is that by 
default all the code will treat these external rpms the same as the 
local ones, which will not be correct for a number of cases. Obviously,
part of this will involve changing code to behave differently for the
external ones. I'm just worried about how much we might have to change,
or what we might miss.

> Yes, I realize that the "not null" constraint should exist now, and in
> fact all rpms in the Fedora database do reference builds.  However, I
> think logically having a remote rpm not reference a local build makes
> sense.  The alternative is to create the build object from the srpm info
> in the repodata (along with some namespacing similar to rpminfo).
> However, this would significantly clutter the build table with
> information that is pretty non-essential.

The idea of grouping them into builds appeals to me, but I don't think 
it's possible in general (though maybe we could fake it well enough 
somehow). The only data we're (mostly) guaranteed to have to work with 
is the sourcerpm header field. The catch is that in the case of an
nvr-collision we can't determine which build it belongs to (or indeed
whether we should create a new build of the same nvr).

> I'm open to suggestions on how to modify the uniqueness constraint to
> handle this case.  We care about ensuring that a locally-built rpm
> doesn't have the same n-v-r as another locally-built rpm.  I don't think
> we care at all about n-v-r uniqueness amongst remote rpms.  However, we
> probably want to avoid creating 2 rpminfo entries when the same remote
> rpm is used in 2 different buildroots.  Using the sigmd5 is a good way
> to avoid that.

Agreed. same sigmd5 ==> same rpm.

>  However, what happens if a remote rpm with the same
> n-v-r and sigmd5 gets pulled in from 2 different remote repos?

This gets into part of what bugs me about this and why I'm somewhat 
inclined to keep the ext repo data a step removed. It's so potentially 
dirty. Koji has all these consistency constraints that an external repo 
(much less many of them in aggregate) lacks.

It's quite possible that an external repo might respin a package keeping 
the same nvr, so we don't even need 2 external repos to hit this 
possibility.

> Perhaps
> the "origin" field should be pushed down to the buildroot_listing table,
> so the buildroots can reference the same rpminfo object, but indicate
> that it came from a different repo in each buildroot?

Interesting. Yeah, I think that is probably the right answer.

Also, I'm thinking we need to have some sort of rpm_origin table so that 
all these references can be managed cleanly.
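
Roughly what I have in mind, as a toy in-memory model in Python. The
RpmStore class and its field names are made up for illustration and do not
match the real schema. The point is only that rpminfo is keyed by sigmd5,
origins live in their own table, and the origin reference sits on the
buildroot listing row rather than on the rpm.

```python
class RpmStore:
    """Toy in-memory model; table and field names are illustrative only."""

    def __init__(self):
        self.rpminfo = {}            # sigmd5 -> rpm id
        self.rpm_origin = {}         # origin name -> origin id
        self.buildroot_listing = []  # (buildroot_id, rpm_id, origin_id)

    def _origin_id(self, name):
        # Create the origin entry on first use, reuse it afterwards.
        return self.rpm_origin.setdefault(name, len(self.rpm_origin) + 1)

    def record(self, buildroot_id, sigmd5, origin):
        # Same sigmd5 ==> same rpm, so reuse the existing entry if present.
        rpm_id = self.rpminfo.setdefault(sigmd5, len(self.rpminfo) + 1)
        self.buildroot_listing.append(
            (buildroot_id, rpm_id, self._origin_id(origin))
        )
        return rpm_id


store = RpmStore()
first = store.record(1, "abc123", "repo-x")
second = store.record(2, "abc123", "repo-y")
print(first == second)          # True: one rpminfo entry for both buildroots
print(store.buildroot_listing)  # [(1, 1, 1), (2, 1, 2)]
```

The same rpm pulled from two repos ends up as one rpminfo entry with two
listing rows that differ only in origin.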

> Also, what happens when we find 2 remote rpms with the same n-v-r but
> different sigmd5s?  Should that be an error?

Certainly we have to allow the possibility that two origins might have 
overlapping nvras. Within a single origin, I'm not so sure. I suppose we 
can get away with some small consistency demands. As long as we're only 
enforcing unique nvra for local builds and indexing by sigmd5/similar, I 
don't think we /have/ to make this an error condition.

In the same vein, what happens when an external repo has an nvra+sigmd5 
matching a /local/ rpm?  Maybe it doesn't matter, though I guess 
technically we want to record the origin properly when it gets into a 
buildroot via external repo vs internal tag.

>> First, I'd like to be able to support external koji servers (or rather a 
...
> I agree that this is a desirable goal.  I believe this is more the
> domain of the Koji secondary-arch daemon.  It would be talking directly

Well, it has some similarities to 2nd arch, but it's still quite different.

The more I think about it, the more I think that supporting an external
koji server will probably be much different from the ext repo
business. Most of the issues with rpminfo will carry over, but with a 
koji server we will be able to determine build data and can probably 
actually pull off something like "inherit from tag X on koji server Y."

> The tag content may be managed by build, but when it's time for it to
> actually get used (in the form of a yum repo) it gets unfolded into a
> big list of rpms.  And what gets associated with a buildroot is simply a
> big list of rpms.  Conceptually I don't really have a problem with the
> idea of a tag as a big list of rpms, that we happen to group by srpm
> within Koji because it's more convenient for us.  So adding the external
> repo information to tag_config is just an extension of the big list of
> rpms model.

Yeah, I almost wish I hadn't made the build structure quite the way I did.

> However, we will already be parsing the remote repodata, which contains
> information like the srpm name for each rpm, so we could do something
> more sophisticated here.
-snipsnip-
...
> The repomerge tool seems like it solves the problem better, and would be
> more useful in general.

If we're going to have our fingers in the repodata, we'll probably want 
to have them in the merge too. Perhaps we can get createrepo and/or this 
repomerge tool usefully libified?



