Supporting EPEL Builds in Koji

Thu Jul 17 22:48:34 UTC 2008

On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
> Mike Bonnet wrote:
> > http://fedoraproject.org/wiki/Koji/EPELSupport
> 
> This is mostly in line with what I've been thinking. I do have a few
> comments/concerns, though...
> 
> If the remote_repo_url data is going to be inherited (and I tend to
> think it should be), then I think it should be in a separate table. I'd
> like to reserve tag_config for data that is local to individual tags.
> This will also make it easier to represent multiple remote repos.

I don't have any problem with this, though it does mean we'll need to
duplicate quite a bit of the inheritance-walking code, or make it
configurable as to which inheritance it's walking.  This new table would
also have to be versioned, the same way the tag_config table is.
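
A minimal sketch of the "configurable" option: one walker that takes the
parent lookup as a parameter, so the same code can serve tag inheritance
and a separate remote-repo table.  The tag names, URLs, and in-memory
dicts are hypothetical stand-ins for the real tables, and the traversal
order is an assumption, not Koji's actual inheritance algorithm.

```python
def walk_inheritance(tag, get_parents):
    """Yield tag and its ancestors depth-first, highest priority first.

    get_parents is any callable returning a tag's parents in priority
    order; passing it in is what makes the walk configurable.
    """
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        yield current
        # Push in reverse so the highest-priority parent is popped first.
        stack.extend(reversed(get_parents(current)))


# Hypothetical data standing in for the inheritance and remote-repo tables.
TAG_PARENTS = {
    "dist-epel5-build": ["dist-epel5", "rhel5-base"],
    "dist-epel5": ["rhel5-base"],
    "rhel5-base": [],
}
REMOTE_REPOS = {
    "dist-epel5": ["http://example.com/epel5"],
    "rhel5-base": ["http://example.com/rhel5/os"],
}


def inherited_remote_repos(tag):
    """Collect remote repo URLs for a tag and everything it inherits from."""
    urls = []
    for name in walk_inheritance(tag, lambda t: TAG_PARENTS.get(t, [])):
        for url in REMOTE_REPOS.get(name, []):
            if url not in urls:
                urls.append(url)
    return urls
```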

> I'm a little concerned about using the rpminfo table. Yes, I know it
> seems wasteful to introduce another table to track very similar data,
> but these remote rpms really are tracked and handled differently from
> the local ones.

The big win here is that the methods and tools that query rpminfo for
information about what was present in the buildroot at build time
wouldn't have to change, or only change slightly.  With minor
modification the web UI can continue to show a list of all packages in a
buildroot, along with a flag indicating if they were local or remote.
The buildroot_listing table would not have to change at all.  The
majority of XML-RPC calls that interact with the rpminfo or
buildroot_listing tables would only need minor modifications.  Adding a
new table to track remote rpm metadata and which remote rpms end up in
a buildroot would add significant effort to this proposal.  Also, I
think it's more semantically correct to have a single place where we
track rpm metadata and buildroot contents, regardless of where they came
from.
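
As a rough illustration of the single-table approach, the sketch below
uses an in-memory SQLite database with heavily simplified versions of
rpminfo and buildroot_listing.  The column set and the convention that a
NULL build_id marks a remote rpm are assumptions made for the example,
not the actual Koji schema.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with one local and one remote rpm."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rpminfo (
            id INTEGER PRIMARY KEY,
            build_id INTEGER,          -- NULL marks a remote rpm (assumption)
            name TEXT NOT NULL,
            version TEXT NOT NULL,
            release TEXT NOT NULL,
            arch TEXT NOT NULL
        );
        CREATE TABLE buildroot_listing (
            buildroot_id INTEGER NOT NULL,
            rpm_id INTEGER NOT NULL REFERENCES rpminfo(id)
        );
    """)
    conn.executemany(
        "INSERT INTO rpminfo VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 100, "bash", "3.2", "24.fc9", "i386"),   # built locally
            (2, None, "glibc", "2.5", "24", "i386"),     # pulled from a remote repo
        ],
    )
    conn.executemany("INSERT INTO buildroot_listing VALUES (?, ?)", [(1, 1), (1, 2)])
    return conn


def list_buildroot(conn, buildroot_id):
    """Return (n-v-r.a, 'local' or 'remote') for every rpm in a buildroot."""
    cur = conn.execute(
        """
        SELECT name || '-' || version || '-' || release || '.' || arch,
               CASE WHEN build_id IS NULL THEN 'remote' ELSE 'local' END
        FROM buildroot_listing
        JOIN rpminfo ON rpminfo.id = buildroot_listing.rpm_id
        WHERE buildroot_id = ?
        ORDER BY name
        """,
        (buildroot_id,),
    )
    return cur.fetchall()
```

The listing query is the only piece that needs to know about the
local/remote distinction; buildroot_listing itself is untouched.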

> Also, I'm not sure how I feel about having rpminfo entries with null
> build_id. Sure, technically the field lacks the 'not null' constraint, 
> but that is more of an oversight.

Yes, I realize that the "not null" constraint should exist now, and in
fact all rpms in the Fedora database do reference builds.  However, I
think logically having a remote rpm not reference a local build makes
sense.  The alternative is to create the build object from the srpm info
in the repodata (along with some namespacing similar to rpminfo).
However, this would significantly clutter the build table with
information that is pretty non-essential.

> Note, I'm not outright rejecting the idea of using rpminfo this way, but 
> I am concerned.
> 
> 
> As for the origin field, I think we should track where these external
> rpms come from, but I'm not sure about including it in the uniqueness
> constraint. I'm not sure that the value of that field is sufficiently 
> well defined (or canonicalizable) for such use. I'd rather see the 
> sigmd5 value (or some abstracting sighash field) used as a unique index.

I'm open to suggestions on how to modify the uniqueness constraint to
handle this case.  We care about ensuring that a locally-built rpm
doesn't have the same n-v-r as another locally-built rpm.  I don't think
we care at all about n-v-r uniqueness amongst remote rpms.  However, we
probably want to avoid creating 2 rpminfo entries when the same remote
rpm is used in 2 different buildroots.  Using the sigmd5 is a good way
to avoid that.  However, what happens if a remote rpm with the same
n-v-r and sigmd5 gets pulled in from 2 different remote repos?  Perhaps
the "origin" field should be pushed down to the buildroot_listing table,
so the buildroots can reference the same rpminfo object, but indicate
that it came from a different repo in each buildroot?

Also, what happens when we find 2 remote rpms with the same n-v-r but
different sigmd5s?  Should that be an error?
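
One possible shape for this, sketched in plain Python rather than SQL:
remote rpms are keyed by sigmd5, the origin is recorded per buildroot
listing entry, and a second sigmd5 for an already-known n-v-r is
rejected.  The class and method names are hypothetical, and treating the
mismatch as an error is only one answer to the open question above.

```python
class RemoteRpmConflict(Exception):
    """Raised when one n-v-r.a shows up with two different sigmd5 values."""


class RpmRegistry:
    def __init__(self):
        self._id_by_sigmd5 = {}    # sigmd5 -> rpm id (one entry per distinct rpm)
        self._sigmd5_by_nvra = {}  # n-v-r.a -> first sigmd5 seen for it
        self.listing = []          # (buildroot_id, rpm_id, origin) tuples

    def record(self, buildroot_id, nvra, sigmd5, origin):
        """Note that a remote rpm was used in a buildroot; return its rpm id."""
        known = self._sigmd5_by_nvra.setdefault(nvra, sigmd5)
        if known != sigmd5:
            raise RemoteRpmConflict(
                "%s seen with sigmd5 %s and %s" % (nvra, known, sigmd5)
            )
        rpm_id = self._id_by_sigmd5.get(sigmd5)
        if rpm_id is None:
            # First time this exact rpm is seen: allocate a new entry.
            rpm_id = len(self._id_by_sigmd5) + 1
            self._id_by_sigmd5[sigmd5] = rpm_id
        # The origin lives on the listing entry, not on the rpm itself.
        self.listing.append((buildroot_id, rpm_id, origin))
        return rpm_id
```

With this layout, two buildroots that pull the same rpm from different
repos share one rpm entry but keep distinct origins.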

> Following are additional ideas relating to this feature. They are 
> perhaps a bit ambitious for the short term, but I'd at least like to 
> keep them in mind with the initial design so we don't paint ourselves 
> into a corner.
> 
> First, I'd like to be able to support external koji servers (or rather a 
> target or tag from an external koji server) in addition to external 
> repos. Some of the ideas are the same, however an external koji server 
> provides more information and more structure.

I agree that this is a desirable goal.  I believe this is more the
domain of the Koji secondary-arch daemon.  It would be talking directly
to an "upstream" Koji server, analyzing what it's doing, and applying
some logic to decide what builds to import or replicate, and where/how
to do it.  This proposal has the much more modest goal of simply
consuming static external repos, and is more appropriate for the EPEL
and private-standalone-Koji case.

> Second, I'm fond of having a tag /represent/ some external repo/whatever 
> and having the normal inheritance mechanism take care of priority. The 
> trick here is that Koji tag content is by build, but it will be tricky 
> to correctly determine build structure for external rpms -- indeed, 
> external repos might include subpackages from different versions of the 
> same build (though an external koji server would not, at least for its
> local content). So this will probably be difficult, but if we could 
> manage something like this, I'd feel a lot better about using the 
> rpminfo table.
> 
> Doing something like this would most likely require Koji to comprehend 
> the external repos instead of just passing them off to a repomerge tool.

The tag content may be managed by build, but when it's time for it to
actually get used (in the form of a yum repo) it gets unfolded into a
big list of rpms.  And what gets associated with a buildroot is simply a
big list of rpms.  Conceptually I don't really have a problem with the
idea of a tag as a big list of rpms that we happen to group by srpm
within Koji because it's more convenient for us.  So adding the external
repo information to tag_config is just an extension of the "big list of
rpms" model.

However, we will already be parsing the remote repodata, which contains
information like the srpm name for each rpm, so we could do something
more sophisticated here.
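
To illustrate what "more sophisticated" could look like, the sketch
below groups the rpms in a repo's primary.xml by their source rpm, which
approximates Koji's per-build grouping.  The sample XML is a trimmed,
made-up fragment; real primary metadata carries many more elements per
package, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespaces used by yum repodata primary.xml.
NS = {
    "c": "http://linux.duke.edu/metadata/common",
    "rpm": "http://linux.duke.edu/metadata/rpm",
}

SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common"
          xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="2">
  <package type="rpm">
    <name>foo</name><arch>i386</arch>
    <version epoch="0" ver="1.0" rel="1"/>
    <format><rpm:sourcerpm>foo-1.0-1.src.rpm</rpm:sourcerpm></format>
  </package>
  <package type="rpm">
    <name>foo-devel</name><arch>i386</arch>
    <version epoch="0" ver="1.0" rel="1"/>
    <format><rpm:sourcerpm>foo-1.0-1.src.rpm</rpm:sourcerpm></format>
  </package>
</metadata>"""


def group_by_srpm(primary_xml):
    """Map each source rpm filename to the n-v-r.a of the rpms built from it."""
    groups = defaultdict(list)
    root = ET.fromstring(primary_xml)
    for pkg in root.findall("c:package", NS):
        name = pkg.findtext("c:name", namespaces=NS)
        arch = pkg.findtext("c:arch", namespaces=NS)
        version = pkg.find("c:version", NS)
        srpm = pkg.findtext("c:format/rpm:sourcerpm", namespaces=NS)
        nvra = "%s-%s-%s.%s" % (name, version.get("ver"), version.get("rel"), arch)
        groups[srpm].append(nvra)
    return dict(groups)
```

A repo that mixes subpackages from different versions of one package
would show up here as two separate srpm keys for the same package name.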

> Third, we may not want to use a repomerge tool. The yum-priorities 
> plugin might serve just as well, and allow us to specify some different 
> yum repo options per external repo. This may conflict with idea #2, though.

This was my first thought as well.  However, after discussions with
Jesse, Seth, and James I was convinced otherwise.  The yum-priorities
plugin seems very unpopular with yum developers (not quite sure why).  I
don't think yum-priorities would give us any way to completely block a
package from local and remote repos, and configuring multiple repos in
the mock config would require Koji to retrieve and parse each remote
repodata to determine the origin of a given remote rpm.

The repomerge tool seems like it solves the problem better, and would be
more useful in general.
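
A toy sketch of the two properties a merge step would need: blocked
packages are dropped no matter which repo provides them, and the origin
of each surviving package is recorded at merge time.  The behavior of
the actual repomerge tool is not specified here; the function, its
"first repo wins" rule, and the sample data are all assumptions.

```python
def merge_repos(repos, blocked):
    """Merge repos given in priority order into one package map.

    repos   -- list of (origin, {package name: n-v-r}) pairs, highest
               priority first
    blocked -- set of package names to exclude from every repo
    Returns {package name: (n-v-r, origin)}.
    """
    merged = {}
    for origin, packages in repos:
        for name, nvr in packages.items():
            if name in blocked:
                continue  # blocked everywhere, local or remote
            if name not in merged:
                # First (highest-priority) repo providing the name wins.
                merged[name] = (nvr, origin)
    return merged


repos = [
    ("local", {"bash": "bash-3.2-24"}),
    ("remote-a", {
        "bash": "bash-3.1-1",
        "glibc": "glibc-2.5-24",
        "kernel": "kernel-2.6.18-92",
    }),
]
result = merge_repos(repos, blocked={"kernel"})
```

Because the origin is captured while merging, nothing has to re-parse
each remote repodata later to find out where a given rpm came from.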