Heads-up: brand new RPM version about to hit rawhide

Sat Jul 12 02:48:01 UTC 2008

On Fri, 2008-07-11 at 17:37 -0700, Toshio Kuratomi wrote:
> Firstly: what is your overall idea?  Is it exploded trees as jcollie and 
> I were arguing for at one time or is it mirroring of upstream repos onto 
> Fedora servers?

It's both.  You first have to support exploded source repos to make the
rest of this worth anything.  However, part of *truly* supporting an
exploded source repo is making that repo available INSTEAD OF srpms.  In
other words, fulfill our legal obligation to provide source for a
package via the source repo instead of via an srpm.

Once you take that first step, then the next part is integrating our own
development processes with those from upstream wherever we can (this
means any upstream project that uses a distributed SCM, and possibly
also subversion, but you might as well write off CVS).  But you can't
call it mirroring, because that implies that our source repo is just a
copy of upstream's and that wouldn't be the case.  Instead, we would
*follow* their repo commits, on branches that belong to them and we
don't touch, and we would do our own work on our own branches and merge
from their branches to our branches when appropriate (this sounds more
complex than it is really...most packages we are just going to take
whatever upstream does wholesale into our own branch, it's only a few
packages that will really make use of this capability).  If upstream
happens to be on any decent distributed SCM, then this becomes an almost
dreamy operation for maintainers (compared to now anyway).

And if upstream doesn't use an SCM worth a crap, there's nothing to say
we can't explode out a package anyway.  It all depends on how much work
we are going to do on that package to determine if it's worth the
effort, or if the ability to merge things between stable branches
matters a lot.  Obviously, some packages we just run make and throw them
over the wall.  Other packages we do more.  It's the packages where we
do a lot that it really makes a difference.

>   Go into more depth about the specifics of what you've 
> thought out otherwise people don't know what the issues and solutions 
> are going to be.

The first issue is simply supporting an exploded source repo.  An
exploded source repo really only requires a few things.  First, you no
longer need a %setup or %patch portion in the spec file.  Second, you
treat things differently in that sourcedir, specdir, and builddir are
all one and the same.  Finally, since you built the binary packages from
this exploded source repo, then in order to give people the exact
sources you built from, you need to make the repo available for
clone/checkout by people.  You need never once build an srpm or tarball
from this repo if you don't want to (and in fact, an srpm wouldn't build
from the same spec file as an exploded source repo spec file unless you
conditionalized the spec to know if it was in an srpm or in its native
exploded source repo format).  Beyond that, the remaining repo issues
(access controls, making sure anything and everything built from the
repo through the build system corresponds to a tag in the repo, making
sure other standard policies are followed) are the same whether you use
srpms or an exploded repo (implementation details differ, but the basic
policies are the same).  In a nutshell this sums up what's
needed to support an exploded source repo (obviously, the RPM headers I
mentioned earlier in this thread are for tracking what tag you build a
package from, which you *must* do since you no longer produce an srpm
unless needed for some other reason, and there are support details
needed in the build system, but all of these are surprisingly simple
things to take care of, it's more getting people used to the idea of not
having an srpm in their hands that's the single biggest hurdle to an
exploded source repo).
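
To make the "everything built corresponds to a tag" policy concrete,
here is a minimal sketch of the kind of check a build system could run
before building.  It is only an illustration: the function names and
the tag/commit values are made up, and a real implementation would read
the tags from the SCM rather than from a dict.

```python
from typing import Dict, Optional


def tag_for_commit(tags: Dict[str, str], commit: str) -> Optional[str]:
    """Return the name of a tag that points at `commit`, or None.

    `tags` maps tag name -> commit id (hypothetical data; a real build
    system would query the repo for this).
    """
    for name, target in sorted(tags.items()):
        if target == commit:
            return name
    return None


def check_buildable(tags: Dict[str, str], commit: str) -> str:
    """Refuse to build from a commit that has no tag in the repo."""
    tag = tag_for_commit(tags, commit)
    if tag is None:
        raise ValueError(f"commit {commit} is not tagged; refusing to build")
    # The returned tag name is what would be recorded in the RPM header.
    return tag
```

The tag name returned here is the piece of information the RPM header
would need to carry, since there is no srpm to point back to.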

Once you support an exploded source repo, and support a repo as the
canonical source distribution mechanism, then the first advantage to
this type of setup is that every SCM worth a pile of dog poo will store
the different versions of software in some form of change related format
that keeps you from duplicating the same things over and over again like
tarball after tarball does.  You generally take a hit in size versus a
single tarball, but end up saving quite a lot in the long run.  And the
more efficient the system is at branching, the better this gets.  And
you generally don't have to worry about cleaning out caches and crap
like that just because removing a single version isn't really possible,
and wouldn't save you anything even if it was possible.  Just about any
package can benefit from this over time.

Next, you get to work on the code in native format, try things out, run
build tests, and all the while the pain of repetitive rpm source
processing is reduced (sometimes it sucks, sometimes not, depends on the
package).  But, it's certainly much easier to do a bunch of work, build,
oops it didn't build, edit again, build again, finally builds, test,
oops it breaks, edit again, build again, it works this time, great time
to check in: voila, you no longer have to worry about "gee, I forgot to
create a backup copy of file blah so when I ran gendiff <dir>
<backup-suffix> it missed that file from the patch and I lost it when I
went to test the rpm build" or any crap like that, you just get to check
in your changes with a nice changelog that describes your wonderful
work.

Those two things really are enough to justify the exploded source
repo concept by themselves.  But they aren't all, by a long shot.  This
is all regardless of upstream.  What happens when upstream runs a
reasonable distributed SCM too?  Let's look at an example.

This is where I point out that Jesse's email I responded to about the
upstream RPM devel cluttering up fedora's devel branch, the one where I
said he wasn't imaginative at all in terms of branching, is a perfect
example.  Panu mentioned he was pulling the new rpm from the upstream
git repo.  We would simply clone that.  In the process, our official
repo would have a list of references to the remote, upstream repo's
branches.  These branches are inviolate by us.  We can never change
them, they simply are a copy of upstream's metadata.  We can, however,
create our own branches.  In fact, the standard modus operandi in a case
like this would be to clone upstream, then create tracking branches in
our repo that show us upstream's branches (because we don't see anything
but master from upstream by default), then create our own branches (so
upstream has its own devel branch, usually just named master, and we
could create our own branch named fedora-devel that would be our primary
devel branch, then as we approach a release we can branch from
fedora-devel to f-8, f-9, etc), and then we simply merge or don't merge
from upstream to our devel branch as we see fit.  For things where we
want to follow upstream, we can actually configure fedora-devel to
automatically merge any new changes from upstream's master branch in
anytime we do a pull (in fact, you can do this on a per branch basis,
any given branch can be told to automatically merge changes from another
branch into it, or it can be a more static branch that doesn't auto
merge anything).  Had this been the case, then merely setting the
fedora-devel branch to not automerge from the remote (upstream) devel
branch would have resulted in all of the auto-rebuilds and things like
that working just fine on the fedora-devel branch as Jesse mentioned
needed to happen, but it would have let us see the changes going on in
the remote tracking branches and everyone who bothered to update their
rpm repo would see those changes on those remote branches and know
something was up.
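
As a rough sketch of the branch setup described above, the function
below only builds the list of git commands; it does not run them.  The
clone directory name "rpm", the remote name "origin", and the function
itself are assumptions for illustration, and the upstream URL is left
as a parameter.  Every command after the clone is meant to be run
inside the clone.

```python
from typing import List


def branch_setup_commands(upstream_url: str, automerge: bool) -> List[List[str]]:
    """Build the git commands for a Fedora branch on top of upstream."""
    cmds = [
        # Clone upstream; its branches show up as origin/* and are
        # never committed to by us.
        ["git", "clone", upstream_url, "rpm"],
        # Our own devel branch, started from upstream's master, with no
        # automatic tracking configured yet.
        ["git", "checkout", "--no-track", "-b", "fedora-devel", "origin/master"],
    ]
    if automerge:
        # With these two settings a plain "git pull" on fedora-devel
        # merges upstream's master into it.
        cmds.append(["git", "config", "branch.fedora-devel.remote", "origin"])
        cmds.append(
            ["git", "config", "branch.fedora-devel.merge", "refs/heads/master"]
        )
    # Release branch cut from our devel branch.
    cmds.append(["git", "branch", "f-9", "fedora-devel"])
    return cmds
```

With automerge left off, a "git fetch origin" still updates the
origin/* branches, so upstream's changes stay visible without landing
on fedora-devel.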

And it gets even better.  Let's say I decided to put mdadm under git
management (as it turns out, upstream also uses git on mdadm).  And
let's say I have some patches I would like upstream to consider.  Since
I'm now working on exploded source, those patches aren't patches any
more, they're change sets.  I can actually create a temporary branch
that is a clone of upstream's master, and then I can hand pick the
patches I would like to go upstream and merge those change sets to this
temporary branch, then I can send upstream an email that says:

Hey Neil, I have some stuff I would like you to include in mdadm.
Here's the change logs from the change sets:

<change 1>
<change 2>
etc.

 please pull from git://git.fedorahosted.org/mdadm for-neil

He can then pull those into his master, and the next time I pull from
his tree and merge it into my tree, my git will see that he picked up my
changes and not try to merge them onto branches where they already
exist, making the whole "UPSTREAM UPSTREAM UPSTREAM" mantra of Fedora
that much easier to implement.
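
The for-neil flow can be sketched the same way.  Again this only builds
the command list and runs nothing.  Using cherry-pick for the hand
picking is one possible choice, not the only one; the push remote name
"fedora" and the commit ids passed in are hypothetical.

```python
from typing import List


def for_upstream_commands(
    branch: str, picks: List[str], public_url: str
) -> List[List[str]]:
    """Build the git commands that prepare a branch for upstream to pull."""
    cmds = [
        # Make sure origin/master reflects upstream's current state.
        ["git", "fetch", "origin"],
        # Temporary branch that starts out identical to upstream's master.
        ["git", "checkout", "--no-track", "-b", branch, "origin/master"],
    ]
    # Hand pick the change sets we want upstream to consider.
    cmds += [["git", "cherry-pick", sha] for sha in picks]
    # Publish the branch ("fedora" is an assumed name for our public remote).
    cmds.append(["git", "push", "fedora", branch])
    # Generate the summary text to paste into the email to upstream.
    cmds.append(["git", "request-pull", "origin/master", public_url, branch])
    return cmds
```

Calling it with "for-neil", a couple of commit ids, and
git://git.fedorahosted.org/mdadm gives the sequence behind the email
above.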

Really, there are all sorts of reasons to use exploded source repos, to
join our own development efforts in with upstream and to hook our source
systems together.  In the end though, it all boils down to this.  Some
people are comfortable with and want to keep using srpms and our current
disconnected SCM methodology, and some people want another choice.  I'm
perfectly fine with other people not wanting to change.  They don't have
to.  But I would prefer to be granted the ability to modernize my own
way of working should I choose to do so.  And this is a big part of
that.

This has been more of a sales pitch than anything to be honest.  If you
want to know more about what I had in mind for nuts and bolts changes to
rpm, then I'm attaching a tar.gz of my ~/.tomboy directory.  As I was
working on things, I just made notes (I really like Tomboy now).  Move
your own .tomboy out of the way if you have anything you'd like to save,
then unpack mine in place, restart tomboy, and start reading from the
"Enabling optimal SCM usage in Fedora" note.  Everything is linked from
that one note.  Of course, I was really only a little ways in.  I was still
concentrating on the rpm changes and hadn't touched on build system
changes, or repo server changes, or access controls with different scms,
or any of that stuff.  And what I *had* accomplished in terms of rpm
knowledge is now at least somewhat wrong given the rpm update.

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tomboy.tgz
Type: application/x-compressed-tar
Size: 18921 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/fedora-devel-list/attachments/20080711/0be43040/attachment.bin>