thoughts about secondary architectures

Lennert Buytenhek buytenh at wantstofly.org
Tue Jun 12 19:29:28 UTC 2007


I think we discussed most of this in IRC, but let's continue the
thread in email anyway, for the sake of having the discussion
archived and giving others the chance to jump in.


On Sat, Jun 02, 2007 at 11:45:06AM +0100, David Woodhouse wrote:

> > My view is that it's clear that most of the people hacking on Fedora
> > and using Fedora care only about x86/x86_64 systems, and that I (and
> > the other people who are interested in secondary architectures)
> > should try as much as possible to avoid making the lives of the x86
> > people difficult, if we ever want to have a chance of getting our
> > patches merged without pissing everyone else off.
> 
> I think you do us a disservice here.

Sorry. :-(


> I hope that getting sensible patches merged should always be easy,
> regardless of how the build system works. There aren't many of us
> who'll start refusing your patches just because your build system
> pissed us off, hopefully :)

They might not refuse the patches, but they might delay them... :)


> We're trying to make it easier for people to build Fedora for new
> architectures. As you've so capably demonstrated, it's already
> _possible_ to do that -- but we want to make it less painful, and in
> particular we want to help you keep in sync with Fedora during the
> development cycle. (At least, that's what I _think_ we're trying to
> achieve -- Spot's document itself lacked any explicit rationale.)

Getting hooked up into the upstream build system as early as
possible is something that I would really like, and it seems like
it might make my life easier.

But, right now, we have a mostly complete FC6 port, no F7 port, and
no F8 port.  At this point, it doesn't seem like an option to me to
get ARM hooked up into koji, and it probably won't be until we get
in sync with most of the devel effort.


> Yes, we want to keep the impact on the package maintainers minimal.
> But that doesn't mean it has to be entirely zero. If we want _zero_
> impact, then you might just as well keep building it for yourself
> as you already are.

From personal experience, what I tend to see is that there is a lot
of reluctance to merge things to benefit ARM or to accommodate ARM,
as ARM is seen as a minority architecture, even if there are probably
more ARM CPUs in the world running Linux than anything else.

From that point of view, I'd be inclined to say that I'll take any
additional effort required for having an official Fedora ARM port
upon myself, and upon others who work on the ARM port and are
interested in the ARM port.

Of course, if you don't agree with this division of labor, I'm not
going to complain about it. :)


> > While it is very well possible that there is some bug in a package
> > that does not surface on x86, 99.9% of the Fedora developers are
> > unlikely to care about that 
> 
> (Actually I think are many more than 1 in 1000 who are more
> conscientious than that and actually do care about portability. We're
> not _that_ lackadaisical, as a rule.)

Well, I assumed that most Fedora developers probably only develop
on x86/x86_64 and thus care most about those architectures, I didn't
mean to say that they wouldn't care about portability in general.


> > if the package builds OK on x86 and no ill effects are seen on x86.
> 
> Nevertheless, in the case where a package _used_ to work on ARM and the
> updated version suddenly doesn't build, don't you think that warrants at
> least a _glance_ from the package maintainer to see if it's actually a
> _generic_ issue which just happens to bite on ARM today for some
> timing-related or other not 100% repeatable reason?

Sure, I'd agree with you on this point, I'm just saying that there is
some kind of line that should be drawn.  Would you consider the fact
that some package depends on being run on a 2's complement architecture
a release blocker?  From the "What is every sensible architecture out
there doing?" point of view, I probably wouldn't...  Of course, if it
did build before on that 1's complement machine, it might be a
different issue.


> That glance is _all_ that should be required -- package maintainers
> should _definitely_ have the option of just pushing a button to say they
> don't care, and shipping the package anyway on all the architectures for
> which it _did_ build. We don't want to make life hard for them; you're
> right. But we don't necessarily want them to ignore failures which could
> show up a real problem, either.
> 
> We could see the builds on other architectures as free testing. They
> often _do_ show up issues which are generic, and not just arch-specific.
> Especially in the cases where the package in question _used_ to build OK
> on that architecture -- which is all we'd be expecting the package
> maintainers to notice in the general case.

Totally agreed, no argument here.  That 'ship it anyway' button is
probably something that needs some more thought.

Effectively, if we allow someone to press the 'ship it anyway' button,
we allow them to decide that the architectures that didn't build will
now be out of sync.  For things like Jack's Virtual Pipe Organ for X
Windows, this probably isn't a big issue, but for the more fundamental
packages (gcc, python, any critical libraries), it seems that it would be.

From the point that the 'ship it anyway' button is pressed, some
archs will be out of sync, and we really don't want to build, say,
some python module against python 2.8 on some archs and against
python 2.9 on some other archs.

Should we track the exact build dependencies used on the 'primary
architectures' and bail out a secondary arch build if the same
versions of the same packages are not available?  Or is there another
way to enforce these dependencies that works better?
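
(To make the 'track the exact build dependencies' idea concrete, here
is a minimal sketch.  It assumes we can get, for each build, the
versions of the build dependencies that were used on the primary
arch, and the versions available on the secondary arch.  The function
names, the dict layout, and the version strings are all made up for
illustration; this is not how koji represents any of this.)

```python
# Hypothetical sketch: names and data layout are assumptions,
# not the build system's real interface.

def missing_build_deps(primary_buildroot, secondary_repo):
    """Return the build deps whose exact version is absent on the secondary arch.

    primary_buildroot: dict mapping package name -> "version-release"
                       used when the package was built on the primary arch.
    secondary_repo:    dict mapping package name -> set of "version-release"
                       strings available on the secondary arch.
    """
    missing = {}
    for name, vr in sorted(primary_buildroot.items()):
        if vr not in secondary_repo.get(name, set()):
            missing[name] = vr
    return missing


def may_build(primary_buildroot, secondary_repo):
    """Bail out (return False) unless every build dep matches exactly."""
    return not missing_build_deps(primary_buildroot, secondary_repo)


# Made-up example: the primary arch built against a newer python than
# the secondary arch has available, so the secondary build is refused.
primary = {"python": "2.9-1", "gcc": "4.1.2-12"}
arm_repo = {"python": {"2.8-3"}, "gcc": {"4.1.2-12"}}
print(missing_build_deps(primary, arm_repo))
print(may_build(primary, arm_repo))
```

(An exact-match check like this is the strictest possible policy; a
looser one would need some rule for which version differences are
harmless, which is exactly the part I don't know how to decide.)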

In any case, it seems to me that this is an issue that you have to
deal with in some way once you provide people with a 'ship it
anyway' button.


> > From a purely technical point of view I would advocate that a build
> > failure on any architecture fails the package build, but there will
> > only have to be 3 or 4 cases where some gcc ICE causes some package
> > to fail to build on some secondary architecture but build fine on
> > x86 and the x86 people will hate us forever afterwards, and will
> > eventually start clamoring that getting all these secondary
> > architectures on board was a bad idea to begin with.  (Which, of
> > course, would be totally understandable at that point.)
> 
> I don't think it would be understandable at all. It's not as if it
> would take them long to glance at the failure and click the "don't
> care" button. (Well, OK, we'd want a bug filed, but there can be
> automated assistance with templates for that, even though it's a
> bad idea to have it all done _completely_ automatically).

Well, if you've never dealt with architectures where chars are
unsigned by default, where unaligned accesses don't work as you
would expect, where sizeof(struct { char a; char b; }) == 4, or
where 'double' values are in little endian byte order but in big
endian word order, I would think that it is understandable that
you're not extremely interested in bug reports for architectures
where that is in fact the case.
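
(For those who haven't run into these, here is a small illustration
of two of the cases above: the signed/unsigned char difference and
the mixed-endian 'double' layout.  It only mimics the byte layouts
using Python's struct module on whatever machine you run it on; the
helper names are made up, and it is a sketch of the layouts as
described above, not a dump from real hardware.)

```python
import struct

# The same byte read as a signed char and as an unsigned char.
# Code that assumes one and gets the other sees a different value.
as_signed = struct.unpack("b", b"\xff")[0]
as_unsigned = struct.unpack("B", b"\xff")[0]
print(as_signed, as_unsigned)


def mixed_endian_double(value):
    """Pack a double with little endian bytes but big endian word order."""
    le = struct.pack("<d", value)  # plain little endian: low word first
    return le[4:] + le[:4]         # swap the two 32-bit words


def read_mixed_endian_double(raw):
    """Undo the word swap and decode as a plain little endian double."""
    return struct.unpack("<d", raw[4:] + raw[:4])[0]


raw = mixed_endian_double(1.0)
print(raw.hex())

# Decoding with the right layout recovers the value; decoding the same
# bytes as a plain little endian double silently gives garbage.
print(read_mixed_endian_double(raw))
print(struct.unpack("<d", raw)[0])
```

(Code that writes raw doubles to disk or to the network on x86 and
reads them back on such a machine hits exactly the last case, and
nothing fails at build time to warn you about it.)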

I'm not saying that it is not your job as package maintainer to
look at such failures, I'm just saying that such failures are not
likely to draw a lot of your attention.  Especially if you have
13562 other things to do.

And if enough of these 'unexplainable' issues occur, you're more
likely to just press the 'ship it anyway' button anyway.


> If we think GCC is going to be unstable on some platforms, then perhaps
> the 'complete rebuild in mock' process which Matt Domsch has been doing
> should be made mandatory? That would generally help to catch such
> compiler-related problems before they affect package maintainers.

(Tangent: I've wondered for a while whether the 'rebuild all packages
in F(C)-foo using packages in F(C)-foo' is something that is supposed
to work.  If it is, then I wonder why I should try to create the
exact same chroot build environments (in terms of packages and
package versions) on ARM as were used on primary architectures when
a given package was built?)


> > There is a similar issue with build speed.  While my fastest ARM box
> > (800 MHz with 512K L2 cache) is quite snappy as far as ARM systems go,
> > it is probably no match for even the crappiest of x86 boxes.  The
> > fastest ARM CPU I know of is a dual core 1.2GHz, which is still no
> > match for x86.
> > 
> > This doesn't mean, IMHO, that it makes no sense to run Fedora on ARM
> > systems.
> > 
> > But it does mean that if the building of packages on primary
> > architectures is throttled at the speed of building packages on ARM,
> > we're going to make a lot of (x86) Fedora developers very sad, angry,
> > frustrated, or all of the above.
> 
> Not really. The builds for the architecture they _care_ about would be
> available in koji from the moment they finish, and with the
> 'chain-build' option they'd be able to build subsequent dependent
> packages immediately too. The only thing that would be waiting is the
> push to the main repository... which also in practice waits for mirrors
> to sync and other stuff like that. It's hardly a fast-path.

(How is chain-build different from
"BuildRequires: foo >= bar.baz.quux"?  The latter is at least
easily enforced.)


> If there are architectures which are _really_ slow, and it really does
> start to cause problems, then _perhaps_ we'd need to stop waiting for
> those architectures. 
> 
> I think we should try to avoid that unless it's really necessary though.
> Not only would we be pushing partially-failed packages without any
> investigation, but you'd also start getting build failures and
> inconsistencies on that architecture even when there wasn't actually a
> real problem -- if a developer doesn't use chain-build but just submits
> jobs one after the other, before the first job finished on all
> architectures. The repository for that architecture wouldn't really be
> in sync with Fedora at all -- you haven't gained much over the current
> situation where you're entirely on your own.

(Asking people to use chain-build is another case (IMHO) of
'putting some burden on the arch maintainer' vs 'putting some burden
on the package maintainer'.)


> Perhaps one way to deal with this potential problem is to allow the
> package maintainer to push the 'go ahead and push it anyway' button in
> the build system even _before_ the build has run to completion on every
> architecture?

That seems fine to me, too, but at that point, aren't you back to
'keep your arch in sync separate from the primary archs' anyway?

I.e. let's say that I track upstream koji build dependencies in my
downstream arch build system.  (I.e. I make sure that I only build
packages with the same versions of build-dependent packages as on
x86.)  Aren't we back to 'my arch is supported separately' at that
point?

Sure, I can send $packagemaintainer an email whenever his/her
package fails to build in my arch build system even though it built
fine on x86, but it's not quite the same thing.


> That way, the build would _normally_ wait for everyone to finish
> and the repositories would remain in sync, and potential bugs would
> get at least a cursory glance before the package is shipped -- but
> in the fairly rare case where there's an urgent need for it in the
> actual repositories, the package maintainer could speed things up.
> 
> (Presumably they'd need to have a way to force the mirrors to sync
> up immediately too, if they're in this much of a rush? Something
> which has never been brought up as an issue before, AIUI.)

I can't sensibly comment on that..


> > So, IMHO, ideally, the existence of secondary architectures should
> > not significantly affect the typical workflow of an x86 Fedora
> > developer, and secondary architectures should not negatively affect
> > development on x86.
> 
> This is true, but taken to extremes it means we may as well not bother
> trying to make life easier for you at all.

Well, considering what I've said above, I'm not really asking the
Fedora project to make my life easier at all.  If they'd merge my
patches, that'd be fine with me.  Even if that means extra work for
me.

I guess I'm mostly talking from the point of view of an architecture
for which there has been a "Fedora" port available since the Red Hat
7.3 days, but which has always lived out-of-tree.  Any change to that
situation would be welcome, really.  Even if they'd only take the
patches but not make arm a primary or secondary arch.


> I think the whole point of the proposal is that there _are_ things we
> can do, which are simple enough for us, which will help you a lot.
> Should we refuse even to lift a finger to merge your patches, just
> because we're too lazy?

I know what my opinion is, but ultimately, it's not up to me.. :)

(I.e. ideally I'd just retire to some desert island while the Fedora
ARM port maintains itself.)


> Despite the fact that you then have to work "two or three times as
> hard" because you have to work around our recalcitrance?

I don't think it's necessarily recalcitrance on the Fedora side, but
more a matter of not really seeing the point of spending effort on
something they don't care about.


> If so, then why are we bothering with this proposal at
> all? You might as well just keep doing it on your own, surely?

That's what we've been doing for a while. :)  (RH73, RH9, FC2,
FC3, FC4, FC6.)  We tried to merge our patches at various points
in the past, but that never turned out to be successful.

If people feel that ARM being supported separately is the best way
forward for the ARM port, sure, I'll accept that.  I just hope that
there's another way that can be found that is both:
1/ minimal impact for the existing Fedora package maintainers; and
2/ less hassle for us.

If that means that I have to work harder, I'll do that.


> I think that if we're going to bother doing _anything_, the least we
> should do is merge your patches and keep the build system in sync in the
> _default_ case.

Again, I don't disagree here at all.


> Yes, package maintainers should have the option not to
> care about builds which fail on ARM -- but any competent maintainer
> should at least be taking a _cursory_ look at any new failure.

Nor do I disagree with this.


> If we _really_ have to, we could have an option not to wait for the
> build to complete -- but using that should be discouraged except in very
> special cases. As I said, it's not as if packages making it to the
> mirrors is a fast path.

ACK.


cheers,
Lennert



