Fedora Extras Development Build Report

Sun May 8 16:48:18 UTC 2005

On Sun, 08 May 2005 15:11:58 +0100, David Woodhouse wrote:

> There will _always_ be some bugs which don't happen to show up for the
> maintainer, even with the most rigorous test regime.
> 
> Yes, the fact that maintainers often test on only one platform instead
> of all seven is a part of that -- but even if maintainers have
> sufficient hardware, time and inclination to repeat whatever testing
> they do on all of the available platforms, there'd still be plenty of
> bugs which slipped through.

This describes a scenario which is very different from the one I refer to:
publishing completely untested builds which fail badly and make Fedora
Extras look bad. E.g. we've had immediate crashes with a few packages on
x86_64. In more than one case we offered something which didn't work at
all. That's a worst-case scenario IMO. And the packager could not be
blamed, because he had not asked for a release on x86_64.

> > It is really bad if a package, which is believed to be ready, fails to
> > build on an untested platform and blocks the release of an update.
> > Rest assured, this is a turn-off criterion for many volunteers.
> 
> I can see _spurious_ build failures being a turn-off, and I'm a little
> concerned by the ones I've seen in the last day or so. But build
> failures which show _real_ problems are what the volunteer has
> volunteered for, surely? Dealing with that is the rôle of the package
> maintainer.

Example: when I mentioned the failed rebuild of gnupg2, which was reported
to fail in its tests on PPC, you suggested that maybe a bug in the
compiler had been fixed in the meantime. When the same package builds fine
on i386 (which has to be checked separately, most likely with ExcludeArch,
because the build system would not build for i386 if the build failed on
an arch built earlier), this is an indication of side-effects or
unportable code which cause run-time misbehaviour. Perhaps it is due to
different endianness, abuse of C-style casts or other things coders do,
but maybe something else in the PPC environment is broken. Effectively,
the volunteer packager is expected to extend his interest and care to the
development and support of a platform which he doesn't use, and probably
to the related arch-specific mailing-lists and topics of discussion. If
he is not expected to do that, he could just add ExcludeArch, file an
arch-specific bug report, and move on [and hopefully have more time to
keep in contact with upstream activity, do testing and release
well-selected quality updates which please the user community].
Arch-specific fixes could be requested and contributed upstream.
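
To illustrate the kind of unportable code meant above, here is a minimal
sketch in C. It is not taken from gnupg2 or any other package; all
function names are hypothetical. It contrasts a C-style cast whose result
depends on byte order with portable alternatives which behave the same on
i386 and PPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unportable: reads the first byte of the in-memory representation.
 * The result depends on the byte order of the machine. */
static unsigned first_byte_by_cast(uint32_t v)
{
    return *(const unsigned char *)&v;
}

/* Portable: extracts the least significant byte arithmetically. */
static unsigned low_byte_portable(uint32_t v)
{
    return v & 0xffu;
}

/* Portable: decodes a big-endian 32-bit field from a byte buffer
 * without casting the buffer to an integer pointer. */
static uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Code which only ever ran on i386 can contain the first pattern for years
without anybody noticing, and a test-suite run on PPC is often the first
thing to expose it.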

> > It is bad enough already, that i386 is not built first, so packagers get
> > told that a build failed on another platform, and they don't know whether
> > i386 would succeed in the build system. This is a wrong decision IMO.
> 
> I'm not sure I see the logic. For any given version of a package, there
> may be any number of bugs which prevent it from building. The package
> maintainer needs to look at each one individually and fix it before
> moving on to the next. 
> 
> Any given pass through the build system will only show up one such error
> before it bails out -- why does the ordering really matter?

If you usually prepare and test your package on i386, you would start
debugging and fixing problems on i386 when the build system reports an
unexpected build failure for that arch. If the build request stops with an
error on x86_64, would you start debugging on i386, only to find that a)
you can't seem to reproduce the failure, b) you would need to re-run
"make" on an x86_64 machine more than a dozen times to locate e.g.
type-specific compilation errors in many places which don't show up on
i386, or c) you are stuck in a loop of failed builds, because a
contributed fix for one arch breaks a different arch unexpectedly?
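
As a hedged illustration of case b), one common class of type-specific
problem is code which assumes that a pointer fits into an int. That holds
on i386 but not on x86_64, where the compiler complains about the cast
(and fails the build when warnings are treated as errors). This is only
an assumed example of such an error, and the helper names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Unportable pattern, shown only as a comment:
 *     int handle = (int)ptr;
 * int is 32 bits wide on both archs, but pointers are 64 bits wide
 * on x86_64, so the value would be truncated there. */

/* Portable: uintptr_t is wide enough to hold a pointer value. */
static uintptr_t ptr_to_handle(const void *p)
{
    return (uintptr_t)p;
}

static const void *handle_to_ptr(uintptr_t h)
{
    return (const void *)h;
}

/* Portable: prints a size_t with %zu instead of %d or %u, which only
 * match the width of size_t on 32-bit archs. */
static int format_size(char *buf, size_t buflen, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%zu", n);
}
```

Each such place has to be found and fixed individually, which is why the
packager may need many "make" runs on the failing arch.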

Knowing that not just your private build on i386, but also the i386 build
in the official build system, succeeded would be a helpful time saver.
That would result in a known good state like "i386 built fine, x86_64
failed" as opposed to "i386 unknown, x86_64 failed".
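
The difference between the two states can be modelled with a small sketch
in C. It assumes a build pass which walks the archs in a fixed order and
bails out at the first failure, as described in this thread; the type and
function names are hypothetical and do not come from the real build
system.

```c
#include <assert.h>
#include <stddef.h>

enum build_state { BUILD_UNKNOWN, BUILD_OK, BUILD_FAILED };

/* Models one pass through the build system.
 * would_build[i] is nonzero if the package builds on arch i.
 * state[i] receives what the packager learns about arch i. */
static void run_build_pass(const int *would_build,
                           enum build_state *state, size_t n)
{
    size_t i;

    assert(would_build != NULL && state != NULL);
    for (i = 0; i < n; i++)
        state[i] = BUILD_UNKNOWN;

    for (i = 0; i < n; i++) {
        if (would_build[i]) {
            state[i] = BUILD_OK;
        } else {
            state[i] = BUILD_FAILED;
            break; /* bail out; archs built later stay unknown */
        }
    }
}
```

With the order { i386, x86_64 } and a package which fails only on x86_64,
the pass ends with i386 marked as built and x86_64 as failed. With the
order { x86_64, i386 }, the same package leaves i386 unknown.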

> I'm not saying that ExcludeArch should necessarily be criticised; I'm
> saying that it should be used sparingly, and only with reference to a
> bug report which explains what the actual problem was.

+1

> For example, I _would_ have been grumpy if gnupg2 had ended up with an
> unexplained 'ExcludeArch: ppc' just because of the spurious build
> failure which seems to happen on x86_64 now too.

It would not surprise me if somewhere deep in the code there's a race
condition similar to the one I tracked down and avoided in gpgme (upstream
acknowledged the problem, but a real fix would require a bit of a
rewrite/redesign).