
Re: closing older bug reports without looking = bad practice ?



Preface:  My response is rather verbose, because I believe this is a
very important issue, and is one I've put a tremendous amount of thought
and effort into for the last 2-4 years, so I have a lot of ideas and
opinions to express.  Those who dislike long emails, or aren't
interested in the topic of bug triage, are encouraged to stop reading
right now.  ;o)



Marius Andreiana wrote:
Hi all,

I've received a bunch of notices about FC5 test1 and test2 bugs being
closed because there have been a lot of updates in the meantime. To me,
as a tester who monitors some of the bugs and finds some time once a
weekend to do some triage, this seemed plain rude.

I agree.


If there are too many bugs overwhelming developers then the cause should
be addressed (better code with fewer bugs, which means more resources
devoted to coding which are unavailable now) or find alternative
solutions, where fedora testers could help.

Better code with fewer bugs is somewhat of a pipe dream, in particular
since the software is written by various random people scattered around
the globe who can't be forced by the Fedora Project to write better
code.  Adding more engineers to the problem would at least in theory
fix more bugs, but would also likely introduce some new bugs that were
not there before.  IMHO at least, that wouldn't significantly impact
the bug-freeness of most things anyway.  If you have 500 bugs in
something and 10 people spend a lot of time developing/fixing it, and
you add 5 more people, and that lowers the overall bug count to 485
bugs, then while you've improved the software, it is still quite buggy.

Adding another 100 engineers to every project is not financially viable,
and wouldn't solve the problem anyway, as adding more people to solve
problems reaches a certain point at which adding one more person
actually makes things worse.  Brooks's law.
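
One common way to illustrate Brooks's law is to count the pairwise
communication paths in a team, which grow as n(n-1)/2.  The sketch
below is only that illustration, using the team sizes from my examples
above (10 people, 15 people, and 15 plus another 100); the function
name is made up for this mail.

```python
def communication_paths(team_size):
    """Number of distinct pairs of people who may need to coordinate."""
    return team_size * (team_size - 1) // 2


# Team sizes taken from the examples in this mail: 10, 10 + 5, 15 + 100.
for size in (10, 15, 115):
    print(f"{size} people -> {communication_paths(size)} paths")
```

Going from 10 to 15 people more than doubles the number of paths, even
though the headcount only went up by half.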

The right approach I believe, is to very clearly define the real core
problems that need to be solved, get a laundry list, and then seriously
brainstorm other ideas openly as to how to solve the laundry list
problems.

While brainstorming however, one needs to be very very careful that they
are not just solving the bullet-list of problems, but also avoiding
creating NEW problems which are WORSE in the process (like the
autoclosing that just happened.)


Closing all the reported bugs won't make them go away, but will make
testers go away. With the current bugzilla setup, the number of reported
bugs would have decreased as more testers do triage and see if they are
still valid in rawhide, or they could confirm it until a developer
finally looks at it, or the reporter might update it with new info.

At a bare minimum, such mass bug updates shouldn't be blindly applied to
groups of bugs, but should be done by reviewing each bug individually.

Reviewing a large number of bugs and earmarking them (on paper, or in a
text file, or on a bugzilla bug tracker, or some other mechanism) for
later mass-update is a way of categorizing smartly, however it does take
more human time and effort.  Then, a mass update could be applied to all
of the bugs that were earmarked.  Then the comment added actually makes
sense, and is in context of the current state of the bugs that get
flagged.
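
As a rough sketch of that earmark-then-update idea (everything here is
hypothetical: the Bug class, the function names and the status strings
are stand-ins I made up, not a real bugzilla interface):

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    bug_id: int
    summary: str
    status: str = "NEW"
    comments: list = field(default_factory=list)


def earmark(bugs, reviewer_decision):
    """Collect the ids of bugs a reviewer flagged, one bug at a time."""
    return [bug.bug_id for bug in bugs if reviewer_decision(bug)]


def mass_update(bugs, earmarked_ids, new_status, comment):
    """Apply one status change and comment, but only to earmarked bugs."""
    flagged = set(earmarked_ids)
    for bug in bugs:
        if bug.bug_id in flagged:
            bug.status = new_status
            bug.comments.append(comment)


bugs = [
    Bug(1, "tv tuner stops working after update"),
    Bug(2, "installer typo, fixed in a later test release"),
    Bug(3, "video corruption on laptop"),
]

# Stand-in for a human reading each report; only bug 2 is judged fixed.
earmarked = earmark(bugs, lambda bug: bug.bug_id == 2)
mass_update(bugs, earmarked, "CLOSED",
            "Believed fixed in rawhide; please reopen if it still occurs.")
```

The point of the split is that the human judgment happens per bug in
the first step, and the second step only saves typing.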


A practice from openoffice bugzilla could be borrowed - bugs reported by
usual bugzilla accounts are by default UNCONFIRMED, and only fedora
testers can set them to NEW. If there are too many bugs, a developer
could ignore UNCONFIRMED ones.

I don't really like the idea of an UNCONFIRMED state personally, at
least as far as X related bug reports are concerned.  Sometimes people
report bugs, and someone else has the same problem as them or *THINKS*
they do, and could in theory "confirm" it.  Later the "bug" turns out
to be 5 people who all misconfigured the same thing, or who simply
misunderstand how it is supposed to work.  That's one example.

The current text comments indicate who has confirmed what, etc. anyway
and in a more meaningful way.  And for X bugs, I would say the majority
are "UNCONFIRMED" for most of their life from engineering side at least,
due to lack of hardware and various numerous other factors.

The last thing we really need is more bugzilla bug states/resolutions.
It is a huge morass of confusing states as it is right now, and if you
ask 20 people what each state really means, you'll get 20 different
answers for many of the states.  Bug state usage varies heavily from
package to package, and from engineer to engineer.  That's just a
fact of current and past usage that can't be ignored.

One could put forth the idea of "Well let's make some policies to
officially define exactly what each bug state/resolution means, and
document them on a Wiki, and declare it official policy.  Then after
that every single engineer will be ruled with an iron fist to use
the bug states 100% consistently."

Actually, some people have put that idea forth.  The problem is, it
is like preventing people from smoking pot by making possession of
pot illegal.  It doesn't stop anything.  Whatever documentation
and/or policies might be created for bugzilla may have some net
positive effect, but many people will likely just ignore a lot of it
due to information overload and the need to get real work done.  In
the end, people's job priorities win over reciting a bugzilla manual
by heart and adhering to it like a national anthem.


I also noticed Dave closes kernel bugs with each kernel release, asking
people to retest. While this might save developers time, it could also
leave real bugs closed, as the reporter might get tired of re-confirming
a bug 3 times.

That is indeed true, however there is really no perfect solution to
the bug triage problem, and it is largely left in the hands of the
individual package maintainers to determine how best to manage the
bugs in the components that they own.  For smaller packages it is
much less of a problem.  For huge critical packages like the kernel,
X, rpm, anaconda, glibc, gcc, gnome, kde, and others, there are so
many bugs that it is just an insane pileup in which it is impossible
to know what is still relevant and what is not.  And the more bugs
there are piled up, the more difficult it is to find the ones that are
actually still relevant and prioritize them effectively.

For example, you have 700 bugs open in your packages total.  Today it
is your task to prioritize them.  The only problem is that it will take
you 3 weeks simply to read/review each bug to even know what all is
there, and to start to build up a list of bugs which are more serious
and deserve more attention.  While you are doing that, 72 new bugs have
been filed.  All of this time you have done nothing but read/triage
and update bug reports, trying to categorize them in some sane manner
to work on them _later in the future_.
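
To put rough numbers on that example, here is a back-of-the-envelope
sketch.  It assumes (my simplification, nothing more) that bugs are
read and filed at constant rates, and it only answers how long pure
reading takes before nothing is left unread, with zero bugs fixed.

```python
def reading_backlog(open_bugs, review_weeks, arrivals_during_review):
    """Rough arithmetic for a first read-through of a bug backlog."""
    read_per_week = open_bugs / review_weeks
    new_per_week = arrivals_during_review / review_weeks
    if read_per_week <= new_per_week:
        raise ValueError("bugs arrive faster than they can be read")
    # Weeks of pure reading until nothing is unread, with no fixing done.
    catch_up_weeks = open_bugs / (read_per_week - new_per_week)
    return read_per_week, new_per_week, catch_up_weeks


# Numbers from the example: 700 open bugs, 3 weeks to read, 72 new ones.
read_rate, new_rate, weeks = reading_backlog(700, 3, 72)
print(f"read {read_rate:.0f}/week, filed {new_rate:.0f}/week, "
      f"caught up on reading after {weeks:.2f} weeks")
```

With these figures the reading alone stretches to a bit over 3.3
weeks, and the moment you stop reading to actually fix something, the
unread pile starts growing again.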

It's just totally an unmanageable mountain of infinite problems for
all intents and purposes.  Yet, you still have to wade through at least
a _fraction_ of all of the problems, and flag them as being priorities,
and put them in your work queue.  Then you need to actually _work_ on
that work queue.  While working on actually fixing bugs, the existing
bugs continue to rot, and pile up while 50 more come in.

Also, for the kernel and X at least, add to that the fact that you
don't have every network card, video card, motherboard, laptop, mouse,
KVM switch, monitor, DFP, sound card, etc. available for you to
even attempt to reproduce the problem being reported, and there is a
whole new class of bugs that pile up.  One attempt at helping to solve
this, is for the engineer to ask the person to please report their
bug to the upstream project, as that increases the chances that someone
who actually has the hardware available, might also have the time
available to look into the problem, and actually fix it.  Sometimes
that works great.  Other times, the bug reporter whines that they
have done their job reporting it to Red Hat, and don't want to be
bothered opening 400 bugzilla accounts around the globe.  Ok, no
problem...  your bug is now 1 bug in 750, of which maybe 50 get
fixed in N months or whatever.

Your bug might sit there for a year or two simply due to insufficient
hardware or manpower, or perhaps it is just very low priority when
compared to 50 other higher-priority, more critical issues that are
also sitting rotting.

Since finite humans can only solve finite numbers of problems in
finite amount of time, and software flaws are seemingly infinite,
the bugs will continue to pile up and pile up forever essentially.

The only way to prevent it as I personally see it, is to have a bug
triaging "method" which makes concrete decisions at regular intervals
throughout a bug's life.  I've talked about this before a year or two
ago on various lists when we implemented such a system for our team's
bug triaging.  I won't get into the details of that system right now, or
this already long mail will be a textbook.  ;o)



Take for example this bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=174968
In FC5test1, looks like the bttv driver had some problems, causing
tvtuners to stop working. A few updates later, it worked again. The
causes for not working and working again are unknown, therefore it could
happen on every new release. Just cross fingers and hope for the best!

Yep, and unfortunately that's the way it is for a lot of hardware
class bugs.  I'll just make up a completely fabricated example to
illustrate my point.  Let's say you have a Trident video card, or
perhaps a Siliconmotion video in a laptop.  You install FC5test3
on it, and video doesn't work properly.

Well, to the best of my knowledge, none of our X engineers including
myself have any Trident video hardware whatsoever, nor siliconmotion.
No trident or siliconmotion hardware documentation.  Just driver
source code, and no way to attempt to reproduce the problem.  The
only way to diagnose it usually is to read the symptoms and gather
more information such as logs/configs.  Then review all the info
that's been accumulated and form a theory as to what the problem might
be based on the info provided.  Then when possible provide suggestions
of things to try to narrow down the problem and perhaps find a
temporary workaround.

Quite often, such bugs will never get fixed until the upstream X.Org
driver maintainer who _does_ have the hardware, and _does_ have the
documentation is aware of the issue, and provides a fix or test-fix
in CVS or in an upstream bug report.

So the reality of things is that some bugs will get fixed, some will
get workarounds, some will get ugly hacks that allow the person to
be able to use their system perhaps with degraded functionality or
performance until a real fix is available upstream.  Other problems
will sit and rot due to being very obscure, hard to reproduce,
very transient and require specific hardware combinations, or require
hardware that isn't available.  Other bugs are just very very low
priority compared to the other 500 bugs that are open in the grand
scheme of things, and so we know right away there's no way we'll ever
have a chance of spending time on it.  File a bug in our bugzilla
about the ark logic driver for example and I can pretty much guarantee
that it will either sit there forever, or I'll close it "please file
upstream".

"Please file upstream" is a nice polite way of saying "I want your
problem to be solved too, but being completely realistic, if I don't
tell you to file it upstream right now, the likelihood of it sitting
in our bugzilla untouched and unfixed for 3 years is about 1000 times
higher than if it is filed upstream, simply due to manpower
and resource constraints."

I realize this is a _LOT_ more information than some people reading this
may want to read, and I'm sure that some people will probably be upset
to read about the cold hard reality that it is simply impossible for
every single bug to get attention.  Everyone wants their problems to
be looked at at least, and it's nice if you can do it in a one stop
shopping manner through your distro's bugzilla.  Hell, even *I* want
that!!  Don't we all!

Reality kicks in though, and you either end up going from 500 bugs
to 800 to 2000 to 3000 and at some point you get zero actual work done
because you spend all your time reading bug reports and no actual time
fixing bugs.  Or you devise a scheme for reducing the bug load by
telling people to upstream things, closing bugs to which the reporter(s)
have not been responsive for N days/weeks/months, closing bugs as
rawhide that you have good reason to believe are fixed in the latest
bits - with a request to reopen if the problem still exists, etc.
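
A minimal sketch of what such a scheme could look like as explicit
rules follows.  It is only an illustration of the options listed above,
not the system any team actually uses; the BugReport fields, the 30-day
default for "N days" and the returned strings are all assumptions made
up for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BugReport:
    bug_id: int
    awaiting_reporter_since: Optional[date] = None  # when info was requested
    believed_fixed_in_rawhide: bool = False         # a human judgment call
    hardware_unavailable: bool = False              # nobody can reproduce it


def triage_decision(bug, today, max_silence_days=30):
    """Pick one concrete action for a bug; 30 days is an arbitrary N."""
    if bug.believed_fixed_in_rawhide:
        return "close as RAWHIDE, ask reporter to reopen if it still happens"
    silent_too_long = (
        bug.awaiting_reporter_since is not None
        and today - bug.awaiting_reporter_since > timedelta(days=max_silence_days)
    )
    if silent_too_long:
        return "close: reporter unresponsive"
    if bug.hardware_unavailable:
        return "ask reporter to file upstream"
    return "keep open"


report = BugReport(174968, hardware_unavailable=True)
print(triage_decision(report, today=date(2006, 3, 1)))
```

Note that every input to the rules is still something a person had to
decide by looking at the bug, which is exactly what a blind auto-close
skips.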

Many teams or individuals have a system of some sort for handling that.
Others might just let those bugs pile up for a long time, then gang
close many at once.  No matter how it is done, some people will no
doubt be inconvenienced, and it is probably not possible to avoid
that completely.




To conclude, please don't close bug reports without researching them
first.

In general I agree with that.  I dislike the idea of an auto-close
after n days/weeks/months of bugs of category foo, or release bar, or
test release bar, etc...  as there will always be bugs closed and lost
forever that nobody will even notice got closed, but which were
definitely not fixed and shouldn't have been closed.

If Red Hat staff doesn't have time for it, please ask for triage
help on fedora-test list, even with weekly themes, such as GNOME triage
week (including searching for upstream bugs), kernel week etc - just
post the link to the bugzilla query for all the required components.

Agreed.  A bug day/days/week/month/whatever is a better way to approach
it IMHO for many cases.

Thanks and looking forward to a better FC5 release!

Indeed.


--
Mike A. Harris  *  Open Source Advocate  *  http://mharris.ca
                      Proud Canadian.

