abrt + X Error => zillions of duplicate bug reports?

Wed Nov 25 17:54:15 UTC 2009

On Wed, 2009-11-25 at 10:20 +0100, Karel Klic wrote:

> > We came up with several possible courses of action. First, we
> > acknowledge that the abrt team is working on improving duplicate detection,
> > but Matej noted that this is intrinsically hard work and abrt will
> > likely never be able to eliminate or even come close to eliminating
> > duplicate reporting.
> 
> The algorithm for duplicate detection in the currently released version 
> of ABRT is very rudimentary: it removes only the most obvious duplicates 
> in simple programs. As far as I know, it does not work for applications
> with a variable number of threads (e.g. Firefox).
> 
> Fortunately, we now have a new algorithm for duplicate detection which
> handles all of these cases significantly better. Most of the code is
> written, but it needs some testing before release. I guess it will
> take two weeks or so to finish it and make sure it works well.
> 
> An important attribute of the new algorithm is that it errs on the side
> of false duplicates. It will much more often mark a bug as a
> duplicate of another bug, even when that is sometimes not the case. It
> should make the abrt bug flow sustainable, and then we can slowly improve
> the detection mechanism to be more accurate.

> > Second, we wondered if the abrt team might be able to assist in running any
> > improved duplicate detection mechanisms over already-reported bugs in
> > Bugzilla retrospectively. We will follow up with them about that.

> Once duplicate detection works, it would be a loss not to have the
> crashes directly in Bugzilla. I often see crashes reported by
> ABRT get located in the code and fixed.
> 
> If we fail to deliver better detection, then an intermediate site is
> certainly a better target for thousands of duplicates than Bugzilla.
> 
> I would propose creating an intermediate site as a target for users
> who are not experienced enough to create a Bugzilla account and
> respond to questions, or who simply do not care. It would then be
> possible for them to report almost automatically, and we could get a lot
> of backtraces and support data that are currently lost. However, this
> must be thought through (backtraces raise security issues).

Thanks, Karel. If you think you'll be able to manage a level of
duplicate detection which keeps things workable for triaging, that's
great news and would certainly make the whole situation simpler :). I
think we'd be willing to wait on that and see how it goes. I was working
from Matej's suggestion in the meeting that he'd talked to you (abrt
team) already and was sure that a high enough level of duplicate
detection was hard/impossible; if that's not the case, obviously it
changes things.

What do you think of the idea of running the improved duplicate
detection logic over existing abrt bug reports in Bugzilla? Would that
be feasible, perhaps with some help from the Bugzilla maintainer?
Thanks!

-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Fedora Talk: adamwill AT fedoraproject DOT org
http://www.happyassassin.net