Backtrace deduplication server
Backtrace deduplication server solves the problem of many duplicate
crash reports being submitted by ABRT to Red Hat Bugzilla. It is
designed to help ABRT users to find duplicate reports before filing
a new bug, and to help package maintainers to triage/reassign/merge
already reported bugs.
Backtrace deduplication server is a collection of newly-developed
tools that will be deployed on the retrace server hardware, which is
a part of Fedora infrastructure. ABRT will contain a client tool and
integration with the server.
Reason for existence
Issues this project is addressing:
- Red Hat Bugzilla receives a lot of duplicate crash reports from
ABRT clients, even for a single component. This makes ABRT reports
less useful and causes developers to give ABRT reports lower
priority.
- Red Hat Bugzilla receives a lot of low-quality reports, which
should be closed without intervention from maintainers. For
example, the simple-scan component is badly affected by low-quality
ABRT reports: many of its bug reports are duplicates, and some
reports incorrectly show __libc_message and similar functions as
crash functions.
- Red Hat Bugzilla contains multiple crash reports filed against
end-user applications that are caused by a single bug in a
library. The crash reports are then analyzed multiple times by
various developers, which wastes their time.
Opportunities this project is pursuing:
- If Bugzilla contains fewer duplicate crash reports, and crash
reports are assigned to the correct component more frequently,
developers might recognize this and give ABRT bugs higher
priority, leading to more bugs fixed.
- This project implements the code and algorithms for efficient
backtrace comparison and cleanup. If a crash collection server is
implemented later, this code can be reused for it.
Objectives
- Help users decide where to report a bug, so that Bugzilla
receives significantly fewer duplicate crash reports.
- Help bug triagers
- Close low-quality bugs that weren't supposed to be filed
at all.
- Crash reports will get automatic comments about similar
(duplicate) open bugs filed on other components (libraries,
and other programs using the same libraries), proposing
some action (for example reassigning a bug to a library, and
closing other bugs as duplicates of it).
- For inactive/untouched bugs, this must happen
automatically.
- Help non-ABRT systems to search Red Hat Bugzilla for crash
reports related to provided backtrace.
Outcomes
- Implementation of backtrace metrics and indexes in Btparser
- Damerau-Levenshtein distance
- Jaro-Winkler distance
- Implementation of backtrace optimization in Btparser
- Backtrace deduplication service for C/C++ backtraces, which
takes a backtrace and component, and checks backtraces from all
related components (of libraries used by the crashed binary) in
Bugzilla.
- name: faf-btserver-find-duplicates
- HTTP interface to the backtrace deduplication service,
implemented as a CGI script
- Crash report cleanup service, which merges crashes that are
already reported in Bugzilla. It also finds low-quality reports
and duplicates and closes or reassigns them. The implementation
must consist of four scripts:
- faf-btserver-analyze-bugzilla
- The merging is done on a component level, where similar
bugs from the same component are merged, and also on a
cross-component level, where bugs from applications are
matched to those of their library dependencies, and bugs
in libraries are detected by searching duplicates between
components with shared dependencies.
- Achieve the right balance between blaming the application and
blaming the library. For example, many applications crash on a
strcmp call, but we can reasonably assume there is no bug in
strcmp.
- Compute distances and similarity indices between a bug
(backtrace of bug) and all relevant bugs
- Compute backtrace quality
- Store the computed data in a bug report
- The number of crash combinations to check is huge.
Optimizations might be needed to limit checks to backtraces
having the same library calls on the stack.
- faf-btserver-prepare-bugzilla-actions
- Find similar bugs in the bug reports
- Check bug statuses and generate a list of desired
actions to be performed on Bugzilla
- faf-btserver-push-actions-bugzilla
- Performs desired actions on Bugzilla
- If a bug that is filed on an application but belongs to
a library is detected, it will be either reassigned or a
comment will be added:
"It appears that this bug should be moved to
component glib2. Other bugs from emacs (bug #644532) and
evolution (bugs #758654, #749564) are duplicates of this
bug. Please consider marking them as duplicates and
moving this bug to glib2."
- faf-btserver-actions-log - generates a log of desired actions
on Bugzilla in a text file; this is useful for development,
tweaking, and debugging
- Synchronization script to update server metadata — bugs,
backtraces, builds, RPMs
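The optimization mentioned for faf-btserver-analyze-bugzilla (limiting checks to backtraces that share library calls on the stack) could be sketched with an inverted index. This is only an illustration under stated assumptions: the frame representation as (function, library) tuples and the names build_index and candidate_bugs are hypothetical and are not taken from faf.

```python
from collections import defaultdict

# Hypothetical data model: a backtrace is a list of (function, library)
# tuples, where library is None for frames inside the crashed binary.

def build_index(backtraces):
    """Map each library call to the set of bug ids whose stack contains it."""
    index = defaultdict(set)
    for bug_id, frames in backtraces.items():
        for function, library in frames:
            if library is not None:
                index[(function, library)].add(bug_id)
    return index

def candidate_bugs(index, frames):
    """Return ids of bugs sharing at least one library call with frames."""
    found = set()
    for function, library in frames:
        if library is not None:
            found |= index.get((function, library), set())
    return found

known = {
    101: [("g_free", "libglib-2.0.so.0"), ("editor_close", None), ("main", None)],
    102: [("g_free", "libglib-2.0.so.0"), ("mail_quit", None), ("main", None)],
    103: [("XSync", "libX11.so.6"), ("main", None)],
}
query = [("g_free", "libglib-2.0.so.0"), ("view_destroy", None), ("main", None)]
print(sorted(candidate_bugs(build_index(known), query)))
```

Only the candidates returned here would then need a full distance computation, instead of every pair of backtraces.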
People
- Team: Karel Klíč, Jan Šmejda
- Interests: Jon McCann
- Affects: Andrew Hecox
- Send information to: Jirka Moskovčák, Radek Vokál
Timeline
- Project start date: 2011-05-17
- Planned finish date: 2012-04-30
Milestones
- 2011-12-21 Wed
- Current version is running on an internal or external retrace
server. It responds to queries as expected and synchronizes with
Bugzilla and Koji, though not yet effectively.
- 2012-01-11 Wed
- Feature is submitted to Fedora 17 Features.
- 2012-01-18 Wed
- Client-side ABRT plugin is finished and integrated into
ABRT.
- 2012-02-01 Wed
- Fedora feature freeze. Server synchronizes itself
effectively. Development done.
- 2012-04-30 Mon
- Blocker bugs are fixed. Server is capable of serving ABRT
long-term.
Diary
- 2011-11-10
- Two subprojects defined in faf: Crash report cleanup service
for Bugzilla and Backtrace deduplication server. Both are planned
to be finished by 2012-04-30.
- 2011-11-02
- Started implementation of the crash report cleanup service for
Bugzilla. Data gathering part.
- 2011-05-17
- Meeting with Jan Šmejda about possible bachelor thesis
topics. This project was selected.
Risks
Resource usage on the server may be too heavy.
Dependencies
The project depends on the retrace server hardware being available
for deployment of this project.
ABRT must contain a client using the Backtrace deduplication server,
and it must offer using the server from the GUI.
Btparser receives our implementation of backtrace metrics and
indexes.
Faf receives the rest of the server implementation, and provides the
tools to synchronize with Koji and Bugzilla.
Contingency plan
ABRT uses duplicate hashes to detect duplicates as usual. Without
the backtrace deduplication server, ABRT bugs are still filed on the
software component that owns the crashed binary. Duplicates within
a single component can be closed by extending an existing script,
without having a server deployed.
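To illustrate how exact-match duplicate hashes differ from similarity metrics, here is a minimal sketch of closing duplicates within a single component by grouping bugs on a hash. The hash input used here (component name plus the top three function names, hashed with SHA-1) is an assumption made for illustration; the source does not describe the actual format of ABRT's duplicate hash, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict

def duplicate_hash(component, frames, depth=3):
    """Hash the component and the top `depth` function names (assumed format)."""
    text = component + "\n" + "\n".join(frames[:depth])
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def group_duplicates(bugs):
    """Group bug ids that share a duplicate hash.

    bugs is an iterable of (bug_id, component, frames) tuples.
    Returns a list of sorted id lists, one per group with 2+ bugs.
    """
    groups = defaultdict(list)
    for bug_id, component, frames in bugs:
        groups[duplicate_hash(component, frames)].append(bug_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]

bugs = [
    (1, "emacs", ["g_free", "g_object_unref", "close_buffer", "main"]),
    (2, "emacs", ["g_free", "g_object_unref", "close_buffer", "quit"]),
    (3, "evolution", ["g_free", "g_object_unref", "close_buffer", "main"]),
    (4, "emacs", ["g_free", "redraw", "close_buffer"]),
]
print(group_duplicates(bugs))
```

Because the component is part of the hash input, bug 3 is not grouped with bugs 1 and 2 even though the top frames are identical. Cross-component cases like this one are what the deduplication server is meant to handle.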
Documentation
Solving the issues with a centralized server (as opposed to
client-side backtrace analysis) has the advantage of collecting,
managing, and pre-processing bulky crash-related data from
various sources in one place: bugs and attachments from Bugzilla,
and build and RPM metadata from Koji.
Btparser: we have used mostly string metrics (Damerau-Levenshtein
distance, Jaro-Winkler distance) adapted for backtraces, as they
work well enough. A letter in a string metric corresponds to a frame
in the crash thread. The only significant added complexity is
intelligent handling of frames with unknown function names (caused
by missing debuginfo, "??" in backtraces). Using the metrics, we
compute the distance (similarity) between the user's backtrace and
already reported backtraces.
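As a rough illustration of a string metric applied to frames instead of letters, the sketch below computes the Damerau-Levenshtein distance (in its optimal string alignment variant) over lists of function names. It is not Btparser's implementation. In particular, the rule that an unknown "??" frame never matches anything, not even another "??" frame, is an assumption chosen for this example, and the function names are hypothetical.

```python
UNKNOWN = "??"

def frames_match(a, b):
    """Assumed rule: unknown frames never match, even against each other."""
    return a != UNKNOWN and b != UNKNOWN and a == b

def backtrace_distance(s, t):
    """Damerau-Levenshtein (optimal string alignment) distance over frames."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all frames of s[:i]
    for j in range(cols):
        d[0][j] = j  # insert all frames of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if frames_match(s[i - 1], t[j - 1]) else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent frames.
            if (i > 1 and j > 1
                    and frames_match(s[i - 1], t[j - 2])
                    and frames_match(s[i - 2], t[j - 1])):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def similarity(s, t):
    """Normalize the distance to the range 0.0 (different) to 1.0 (equal)."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - backtrace_distance(s, t) / longest

user = ["g_free", "g_object_unref", "main"]
reported = ["g_object_unref", "g_free", "main"]
print(backtrace_distance(user, reported), similarity(user, reported))
```

Under the assumed rule, two backtraces that are identical except for a shared "??" frame still have a distance of 1, so missing debuginfo lowers the similarity instead of producing a false exact match.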