This document summarizes an empirical study on search queries for IR-based bug localization, focusing on bug reports that contain only natural language text. The study evaluated keyword selection techniques, generated optimal search queries using a genetic algorithm, and compared optimal with non-optimal queries. Key findings: 1) 34% of bug reports lead to very poor search queries, and state-of-the-art approaches do not detect appropriate keywords in them; 2) a genetic algorithm produced optimal queries that reach 50%–80% Hit@1 and 56%–86% MAP; and 3) optimal keywords are less frequent, less ambiguous, more often nouns, and more often located in the bug report body than non-optimal ones.
Optimal Search Queries Improve Bug Localization by Up to 80%
1. The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study
Masud Rahman*, Foutse Khomh$, Shamima Yeasmin+, Chanchal Roy+
Dalhousie University*, Polytechnique Montréal$, University of Saskatchewan+, Canada
2. Software Bugs
• 606 software bugs recorded in 2017
o $1.7 trillion in costs to the global economy
o 3.7 billion users affected
o 314 companies impacted
• Developers spend ~50% of their time finding and fixing these bugs.
4. IR-Based Bug Localization
[Figure: keyword selection from a 127-word bug report. Example keywords: JDIValue, toString, execute, EvaluationThread, run, NullPointerException, able, cast, null. Numeric labels in the figure: 53, 1.]
5. Query Reformulation in IR-based Bug Localization
[Figure: query reformulation example using the keywords JDIValue, toString, execute, EvaluationThread, run, NullPointerException, able, cast, null. Figure label: 88%.]
6. Research Questions
RQ1: How do the state-of-the-art approaches perform in identifying appropriate search keywords from bug reports for IR-based bug localization?
RQ2: Can optimal or near-optimal search queries be constructed from bug reports that lack bug localization hints, i.e., that contain only natural language text?
RQ3: How do optimal, near-optimal, and non-optimal search queries differ from each other in their characteristics and performance?
7. Workflow of our Study
[Workflow diagram. Labeled components:]
• Bug reports
• Bug-fixing commits
• Linking & filtration
• Refined dataset
• RQ1: Frequency vs. graph-based keyword selection
• RQ2: Optimal vs. baseline search queries
• RQ3: Optimal vs. non-optimal queries
• Finding summary
8. Dataset, Metrics, & Setup
• Six subject systems: 2,320 bug reports
• Four performance metrics: Hit@K, MRR, MAP, and Query Effectiveness (QE)
o QE: rank of the first true positive.
• Baseline queries: title, description, title + description
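The four metrics above can be sketched for a single query as follows. This is a minimal illustration using the common textbook definitions, not the study's own evaluation code; the function names, the toy file names, and the choice to normalize average precision by the number of relevant files actually retrieved are all assumptions. MRR and MAP are then the means of the reciprocal rank and the average precision over all queries.

```python
from typing import Optional, Sequence, Set


def query_effectiveness(ranked: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """QE: 1-based rank of the first true positive, or None if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return rank
    return None


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """Hit@K: True if at least one true positive appears in the top K results."""
    return query_effectiveness(ranked[:k], relevant) is not None


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / QE, or 0.0 when no true positive is retrieved."""
    qe = query_effectiveness(ranked, relevant)
    return 1.0 / qe if qe is not None else 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a true positive occurs.

    Assumption: normalized by the number of true positives retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example (hypothetical file names).
ranked = ["A.java", "B.java", "C.java", "D.java"]
relevant = {"B.java", "D.java"}
print(query_effectiveness(ranked, relevant))  # 2
print(hit_at_k(ranked, relevant, 1))          # False
print(reciprocal_rank(ranked, relevant))      # 0.5
print(average_precision(ranked, relevant))    # 0.5
```

A lower QE is better, since it means the first buggy file appears nearer the top of the result list.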
16. RQ3: Optimal vs. Non-Optimal – Regression Analysis
[Figure: regression analysis results. Surviving label: "Significant".]
17. Actionable Insights
• Frequency: Optimal keywords are less frequent than non-optimal ones in a bug report.
• Entropy: Optimal keywords are less ambiguous than non-optimal ones.
• Location: Optimal keywords are more likely to be found in the body section of a bug report.
• POS: Optimal keywords are more likely to be nouns than non-optimal ones.
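Three of the four aspects above (frequency, entropy, location) can be computed per keyword with a few lines of code. The sketch below is illustrative only: the slides do not define term entropy, so it assumes Shannon entropy of a term's occurrence distribution across a corpus of documents (higher meaning the term is spread more evenly and is therefore more ambiguous). The tokenizer, function names, and toy data are hypothetical, and part-of-speech tagging is left out because it needs an external tagger.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_entropy(term: str, corpus: List[List[str]]) -> float:
    """Assumed definition: Shannon entropy of the term's distribution over documents."""
    counts = [doc.count(term) for doc in corpus]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def keyword_features(title: str, body: str,
                     corpus: List[List[str]]) -> Dict[str, Dict[str, object]]:
    """Frequency, location, and entropy for every term in a bug report."""
    title_tokens = tokenize(title)
    body_tokens = tokenize(body)
    frequency = Counter(title_tokens + body_tokens)
    title_terms, body_terms = set(title_tokens), set(body_tokens)
    return {
        term: {
            "frequency": count,
            "in_title": term in title_terms,
            "in_body": term in body_terms,
            "entropy": term_entropy(term, corpus),
        }
        for term, count in frequency.items()
    }


# Toy corpus of tokenized documents (hypothetical).
corpus = [["null", "cast"], ["null"], ["thread"]]
features = keyword_features("Null pointer", "unable to cast null value", corpus)
print(features["null"])
print(features["cast"])
```

Under these assumptions, a candidate keyword that is infrequent in the report, has low entropy, and occurs in the body would match the profile of an optimal keyword described above.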
19. Take-Away Messages
• 34% of bug reports lead to very poor search queries (RQ1)
• Even state-of-the-art approaches are not sufficient to detect appropriate keywords from them (RQ1)
• Genetic algorithm (GA) shows that optimal keywords exist in those bug reports (RQ2)
• Optimal search queries can achieve 50%–80% Hit@1 and 56%–86% MAP.
20. Take-Away Messages
• Optimal keywords differ from non-optimal keywords (RQ3)
• Four aspects: frequency, term entropy, term location, and part of speech.
• Insights lead to 27%–34% higher Hit@10.
• ML: How to automatically predict the appropriate keywords from a bug report?
• GA: How to automatically determine the fitness of a candidate search query?
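To make the GA-related points concrete, here is a minimal sketch of a genetic algorithm that searches for a keyword subset of a bug report. It is not the study's implementation: the bitmask encoding, tournament selection, one-point crossover, elitism, parameter values, and the name `evolve_query` are all assumptions. The fitness function is passed in by the caller, which reflects the open question above: the slides do not say how fitness should be determined automatically, so the toy fitness below simply rewards a hypothetical set of known-good terms.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = Tuple[bool, ...]  # one keep/drop flag per candidate term


def evolve_query(terms: Sequence[str],
                 fitness: Callable[[List[str]], float],
                 population_size: int = 20,
                 generations: int = 30,
                 mutation_rate: float = 0.1,
                 seed: int = 0) -> List[str]:
    """Return the best keyword subset found by a simple GA (illustrative only)."""
    rng = random.Random(seed)
    n = len(terms)

    def decode(ch: Chromosome) -> List[str]:
        return [t for t, keep in zip(terms, ch) if keep]

    def score(ch: Chromosome) -> float:
        return fitness(decode(ch))

    population = [tuple(rng.random() < 0.5 for _ in range(n))
                  for _ in range(population_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_population = [best]  # elitism: carry the best query forward
        while len(next_population) < population_size:
            # Tournament selection: the better of two random individuals.
            parent_a = max(rng.sample(population, 2), key=score)
            parent_b = max(rng.sample(population, 2), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = parent_a[:cut] + parent_b[cut:]
            child = tuple((not g) if rng.random() < mutation_rate else g
                          for g in child)
            next_population.append(child)
        population = next_population
        best = max(population, key=score)
    return decode(best)


# Toy fitness (hypothetical): reward known-good terms, penalize the rest.
good_terms = {"JDIValue", "toString"}
candidates = ["JDIValue", "toString", "execute", "EvaluationThread",
              "run", "NullPointerException", "able", "cast", "null"]


def toy_fitness(query: List[str]) -> float:
    return sum(1.0 if term in good_terms else -1.0 for term in query)


print(evolve_query(candidates, toy_fitness))
```

In a realistic setting, the fitness of a candidate query would have to come from some signal available without knowing the buggy files in advance, which is exactly the question the last bullet raises.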