Optimal Search Queries Improve Bug Localization by Up to 80%

The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study
Masud Rahman*, Foutse Khomh$, Shamima Yeasmin+, Chanchal Roy+
Dalhousie University*, Polytechnique Montréal$, University of Saskatchewan+, Canada

Software Bugs
• 606 software bugs recorded in 2017
o $1.7 trillion costs to the global economy
o 3.7 billion users affected
o 314 companies impacted
• Developers spend ~50% of their time finding and fixing these bugs.

Bug Report & Bug Localization
[Figure: Bug Localization]

IR-Based Bug Localization
[Figure: keyword selection turns a 127-word bug report into the search query below; annotations: 53, 1]
JDIValue, toString, execute, EvaluationThread, run, NullPointerException able cast null

Query Reformulation in IR-based Bug Localization
[Figure: query reformulation; annotation: 88%]
JDIValue, toString, execute, EvaluationThread, run, NullPointerException able cast null

Research Questions
RQ1: How do state-of-the-art approaches perform in identifying appropriate search keywords from bug reports for IR-based bug localization?
RQ2: Can optimal or near-optimal search queries be constructed from bug reports that lack localization hints or contain only natural-language text?
RQ3: How do optimal, near-optimal, and non-optimal search queries differ in their characteristics and performance?

Workflow of our Study
[Figure: workflow diagram]
• Inputs: bug reports and bug-fixing commits
• Linking & filtration
• Refined dataset
• RQ1: Frequency vs. graph-based keyword selection
• RQ2: Optimal vs. baseline search queries
• RQ3: Optimal vs. non-optimal queries
• Finding summary

Dataset, Metrics, & Setup
• Six subject systems: 2,320 bug reports
• Four performance metrics: Hit@K, MRR, MAP, and Query Effectiveness (QE)
o QE: rank of the first true positive
• Baseline queries: title, description, title + description
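The four metrics above can be sketched in a few lines. This is a minimal illustration, not the study's evaluation code: the function names and the toy `results` list are hypothetical, and average precision is normalized by the number of true positives, which is one common convention and an assumption here.

```python
from statistics import mean

def first_hit_rank(ranked, relevant):
    """Query Effectiveness (QE): 1-based rank of the first true positive, or None."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return rank
    return None

def hit_at_k(results, k):
    """Fraction of queries that return at least one true positive in the top k."""
    hits = 0
    for ranked, relevant in results:
        qe = first_hit_rank(ranked, relevant)
        if qe is not None and qe <= k:
            hits += 1
    return hits / len(results)

def mrr(results):
    """Mean reciprocal rank of the first true positive (0 if none is retrieved)."""
    reciprocal_ranks = []
    for ranked, relevant in results:
        qe = first_hit_rank(ranked, relevant)
        reciprocal_ranks.append(1.0 / qe if qe else 0.0)
    return mean(reciprocal_ranks)

def average_precision(ranked, relevant):
    """Precision averaged over the ranks of the true positives (assumed convention)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    return mean(average_precision(ranked, relevant) for ranked, relevant in results)

# Hypothetical results: (ranked file list, set of truly buggy files) per query.
results = [
    (["A.java", "B.java", "C.java"], {"A.java"}),
    (["D.java", "E.java", "F.java"], {"E.java", "F.java"}),
]
print(hit_at_k(results, 1), mrr(results), mean_average_precision(results))
```

With these helpers, a query counts as optimal in the sense used later in the deck when `first_hit_rank` returns 1.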

RQ1: Frequency vs. Graph-based Keyword Selection for Search
PageRank Algorithm
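To make the graph-based idea concrete, here is a minimal sketch of PageRank over a term co-occurrence graph. It is an illustration under stated assumptions, not necessarily how STRICT builds or scores its graph: the window size, damping factor, iteration count, and all function names are hypothetical choices.

```python
from collections import defaultdict

def build_cooccurrence_graph(tokens, window=2):
    """Undirected graph linking terms that occur within `window` tokens of each other."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Plain iterative PageRank; every node in `graph` has at least one neighbour."""
    nodes = list(graph)
    if not nodes:
        return {}
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[m] / len(graph[m]) for m in graph[node])
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores

def top_keywords(tokens, k=10):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = pagerank(build_cooccurrence_graph(tokens))
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]

# Hypothetical token stream from a preprocessed bug report.
tokens = ["null", "cast", "null", "thread", "null", "run"]
print(top_keywords(tokens, k=3))
```

In this toy input, "null" co-occurs with every other term, so it becomes the hub of the graph and receives the highest score.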

RQ1: Frequency vs. Graph-based Keywords for Bug Localization

Method                                     Hit@1    Hit@10   MRR    MAP
Baseline                                   31.98%   66.50%   0.43   42.69%
TF                                         24.39%   55.58%   0.34   33.94%
TF-IDF                                     27.02%   59.80%   0.37   36.76%
Kevic & Fritz                              23.36%   52.92%   0.32   31.98%
STRICT (graph-based keyword selection)     25.82%   63.02%   0.37   37.31%

Query      Min   Median   Average   Max
Baseline   3     32       49        406
TF-IDF     3     10       10        10
STRICT     3     10       10        10
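For comparison with the graph-based approach, frequency-based selection can be sketched as picking the top-k terms of a bug report by TF-IDF. This is a minimal sketch, not the implementation evaluated in the study: the smoothed IDF formula, the function name, and the toy corpus are assumptions. The cut-off k = 10 mirrors the maximum query length in the table above.

```python
import math
from collections import Counter

def tf_idf_keywords(report_tokens, corpus, k=10):
    """Top-k terms of one bug report by TF-IDF; `corpus` is a list of token lists."""
    term_freq = Counter(report_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, freq in term_freq.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = freq * idf
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]

# Hypothetical corpus of tokenized documents and one tokenized bug report.
corpus = [["null", "pointer", "toString"], ["null", "dialog"], ["null", "thread"]]
report = ["null", "null", "toString", "toString", "pointer"]
print(tf_idf_keywords(report, corpus, k=2))
```

Here "toString" outranks "null" even though both occur twice in the report, because "null" appears in every document of the corpus.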

RQ1: Keyword Selection from 4 Subsets of Bug Reports

Method     Hit@1    Hit@10    MRR    MAP
Bug reports with good baseline and no localization hints (567)
Baseline   41.96%   100.00%   0.60   58.15%
STRICT     36.14%   84.83%    0.51   50.29%
Bug reports with good baseline and localization hints (954)
Baseline   50.01%   100.00%   0.66   66.63%
TF-IDF     41.59%   83.55%    0.54   55.15%
Bug reports with poor baseline and no localization hints (372)
Baseline   0%       0%        0      0%
STRICT     1.56%    16.89%    0.05   5.12%
Bug reports with poor baseline and localization hints (427)
Baseline   0%       0%        0      0%
STRICT     4.91%    25.77%    0.11   10.66%

34% of the bug reports (372 + 427 of 2,320) fall into the two poor-baseline subsets.

RQ2: Optimal Query Generation from a Bug Report
[Figure: genetic algorithm loop; example label: Primitive dialogs]
• Initial population: P = {q1, q2, q3, …, qn}
• Selection
• Crossover
• Mutation
• Fitness calculation (QE, MAP)
• Output: Qop = {k1, k2, …, km}
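The loop on this slide (population, selection, crossover, mutation, fitness) can be sketched as follows. This is a toy illustration and not the study's implementation. The slide names QE and MAP as the fitness signals, but everything else here is an assumption: the bit-mask encoding, truncation selection, single-point crossover, one-bit mutation, the overlap-based `rank_documents` stand-in for a real retrieval engine, and the use of 1/QE alone as fitness. The vocabulary must contain at least two terms.

```python
import random

def rank_documents(query, corpus):
    """Toy retrieval engine: rank documents by the number of shared keywords."""
    return sorted(corpus, key=lambda name: (-len(query & corpus[name]), name))

def fitness(query, corpus, buggy):
    """Reciprocal of QE (rank of the first buggy document); higher is better."""
    if not query:
        return 0.0
    for rank, name in enumerate(rank_documents(query, corpus), start=1):
        if name in buggy:
            return 1.0 / rank
    return 0.0

def evolve_query(vocabulary, corpus, buggy, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    vocab = sorted(vocabulary)

    def to_query(mask):
        # A candidate query is a bit mask over the bug report's vocabulary.
        return {term for term, bit in zip(vocab, mask) if bit}

    def score(mask):
        return fitness(to_query(mask), corpus, buggy)

    population = [[rng.randint(0, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]        # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(vocab))       # single-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(len(vocab))] ^= 1    # mutation: flip one bit
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return to_query(best), score(best)

# Hypothetical corpus (file name -> terms) and bug report vocabulary.
corpus = {
    "Buggy.java": {"jdivalue", "tostring", "null"},
    "Misc.java": {"run", "thread", "null"},
    "Other.java": {"dialog", "run", "execute"},
}
vocabulary = {"jdivalue", "tostring", "run", "execute", "dialog", "thread"}
best_query, best_fitness = evolve_query(vocabulary, corpus, {"Buggy.java"})
print(sorted(best_query), best_fitness)
```

Note that the fitness function needs the ground truth (the buggy files), so a loop like this can only show that an optimal query exists. It cannot be used as is on a new bug report.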

RQ2: Optimal vs. Baseline Queries in Bug Localization

Query      Hit@1    Hit@10   MRR    MAP
Bug reports with poor baseline
Baseline   0%       0%       0      0%
Optimal    50.04%   77.96%   0.58   56.47%
Bug reports with poor baseline
Baseline   0%       0%       0      0%
Optimal    80.70%   93.37%   0.85   86.19%
All bug reports (2,320)
Baseline   31.98%   66.50%   0.43   42.69%
Optimal    87.41%   95.74%   0.90   90.00%

RQ3: Non-Optimal vs. Optimal Search Queries
• Query dataset: 13,914 search queries
o 4,893 optimal search queries
o 5,164 non-optimal queries
• QE: rank of the first true positive
o Optimal: QE = 1; non-optimal: QE > 10
• 31 query characteristics from the literature
• Query classification using the Random Forest algorithm

RQ3: Query Classification & Feature Importance

Query         Precision   Recall
Optimal       84.60%      79.80%
Non-optimal   81.90%      86.20%

Important features: Frequency, Entropy, Location, POS

RQ3: Optimal vs. Non-Optimal – Regression Analysis
[Figure: regression analysis; annotation: Significant]

Actionable Insights
• Frequency: Optimal keywords are less frequent than non-optimal ones in a bug report.
• Entropy: Optimal keywords are less ambiguous than non-optimal ones.
• Location: Optimal keywords are more likely to be found in the body section of a bug report.
• POS: Optimal keywords are more likely to be nouns than non-optimal ones.
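The first two insights can be turned into a simple ordering of candidate keywords. The sketch below is an illustration only: it assumes that a term's ambiguity is measured by the Shannon entropy of its occurrences across the corpus documents, which may differ from the study's definition, and the ranking rule and function names are hypothetical. The location and POS insights are left out because they need the report structure and a part-of-speech tagger.

```python
import math
from collections import Counter

def term_entropy(term, corpus):
    """Shannon entropy (bits) of a term's occurrence distribution over documents."""
    counts = [doc.count(term) for doc in corpus]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def rank_candidates(report_tokens, corpus):
    """Order terms by low entropy first, then by low frequency in the report."""
    freq = Counter(report_tokens)
    return sorted(freq, key=lambda t: (term_entropy(t, corpus), freq[t], t))

# Hypothetical corpus of tokenized documents and one tokenized bug report.
corpus = [["null", "null"], ["null", "run"], ["jdivalue"], ["null"]]
report = ["null", "null", "jdivalue", "run"]
print(rank_candidates(report, corpus))
```

In this toy input, "null" is spread over three documents and repeated in the report, so it ranks last. "jdivalue" and "run" each occur in a single document and rank first.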

RQ3: Query Improvements using Actionable Insights

Query                 Hit@1   Hit@10   MRR    MAP
Bug reports with poor baseline and no localization hints (372)
Baseline              0%      0%       0      0%
Baseline + Insights   1.28%   16.09%   0.04   4.50%
STRICT                1.56%   16.89%   0.05   5.12%
STRICT + Insights     2.66%   15.18%   0.05   5.09%
Bug reports with poor baseline and localization hints (427)
Baseline              0%      0%       0      0%
Baseline + Insights   6.79%   34.29%   0.14   13.79%
STRICT                4.91%   25.77%   0.11   10.66%
STRICT + Insights     6.80%   32.83%   0.13   13.39%

Take-Away Messages
• 34% of bug reports lead to very poor search queries (RQ1).
• Even state-of-the-art approaches cannot identify appropriate keywords from them (RQ1).
• A genetic algorithm (GA) shows that optimal keywords exist in those bug reports (RQ2).
• Optimal search queries can achieve 50% to 80% Hit@1 and 56% to 86% MAP.

Take-Away Messages
• Optimal keywords differ from non-optimal keywords (RQ3).
• Four aspects: frequency, term entropy, term location, and part of speech.
• Insights lead to 27% to 34% higher Hit@10.
• ML: How can appropriate keywords be predicted automatically from a bug report?
• GA: How can the fitness of a candidate search query be determined automatically?

Thank you! Questions?
Masud Rahman, PhD
masud.rahman@dal.ca
@masud2336

RQ2: Query Improvement & Worsening Ratios
