Evaluating Search Performance
Ingo Frommholz
University of Wolverhampton
ifrommholz@acm.org
@iFromm
ISKO-UK / BCS-IRSG
Exploring Information Retrieval
February 17, 2022
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 1 / 48
Credits
• The original slide deck was created in the context of a “Practitioners’
Evaluation Roundtable” at BCS Search Solutions 20211
• Some slides were created by Jochen L. Leidner from the Coburg
University of Applied Sciences and Arts and the University of Sheffield
1
https://irsg.bcs.org/informer/2022/01/search-solutions-2021-tutorials-report/
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 2 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 3 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 4 / 48
Why Do We Evaluate?
To answer questions such as:
• What should I do to improve the quality of my system?
• What works well, what doesn’t?
• Which retrieval model gives me the best results?
• Which system/search engine is better?
• Which system should I buy?
• How is ‘quality’ defined?
• How can I measure quality?
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 5 / 48
Evaluation Characteristics
• Reliability
• Reproducibility
• Sufficient documentation
• Representative datasets (documents and users)
• Remove potential bias
• Open source code and data (if possible); Open Science
• Validity
• Reflect ‘real’ circumstances
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 6 / 48
Evaluation Criteria
• Efficiency
• How quickly can a user solve a task?
• System’s response time
• Effectiveness
• Quality of results
• Information Retrieval deals with vagueness and uncertainty
• Results are rarely correct (i.e., it is rare that everything retrieved is also relevant)
• Results are rarely complete (i.e., it is rare that everything relevant is retrieved)
• Focus of the Cranfield Paradigm
• Satisfaction of the user with the system
• Are users happy with how the system supported their task?
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 7 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 8 / 48
Cyril Cleverdon’s Cranfield Studies
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 9 / 48
Retrieval Experiments, Cranfield Style [Har11]
Ingredients of a Test Collection
• A collection of documents
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 10 / 48
Retrieval Experiments, Cranfield Style [Har11]
Ingredients of a Test Collection
• A set of queries
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 10 / 48
Retrieval Experiments, Cranfield Style [Har11]
Ingredients of a Test Collection
• A set of relevance judgements
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 10 / 48
Evaluation Process
[Figure: each query/topic is run against the document collection to produce a ranked result list; the rankings are compared against the relevance judgements to compute metrics such as Recall, Precision, F1, P@n, MAP, MRR and NDCG. Search box image courtesy of Martin White.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 11 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 12 / 48
Ranking, Recall and Precision
[Figure: a ranked result list and a Venn diagram showing the document collection, the relevant documents, the result set, and their intersection, the relevant retrieved documents.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 13 / 48
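The relationship shown in the Venn diagram can be written down directly. Below is a minimal Python sketch (my own illustration, not part of the deck; document IDs and function names are made up):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall from a ranked result list and the set of
    relevant document IDs (illustrative names, not from the slides)."""
    retrieved_set = set(retrieved)
    relevant_retrieved = retrieved_set & set(relevant)  # intersection in the Venn diagram
    precision = len(relevant_retrieved) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall

# 10 retrieved documents, 5 relevant documents in the collection
retrieved = ["d3", "d7", "d1", "d9", "d4", "d6", "d2", "d8", "d5", "d10"]
relevant = {"d1", "d4", "d9", "d11", "d12"}
p, r = precision_recall(retrieved, relevant)  # p = 0.3, r = 0.6
```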
F-measure, Precision at n
F1 measure: harmonic mean of precision and recall
F_1 = \frac{2 \cdot p \cdot r}{p + r}
General Fβ score:
F_\beta = (1 + \beta^2) \cdot \frac{p \cdot r}{\beta^2 \cdot p + r}
Precision at n (P@n): precision after looking at the top n documents
• For example, Web searchers usually look at 10 documents (the first page) → P@10
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 14 / 48
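For completeness, a small illustrative Python sketch of the formulas above (not from the slides; function names are my own):

```python
def f_beta(p, r, beta=1.0):
    """F_beta = (1 + beta^2) * p * r / (beta^2 * p + r); beta = 1 gives F1."""
    return 0.0 if p == 0 and r == 0 else (1 + beta**2) * p * r / (beta**2 * p + r)

def precision_at_n(ranking, relevant, n=10):
    """Precision after looking at the top n documents (P@n)."""
    return sum(1 for d in ranking[:n] if d in relevant) / n

print(f_beta(0.3, 0.6))            # F1 = 0.4
print(f_beta(0.3, 0.6, beta=0.5))  # F0.5 leans towards precision
print(f_beta(0.3, 0.6, beta=2.0))  # F2 leans towards recall
```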
Recall–Precision Graph
[Recall–precision graph: precision (y-axis, 0–1) plotted against recall (x-axis, 0–1) for several runs: baseline, merged, qknow-0.7, hknow-0.7, cknow-0.7-0.8, qrel-0.4, hrel-0.1, crel-0.1-0.4.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 15 / 48
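Curves like this are usually drawn from interpolated precision at standard recall levels. A short illustrative sketch of that interpolation, assuming per-run lists of (recall, precision) points (not part of the original deck):

```python
def interpolated_precision(points, levels=None):
    """Interpolated precision at standard recall levels:
    P_interp(R) = max{ precision p : (r, p) measured with r >= R }."""
    if levels is None:
        levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    curve = []
    for level in levels:
        candidates = [p for r, p in points if r >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve

# (recall, precision) pairs measured after each relevant document in one ranking
points = [(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.44), (1.0, 0.38)]
print(interpolated_precision(points))
```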
Several Topics/Queries
[Figure: four topics/queries, each producing its own ranked result list of ten documents.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 16 / 48
Mean Average Precision (MAP), Mean Reciprocal Rank
(MRR)
Mean average precision (MAP) [BV05]:
1 Measure precision after each relevant document
2 Average over the precision values to get
average precision for one topic/query
3 Average over topics/queries
Mean Reciprocal Rank (MRR): based on the position of the first correct (relevant) result (used, e.g., in question answering):
Ri ranking of query qi
Nq number of queries
Scorr position of the first correct answer in the ranking
MRR = \frac{1}{N_q} \sum_{i=1}^{N_q} \frac{1}{S_{corr}(R_i)}
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 17 / 48
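A brief illustrative sketch of both measures (not from the slides; helper names are my own assumptions):

```python
def average_precision(ranking, relevant):
    """Average of the precision values measured at each relevant document;
    relevant documents that are not retrieved contribute a precision of 0."""
    hits, precision_sum = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i  # precision after this relevant document
    return precision_sum / len(relevant) if relevant else 0.0

def reciprocal_rank(ranking, relevant):
    """1 / S_corr: reciprocal of the position of the first relevant result."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1 / i
    return 0.0

def mean_over_queries(metric, runs):
    """MAP or MRR: average the per-query value over all queries."""
    return sum(metric(ranking, relevant) for ranking, relevant in runs) / len(runs)
```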
System- vs user-oriented measures
• Measures like Precision, Recall, Fβ and MAP are system-oriented
measures with no assumption on user behaviour.
• Other measures like P@n and MRR are user-oriented as they make
certain simple assumptions on user behaviour.
• Seen n documents
• Seen n relevant documents
• Seen n non-relevant documents
• Seen n non-relevant documents consecutively
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 18 / 48
Graded Relevance
• Sometimes, relevance is non-binary (e.g., ‘not relevant’, ‘marginally relevant’, ‘fully relevant’) → graded relevance
• Highly relevant documents at the top of the ranking should be
preferred over those at the end
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 19 / 48
Normalized Discounted Cumulative Gain (NDCG)
[JK02]
• Relevance judgements ∈ {0, 1, 2, 3}
[Figure: a ranked list of ten documents with graded judgements forming the gain vector G = (1, 0, 0, 3, 1, 0, 0, 0, 2, 1).]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 20 / 48
Normalized Discounted Cumulative Gain (NDCG)
[JK02]
• Relevance judgements ∈ {0, 1, 2, 3}
• Gain vector Gi for ranking Ri induced by query qi
• Gain at position j depends on the level of relevance
• Cumulative gain vector CGi:
CG_i[j] = \begin{cases} G_i[1] & \text{if } j = 1 \\ G_i[j] + CG_i[j-1] & \text{if } j > 1 \end{cases}
• Discounted cumulative gain vector DCGi: taking the position j into account (discount factor log2 j)
DCG_i[j] = \begin{cases} G_i[1] & \text{if } j = 1 \\ \frac{G_i[j]}{\log_2 j} + DCG_i[j-1] & \text{if } j > 1 \end{cases}
[Figure: the example ranking with gain vector G = (1, 0, 0, 3, 1, 0, 0, 0, 2, 1) and the corresponding cumulative gain values.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 20 / 48
Normalized Discounted Cumulative Gain (NDCG)
[JK02]
• Ideal gain vector
IGi = (3, 2, 1, 1, 1, 0, 0, 0, 0, 0)
• ICGi and IDCGi analogously
• Average over all queries to compute NDCG:
DCG[j] = \frac{1}{N_q} \sum_{i=1}^{N_q} DCG_i[j]
IDCG[j] = \frac{1}{N_q} \sum_{i=1}^{N_q} IDCG_i[j]
NDCG[j] = \frac{DCG[j]}{IDCG[j]}
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 20 / 48
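Putting the definitions together, a minimal sketch of DCG and NDCG at a cut-off k (illustrative, not from the deck; it assumes graded judgements are already mapped to gain values):

```python
from math import log2

def dcg(gains, k=10):
    """DCG[k] = G[1] + sum_{j=2..k} G[j] / log2(j), as defined on the slide."""
    return sum(g if j == 1 else g / log2(j)
               for j, g in enumerate(gains[:k], start=1))

def ndcg(gains, all_judged_gains=None, k=10):
    """NDCG[k]: DCG of the ranking divided by the DCG of the ideal ranking.
    The ideal ranking is built from all judged documents (here: the same
    gains sorted in decreasing order unless given explicitly)."""
    ideal = sorted(all_judged_gains or gains, reverse=True)
    idcg = dcg(ideal, k)
    return dcg(gains, k) / idcg if idcg > 0 else 0.0

# Gain vector from the slide example, judgements in {0, 1, 2, 3};
# sorting it gives the slide's ideal gain vector (3, 2, 1, 1, 1, 0, ...)
g = [1, 0, 0, 3, 1, 0, 0, 0, 2, 1]
print(ndcg(g, k=10))
```

Note that the slides average DCG and IDCG separately over queries before dividing; the sketch normalises per query, a common alternative.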
NDCG – Comparing Systems
• Compare NDCG curves
• Compare at a given position,
e.g., NDCG[10], analogously to P@10
[Figure: normalized discounted cumulated gain (nDCG) curves for binary and non-binary relevance judgements, taken from [JK02]; the authors note that the number of topics (20) is rather small to provide reliable results, but the data illuminate the behaviour of the (n)(D)CG measures.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 21 / 48
Some Evaluation Metrics
• Accuracy
• Precision (p)
• Recall (r)
• Fall-out (converse of Specificity)
• F-score (F-measure, converse of Effectiveness) (Fβ)
• Precision at k (P@k)
• R-precision (RPrec)
• Mean average precision (MAP)
• Mean Reciprocal Rank (MRR)
• Normalized Discounted Cumulative Gain (NDCG)
• Maximal Marginal Relevance (MMR)
• Other Metrics: bpref, GMAP, . . .
Some metrics (e.g., MRR) are the subject of ongoing debate [Fuh17; Sak20; Fuh20].
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 22 / 48
Precision- and Recall-oriented Tasks
Which metric to use?
• Many tasks are precision-oriented, i.e. we seek a high precision
• Web search
• Mobile search
• Question answering
• Other tasks are recall-oriented, i.e. we seek a high recall
• Patent search for prior art (to check for novelty of an application)
• Systematic reviews
• Investigative journalism
• Real life may force you to customize the metric to use!
• E.g.: Is Precision or Recall more important for your task? (If they matter equally, use F1; otherwise a precision-weighted F0.5 or a recall-weighted F5 may be more suitable.)
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 23 / 48
“Hacking Your Measures” – Evaluating Structured
Document Retrieval
INEX – INitiative for the Evaluation of XML Retrieval [LÖ09]
• Find smallest component that is
highly relevant
• INEX created long discussion
threads on suitable evaluation
measures
• Specificity: Extent to which a document component is focused on the information need
• Exhaustivity: Extent to which the information contained in a document component satisfies the information need
[Figure: nested XML document structure — Book, Chapter, Section, Subsection.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 24 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 25 / 48
The Need for Evaluation Initiatives
• Shared data collections for robust evaluation
• Common evaluation measures
• Result comparability across a variety of techniques
• Discussion and reflection
• Collaborative creation of larger test collections
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 26 / 48
TREC – Text REtrieval Conference
• Public benchmark and associated workshop series run by NIST, initially co-funded by US DARPA (1992–) [VH99; VH05]
• Some past TREC tracks
• Ad hoc
• Filtering
• Interactive tracks
• Cross-lingual tracks
• Very large collection / Web track
• Enterprise Track
• Question answering
• TREC 2022
• Clinical Trials Track
• Conversational Assistance Track
• CrisisFACTS Track
• Deep Learning Track
• Fair Ranking Track
• Health Misinformation Track
• NeuCLIR Track
http://trec.nist.gov/
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 27 / 48
Other Common Evaluation Initiatives
• NTCIR [SOK21]: Japanese
language initiative (1999-)
• CLEF [FP19]: European
initiative on monolingual and
cross-lingual search (2000-);
• MediaEval [Lar+17]:
Benchmarking Initiative for
Multimedia Evaluation
• CORD-19 dataset [Wan+20],
used in TREC-COVID [Voo+20]
Source: client of Intranet Focus Ltd. Thanks again Martin White!
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 28 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 29 / 48
Where is the User?
• Very little user involvement in the Cranfield paradigm
• The client, customer or user should be at the centre.
• Not a single stakeholder:
• People are different (user diversity)
• People search differently
• People have different information needs (objective diversity)
• People hold different professional roles (role diversity)
• People differ with respect to skills & experiences (skill diversity)
• Relevance is subjective and situational and there is more than topical
relevance [Miz98]
• Usefulness, appropriateness, accessibility, comprehensibility, novelty, etc.
• The user also cares about the user experience (UX)
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 30 / 48
Search as “Berrypicking”
Marcia Bates [Bat89]
Search is an iterative process: an information need cannot be satisfied by a single result set; instead, the searcher makes a sequence of selections and collects pieces of information during the search.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 31 / 48
Information Seeking & Searching Context
Ingwersen/Järvelin: The Turn [IJ05]
Integration of Information Seeking and Retrieval in Context
“... these sources/systems and tools needs to take their joint usability, quality of information and process into account. One may ask what is the contribution of an IR system at the end of a seeking process – over time, over seeking tasks, and over seekers. Since the knowledge sources, systems and tools are not used in isolation they should not be designed nor evaluated in isolation. They affect each other’s utility in context.”
[Fig. 7.2 from [IJ05]: nested contexts and evaluation criteria for task-based ISR. The IR context (docs, representation, request, query, matching, result) sits inside the seeking context, the work task context, and the socio-organizational & cultural context. Evaluation criteria: A: recall, precision, efficiency, quality of information/process; B: usability, quality of information/process; C: quality of info work process/result; D: socio-cognitive relevance, quality of work task result.]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 32 / 48
Interactive IR Research Continuum
Diane Kelly [Kel07]
[Fig. 2.1 from [Kel07]: research continuum for conceptualizing IIR research, ranging from a system focus (TREC-style studies, filtering and SDI, log analysis) to a human focus (TREC interactive studies, studies where “users” make relevance assessments, the archetypical IIR study, experimental information behavior, information-seeking behavior with IR systems, information-seeking behavior in context).]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 33 / 48
Simulated Tasks
Pia Borlund [Bor03]
• Use of realistic scenarios
• Simulated work tasks:
Simulated work task situation: After your graduation you will
be looking for a job in industry. You want information to help you
focus your future job seeking. You know it pays to know the mar-
ket. You would like to find some information about employment
patterns in industry and what kind of qualifications employers will
be looking for from future employees.
Indicative request: Find, for instance, something about future
employment trends in industry, i.e., areas of growth and decline.
• Simulated work tasks should be realistic (e.g., not “imagine you’re
the first human on Mars...”)
• However, a less relatable task may be outweighed by a topically very
interesting situation
• Taking into account situational factors (situational relevance)
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 34 / 48
Living Labs
[BS21; JBD18]
• Offline evaluation
(Cranfield):
reproducible, not taking
users into account
• Online evaluation:
Involving live users, lack
of reproducibility
• Living labs: aim at making online evaluation fully reproducible
• TREC OpenSearch, CLEF LL4IR, CLEF NewsREEL, NTCIR-13 OpenLiveQ
[Figure: a living-lab run submission with qid, docid, rank, score and identifier fields (from [BS21]).]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 35 / 48
What to Evaluate?
• Relevance (retrieval effectiveness) ←− most IR academics focus here
• Coverage (e.g. percentage of actual user queries answerable)
• Speed (throughput, responsiveness)
• User interface quality (UX)
• Cost/time to build
• Cost to operate
• Task completion time
• Scalability (order of magnitude of number of docs.)
• Memory requirements (transient, persistent)
• Index freshness (how long before I can retrieve it)
• User’s subjective satisfaction
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 36 / 48
Table of Contents
1 Evaluation: Why and What?
2 The Cranfield Paradigm
3 Common Evaluation Metrics
4 Selected Evaluation Initiatives
5 Beyond Cranfield
6 The Practitioner View
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 37 / 48
Challenges to Practitioners
• Legacy: old code-base with strange/non-standard/non-effective
evaluation metrics in use combined with a resistance to change;
• Knowledge gap: lack of skills/expertise/experience in the core team;
• Resources: no gold data available;
• Planning: evaluation was not budgeted for;
• Awareness: team consists of traditional managers and software engineers who do not realise that quantitative evaluation is a thing;
• Infrastructure: evaluation needs to be done “in vivo” as there is no
second system instance available; and
• Scaffolding: search component is deeply embedded in the overall
system and cannot be run as a batch script.
• Standard evaluation metrics do not assess domain peculiarities [LC14]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 38 / 48
What is Different in Industry?
• Effectiveness is only one of many concerns (sometimes even
forgotten!)
• Functionality-oriented view (search seen as one of many functions, no
or little awareness of effectiveness issues)
• Purchase as success criterion!
• Search is often (and wrongly) considered a “solved” problem
• Frequently held developer sentiment: “Just use Elastic, Solr or
Lucene, and we’re done.”
• Developers unskilled in IR may integrate libraries with default settings
inappropriate for a given use case
• Evaluation often done online (on a running system) - A/B testing
[KL17]
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 39 / 48
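As a hedged sketch of the online route mentioned in the last bullet (my own illustration, not taken from the deck or [KL17]): comparing the click-through rates of two system variants with a two-sided two-proportion z-test.

```python
from math import sqrt
from statistics import NormalDist

def ab_ctr_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided two-proportion z-test on the click-through rates of
    variants A and B (illustrative helper, names are assumptions)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

print(ab_ctr_test(clicks_a=480, n_a=10_000, clicks_b=540, n_b=10_000))
```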
References and Further Reading
• Introduction to Information Retrieval - Evaluation (slides)
https://web.stanford.edu/class/cs276/handouts/
EvaluationNew-handout-6-per.pdf
• Online Controlled Experiments and A/B Testing
https://www.researchgate.net/profile/Ron-Kohavi/publication/
316116834_Online_Controlled_Experiments_and_AB_Testing
• Enterprise Search – Evaluation (Chapter 4 of [KH])
https://www.flax.co.uk/wp-content/uploads/2017/11/ES_book_
final_journal_version.pdf
• TREC Conferences https://trec.nist.gov
• CLEF Initiative http://www.clef-initiative.eu
• NTCIR Workshops
http://research.nii.ac.jp/ntcir/data/data-en.html
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 40 / 48
Thank you!
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 41 / 48
Bibliography I
[Bat89] M J Bates. “The design of browsing and berrypicking
techniques for the online search interface”. In: Online Review
13.5 (1989), pp. 407–424.
[Bor03] Pia Borlund. “The IIR evaluation model: a framework for
evaluation of interactive information retrieval systems”. In:
Information Research 8.3 (2003). url:
http://informationr.net/ir/8-3/paper152.html.
[BS21] Timo Breuer and Philipp Schaer. “A Living Lab Architecture
for Reproducible Shared Task Experimentation”. In:
Information between Data and Knowledge. Vol. 74. Schriften
zur Informationswissenschaft. Session 6: Emerging
Technologies. Glückstadt: Werner Hülsbusch, 2021,
pp. 348–362. doi: 10.5283/epub.44953.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 42 / 48
Bibliography II
[BV05] Chris Buckley and Ellen Voorhees. “Retrieval System
Evaluation”. In: TREC Exp. Eval. Inf. Retr. Ed. by
Ellen Voorhees and Donna Harman. MIT Press, 2005, Chapter
3.
[FP19] Nicola Ferro and Carol Peters, eds. Information Retrieval
Evaluation in a Changing World: Lessons Learned from 20
Years of CLEF. Cham, CH: Springer-Nature, 2019.
[Fuh17] Norbert Fuhr. “Some Common Mistakes In IR Evaluation,
And How They Can Be Avoided”. In: SIGIR Forum 33 (2017).
url: http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr_17b.pdf.
[Fuh20] Norbert Fuhr. “SIGIR Keynote: Proof By Experimentation?
Towards Better IR Research”. In: SIGIR Forum 54.2 (2020),
pp. 1–4. url: http://sigir.org/wp-content/uploads/2020/12/p04.pdf.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 43 / 48
Bibliography III
[Har11] Donna Harman. Information Retrieval Evaluation. Synthesis
Lectures on Information Concepts, Retrieval, and Services.
San Francisco, CA, USA: Morgan & Claypool, 2011. doi:
10.2200/S00368ED1V01Y201105ICR019.
[IJ05] Peter Ingwersen and Kalervo Järvelin. The turn: integration of
information seeking and retrieval in context. Secaucus, NJ,
USA: Springer-Verlag New York, Inc., 2005. isbn:
140203850X.
[JBD18] Rolf Jagerman, Krisztian Balog, and Maarten De Rijke.
“OpenSearch: Lessons learned from an online evaluation
campaign”. In: J. Data Inf. Qual. 10.3 (2018). issn:
19361963. doi: 10.1145/3239575.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 44 / 48
Bibliography IV
[JK02] Kalervo Järvelin and Jaana Kekäläinen. “Cumulated
Gain-Based Evaluation of IR Techniques”. In: ACM Trans. Inf.
Syst. 20.4 (2002), pp. 422–446. url:
https://www.cc.gatech.edu/~zha/CS8803WST/dcg.pdf.
[Kel07] Diane Kelly. “Methods for Evaluating Interactive Information
Retrieval Systems with Users”. In: Found. Trends Inf. Retr.
3.1—2 (2007), pp. 1–224. issn: 1554-0669. doi:
10.1561/1500000012. url: http://www.nowpublishers.com/article/Details/INR-012.
[KH] Udo Kruschwitz and Charlie Hull. Searching the Enterprise.
Foundations and Trends in Information Retrieval. Now
Publishers. isbn: 978-1680833041. doi:
http://dx.doi.org/10.1561/1500000053.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 45 / 48
Bibliography V
[KL17] Ron Kohavi and Roger Longbotham. “Online Controlled
Experiments and A/B Testing”. In: Encyclopedia of Machine
Learning and Data Mining. Ed. by Claude Sammut and
Geoffrey I. Webb. Boston, MA, USA: Springer, 2017,
pp. 922–929. doi: 10.1007/978-1-4899-7687-1_891.
[Lar+17] Martha Larson et al. “The Benchmarking Initiative for
Multimedia Evaluation: MediaEval 2016”. In: IEEE Multimed.
24.1 (2017), pp. 93–96. issn: 1070986X. doi:
10.1109/MMUL.2017.9.
[LC14] Bo Long and Yi Chang. Relevance Ranking for Vertical Search
Engines. Morgan Kaufmann, 2014. isbn: 978-0124071711.
[LÖ09] Ling Liu and M. Tamer Özsu. “XML Information Retrieval”.
In: Encyclopedia of Database Systems. Ed. by Ling Liu and
M. Tamer Özsu. Springer, 2009. doi:
https://doi.org/10.1007/978-0-387-39940-9_4084.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 46 / 48
Bibliography VI
[Miz98] Stefano Mizzaro. “How many relevances in information
retrieval?” In: Interact. Comput. 10 (1998), pp. 303–320.
[Sak20] Tetsuya Sakai. “On Fuhr’s Guideline for IR Evaluation”. In:
SIGIR Forum 54.1 (2020).
[SOK21] Tetsuya Sakai, Douglas W. Oard, and Noriko Kando, eds.
Evaluating Information Retrieval and Access Tasks: NTCIR’s
Legacy of Research Impact. The Information Retrieval Series,
43. Heidelberg, Germany: Springer, 2021. isbn:
978-9811555534.
[VH05] Ellen M. Voorhees and Donna K. Harman, eds. TREC -
Experiment and Evaluation in Information: Experiment and
Evaluation in Information Retrieval. Cambridge, MA, USA:
MIT Press, 2005.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 47 / 48
Bibliography VII
[VH99] Ellen M. Voorhees and Donna K. Harman. “The Text
REtrieval Conference (TREC): history and plans for TREC-9”.
In: SIGIR Forum 33.2 (1999), pp. 12–15. doi:
10.1145/344250.344252.
[Voo+20] Ellen Voorhees et al. “TREC-COVID: Constructing a
Pandemic Information Retrieval Test Collection”. In: SIGIR
Forum 54.1 (2020), pp. 1–12. arXiv: 2005.04474. url:
http://arxiv.org/abs/2005.04474.
[Wan+20] Lucy Lu Wang et al. “CORD-19: The Covid-19 Open
Research Dataset.”. In: ArXiv (2020). issn: 2331-8422. arXiv:
2004.10706. url: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC7251955.
I. Frommholz Evaluating Search Performance Exploring Information Retrieval 48 / 48

More Related Content

Similar to Evaluating Search Systems Using Standard Metrics

Presentation on Software process improvement in GSD
Presentation on Software process improvement in GSDPresentation on Software process improvement in GSD
Presentation on Software process improvement in GSDRafi Ullah
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning ResearchBrodmann17
 
BRM - Chapter 01 Dr Berihun.pdf
BRM - Chapter 01 Dr Berihun.pdfBRM - Chapter 01 Dr Berihun.pdf
BRM - Chapter 01 Dr Berihun.pdfAbdirehmanArab
 
Pivot Selection Techniques
Pivot Selection TechniquesPivot Selection Techniques
Pivot Selection TechniquesCatarina Moreira
 
Uk Research Infrastructure Workshop E-infrastructure Juan Bicarregui
Uk Research Infrastructure Workshop E-infrastructure Juan BicarreguiUk Research Infrastructure Workshop E-infrastructure Juan Bicarregui
Uk Research Infrastructure Workshop E-infrastructure Juan BicarreguiInnovate UK
 
Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2Sanjeev Deshmukh
 
Questionnaires
QuestionnairesQuestionnaires
QuestionnairesCIToolkit
 
Marketing research
Marketing researchMarketing research
Marketing researchankitaun
 
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...IRJET Journal
 
information technology materrailas paper
information technology materrailas paperinformation technology materrailas paper
information technology materrailas papermelkamutesfay1
 
Collaborative Teaching of ERP Systems in International Context
Collaborative Teaching of ERP Systems in International ContextCollaborative Teaching of ERP Systems in International Context
Collaborative Teaching of ERP Systems in International ContextJānis Grabis
 
Dynamic Search and Beyond
Dynamic Search and BeyondDynamic Search and Beyond
Dynamic Search and BeyondGrace Hui Yang
 
5 analytic hierarchy_process
5 analytic hierarchy_process5 analytic hierarchy_process
5 analytic hierarchy_processFEG
 
analytic hierarchy_process
analytic hierarchy_processanalytic hierarchy_process
analytic hierarchy_processFEG
 
Kendall7e ch05
Kendall7e ch05Kendall7e ch05
Kendall7e ch05sayAAhmad
 
User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...
User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...
User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...Kaitlan Chu
 

Similar to Evaluating Search Systems Using Standard Metrics (20)

Presentation on Software process improvement in GSD
Presentation on Software process improvement in GSDPresentation on Software process improvement in GSD
Presentation on Software process improvement in GSD
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
 
Marketing research
Marketing researchMarketing research
Marketing research
 
BRM - Chapter 01 Dr Berihun.pdf
BRM - Chapter 01 Dr Berihun.pdfBRM - Chapter 01 Dr Berihun.pdf
BRM - Chapter 01 Dr Berihun.pdf
 
Pivot Selection Techniques
Pivot Selection TechniquesPivot Selection Techniques
Pivot Selection Techniques
 
Uk Research Infrastructure Workshop E-infrastructure Juan Bicarregui
Uk Research Infrastructure Workshop E-infrastructure Juan BicarreguiUk Research Infrastructure Workshop E-infrastructure Juan Bicarregui
Uk Research Infrastructure Workshop E-infrastructure Juan Bicarregui
 
Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2Trends in-om-scm-27-july-2012-2
Trends in-om-scm-27-july-2012-2
 
Questionnaires
QuestionnairesQuestionnaires
Questionnaires
 
Marketing research
Marketing researchMarketing research
Marketing research
 
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
Fuzzy Analytical Hierarchy Process Method to Determine the Project Performanc...
 
Chap004
Chap004Chap004
Chap004
 
Research methods
Research methodsResearch methods
Research methods
 
information technology materrailas paper
information technology materrailas paperinformation technology materrailas paper
information technology materrailas paper
 
Collaborative Teaching of ERP Systems in International Context
Collaborative Teaching of ERP Systems in International ContextCollaborative Teaching of ERP Systems in International Context
Collaborative Teaching of ERP Systems in International Context
 
Project proposal
Project proposalProject proposal
Project proposal
 
Dynamic Search and Beyond
Dynamic Search and BeyondDynamic Search and Beyond
Dynamic Search and Beyond
 
5 analytic hierarchy_process
5 analytic hierarchy_process5 analytic hierarchy_process
5 analytic hierarchy_process
 
analytic hierarchy_process
analytic hierarchy_processanalytic hierarchy_process
analytic hierarchy_process
 
Kendall7e ch05
Kendall7e ch05Kendall7e ch05
Kendall7e ch05
 
User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...
User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...
User Experience Design on Cleveland Clinic Corporate Website | Medical Inform...
 

More from Ingo Frommholz

Interactive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum TheoryInteractive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum TheoryIngo Frommholz
 
Quantum Mechanics meet Information Search and Retrieval – The QUARTZ Project
Quantum Mechanics meet Information Search and Retrieval – The QUARTZ ProjectQuantum Mechanics meet Information Search and Retrieval – The QUARTZ Project
Quantum Mechanics meet Information Search and Retrieval – The QUARTZ ProjectIngo Frommholz
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Ingo Frommholz
 
Polyrepresentation in a Quantum-inspired Information Retrieval Framework
Polyrepresentation in a Quantum-inspired Information Retrieval FrameworkPolyrepresentation in a Quantum-inspired Information Retrieval Framework
Polyrepresentation in a Quantum-inspired Information Retrieval FrameworkIngo Frommholz
 
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...Ingo Frommholz
 
Information Retrieval Models Part I
Information Retrieval Models Part IInformation Retrieval Models Part I
Information Retrieval Models Part IIngo Frommholz
 
Quantum Probabilities and Quantum-inspired Information Retrieval
Quantum Probabilities and Quantum-inspired Information RetrievalQuantum Probabilities and Quantum-inspired Information Retrieval
Quantum Probabilities and Quantum-inspired Information RetrievalIngo Frommholz
 

More from Ingo Frommholz (7)

Interactive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum TheoryInteractive Information Retrieval inspired by Quantum Theory
Interactive Information Retrieval inspired by Quantum Theory
 
Quantum Mechanics meet Information Search and Retrieval – The QUARTZ Project
Quantum Mechanics meet Information Search and Retrieval – The QUARTZ ProjectQuantum Mechanics meet Information Search and Retrieval – The QUARTZ Project
Quantum Mechanics meet Information Search and Retrieval – The QUARTZ Project
 
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...Modelling User Interaction utilising Information Foraging Theory (and a bit o...
Modelling User Interaction utilising Information Foraging Theory (and a bit o...
 
Polyrepresentation in a Quantum-inspired Information Retrieval Framework
Polyrepresentation in a Quantum-inspired Information Retrieval FrameworkPolyrepresentation in a Quantum-inspired Information Retrieval Framework
Polyrepresentation in a Quantum-inspired Information Retrieval Framework
 
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...
Polyrepresentation in Complex (Book) Search Tasks - How can we use what the o...
 
Information Retrieval Models Part I
Information Retrieval Models Part IInformation Retrieval Models Part I
Information Retrieval Models Part I
 
Quantum Probabilities and Quantum-inspired Information Retrieval
Quantum Probabilities and Quantum-inspired Information RetrievalQuantum Probabilities and Quantum-inspired Information Retrieval
Quantum Probabilities and Quantum-inspired Information Retrieval
 

Recently uploaded

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

Evaluating Search Systems Using Standard Metrics

  • 1. Evaluating Search Performance Ingo Frommholz University of Wolverhampton ifrommholz@acm.org @iFromm ISKO-UK / BCS-IRSG Exploring Information Retrieval February 17, 2022 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 1 / 48
  • 2. Credits • The original slide deck was created in the context of a “Practitioners’ Evaluation Roundtable” at BCS Search Solutions 20211 • Some slides were created by Jochen L. Leidner from the Coburg University of Applied Sciences and Arts and the University of Sheffield 1 https://irsg.bcs.org/informer/2022/01/search-solutions-2021-tutorials-report/ I. Frommholz Evaluating Search Performance Exploring Information Retrieval 2 / 48
  • 3. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 3 / 48
  • 4. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 4 / 48
  • 5. Why Do We Evaluate? To answer questions such as: • What should I do to improve the quality of my system? • What works well, what doesn’t? • Which retrieval model gives me the best results? • Which system/search engine is better? • Which system should I buy? • How is ‘quality’ defined? • How can I measure quality? I. Frommholz Evaluating Search Performance Exploring Information Retrieval 5 / 48
  • 6. Evaluation Characteristics • Reliability • Reproducibility • Sufficient documentation • Representative datasets (documents and users) • Remove potential bias • Open source code and data (if possible); Open Science • Validity • Reflect ‘real’ circumstances I. Frommholz Evaluating Search Performance Exploring Information Retrieval 6 / 48
  • 7. Evaluation Criteria • Efficiency • How quickly can a user solve a task? • System’s response time • Effectiveness • Quality of results • Information Retrieval deals with vagueness and uncertainty • Results are rarely correct (everything retrieved is also relevant) • Results are rarely complete (everything relevant is retrieved) • Focus of the Cranfield Paradigm • Satisfaction of the user with the system • Are users happy with how the system supported their task? I. Frommholz Evaluating Search Performance Exploring Information Retrieval 7 / 48
  • 8. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 8 / 48
  • 9. Cyril Cleverdon’s Cranfield Studies I. Frommholz Evaluating Search Performance Exploring Information Retrieval 9 / 48
  • 10. Retrieval Experiments, Cranfield Style [Har11] Ingredients of a Test Collection • A collection of documents I. Frommholz Evaluating Search Performance Exploring Information Retrieval 10 / 48
  • 11. Retrieval Experiments, Cranfield Style [Har11] Ingredients of a Test Collection • A set of queries ? ? ? ? I. Frommholz Evaluating Search Performance Exploring Information Retrieval 10 / 48
  • 12. Retrieval Experiments, Cranfield Style [Har11] Ingredients of a Test Collection • A set of relevance judgements ? ? ? ? I. Frommholz Evaluating Search Performance Exploring Information Retrieval 10 / 48
  • 13. Evaluation Process ? ? 1 2 3 4 5 7 8 9 10 1 2 3 4 5 7 8 9 10 1 2 3 4 5 7 8 9 10 1 2 3 4 5 7 8 9 10 1 2 2 1 Recall Precision NDCG MRR F1 MAP P@n 1 2 6 6 6 6 Search box image courtesy of Martin White I. Frommholz Evaluating Search Performance Exploring Information Retrieval 11 / 48
  • 14. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 12 / 48
  • 15. Ranking, Recall and Precision ? I. Frommholz Evaluating Search Performance Exploring Information Retrieval 13 / 48
  • 16. Ranking, Recall and Precision ? 1 2 3 4 5 7 8 9 10 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 13 / 48
  • 17. Ranking, Recall and Precision ? 1 2 3 4 5 7 8 9 10 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 13 / 48
  • 18. Ranking, Recall and Precision ? 1 2 3 4 5 7 8 9 10 Documents Relevant Documents Result Set Relevant Retrieved Documents I. Frommholz Evaluating Search Performance Exploring Information Retrieval 13 / 48
  • 19. F-measure, Precision at n F1 measure: harmonic mean of precision and recall F1 = 2 · p · r p + r General Fβ score: Fβ = (1 + β2 ) · p · r (β2 · p) + r Precision at n (P@n): Precision after looking at n documents • For example, Web searchers usually look at 10 documents (first page) P@10 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 14 / 48
  • 20. Recall–Precision Graph 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 precision recall baseline merged qknow-0.7 hknow-0.7 cknow-0.7-0.8 qrel-0.4 hrel-0.1 crel-0.1-0.4 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 15 / 48
  • 22. Mean Average Precision (MAP), Mean Reciprocal Rank (MRR) Mean average precision (MAP) [BV05]: 1 Measure precision after each relevant document 2 Average over the precision values to get average precision for one topic/query 3 Average over topics/queries Mean Reciprocal Rank (MRR): Position of the first correct (relevant) result (e.g. question answering): Ri ranking of query qi Nq number of queries Scorr position of the first correct answer in the ranking MRR = 1 Nq Nq X 1 1 Scorr(Ri ) I. Frommholz Evaluating Search Performance Exploring Information Retrieval 17 / 48
  • 23. System- vs user-oriented measures • Measures like Precision, Recall, Fβ and MAP are system-oriented measures with no assumption on user behaviour. • Other measures like P@n and MRR are user-oriented as they make certain simple assumptions on user behaviour. • Seen n documents • Seen n relevant documents • Seen n non-relevant documents • Seen n non-relevant documents consecutively I. Frommholz Evaluating Search Performance Exploring Information Retrieval 18 / 48
  • 24. Graded Relevance • Sometimes, relevance is non-binary (e.g., ‘not relevant’, ‘marginally relevant’, ‘fully relevant’) graded relevance • Highly relevant documents at the top of the ranking should be preferred over those at the end I. Frommholz Evaluating Search Performance Exploring Information Retrieval 19 / 48
  • 25. Normalized Discounted Cumulative Gain (NDCG) [JK02] • Relevance judgements ∈ {0, 1, 2, 3} 1 2 3 4 5 7 8 9 10 6 1 0 0 3 1 0 0 0 2 1 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 20 / 48
  • 26. Normalized Discounted Cumulative Gain (NDCG) [JK02] • Relevance judgements ∈ {0, 1, 2, 3} • Gain vector Gi for ranking Ri induced by query qi • Gain at postion j depends on level of relevance • Cumulative gain vector CGi CGi [j] = Gi [1] if j = 1 Gi [j] + CGi [j − 1] if j 1 • Discounted cumulative gain vector DCGi : taking the position j into account (discount factor log2 j) DCGi [j] = ( Gi [1] if j = 1 Gi [j] log2 j + DCGi [j − 1] if j 1 1 2 3 4 5 7 8 9 10 6 1 0 0 3 1 0 0 0 2 1 1 0 0 3 1 0 0 0 2 1 1 1 1 4 5 7 8 8 7 8 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 20 / 48
  • 27. Normalized Discounted Cumulative Gain (NDCG) [JK02] • Ideal gain vector IGi = (3, 2, 1, 1, 1, 0, 0, 0, 0, 0) • ICGi and IDCGi analogously • Average over all queries to compute NDCG: DCG[j] = 1 Nq Nq X i=1 DCGi [j] IDCG[j] = 1 Nq Nq X i=1 IDCGi [j] NDCG[j] = DCG[j] IDCG[j] 1 2 3 4 5 7 8 9 10 6 1 0 0 3 1 0 0 0 2 1 1 0 0 3 1 0 0 0 2 1 1 1 1 4 5 7 8 8 7 8 I. Frommholz Evaluating Search Performance Exploring Information Retrieval 20 / 48
  • 28. NDCG – Comparing Systems • Compare NDCG curves • Compare at a given position, e.g., NDGC[10] analogously to P@10 Fig. 4(a). Normalized discounted cumulated gain (nDCG) curves, binary Fig. 4(b). Normalized discounted cumulated gain (nDCG) curves, nonbina lengths, the results of the statistical tests might have changed. Als of topics (20) is rather small to provide reliable results. Howev data illuminate the behavior of the (n)(D)CG measures. (Taken from [JK02]) I. Frommholz Evaluating Search Performance Exploring Information Retrieval 21 / 48
  • 29. Some Evaluation Metrics • Accuracy • Precision (p) • Recall (r) • Fall-out (converse of Specificity) • F-score (F-measure, converse of Effectiveness) (Fβ) • Precision at k (P@k) • R-precision (RPrec) • Mean average precision (MAP) • Mean Reciprocal Rank (MRR) • Normalized Discounted Cumulative Gain (NDCG) • Maximal Marginal Relevance (MMR) • Other Metrics: bpref, GMAP, . . . Some metrics (e.g., MRR) are controversially discussed [Fuh17; Sak20; Fuh20]. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 22 / 48
  • 30. Precision- and Recall-oriented Tasks Which metric to use? • Many tasks are precision-oriented, i.e. we seek a high precision • Web search • Mobile search • Question answering • Other tasks are recall-oriented, i.e. we seek a high recall • Patent search for prior art (to check for novelty of an application) • Systematic reviews • Investigative journalism • Real life may force you to customize the metric to use! • E.g.: Is Precision or Recall more important for your task? (If equal, then use F1, otherwise F0.5 or F5 may be more suitable.) I. Frommholz Evaluating Search Performance Exploring Information Retrieval 23 / 48
  • 31. “Hacking Your Measures” – Evaluating Structured Document Retrieval INEX – INitiative for the Evaluation of XML Retrieval [LÖ09] • Find smallest component that is highly relevant • INEX created long discussion threads on suitable evaluation measures • Specificity: Extend to which a document component is focused on the information need • Exhaustivity: Extend to which the information contained in a document component satisfies the information need Book Chapter Section Subsection I. Frommholz Evaluating Search Performance Exploring Information Retrieval 24 / 48
  • 32. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 25 / 48
  • 33. The Need for Evaluation Initiatives • Shared data collections for robust evaluation • Common evaluation measures • Result comparability across a variety of techniques • Discussion and reflection • Collaborative creation of larger test collections I. Frommholz Evaluating Search Performance Exploring Information Retrieval 26 / 48
  • 34. TREC – Text REtrieval Conference • US DARPA funded public benchmark and associated workshop series (1992-) [VH99; VH05] • Some past TREC tracks • Ad hoc • Filtering • Interactive tracks • Cross-lingual tracks • Very large collection / Web track • Enterprise Track • Question answering • TREC 2022 • Clinical Trials Track • Conversational Assistance Track • CrisisFACTS Track • Deep Learning Track • Fair Ranking Track • Health Misinformation Track • NeuCLIR Track http://trec.nist.gov/ I. Frommholz Evaluating Search Performance Exploring Information Retrieval 27 / 48
  • 35. Other Common Evaluation Initiatives • NTCIR [SOK21]: Japanese language initiative (1999-) • CLEF [FP19]: European initiative on monolingual and cross-lingual search (2000-); • MediaEval [Lar+17]: Benchmarking Initiative for Multimedia Evaluation • CORD-19 dataset [Wan+20], used in TREC-COVID [Voo+20] Source: client of Intranet Focus Ltd. Thanks again Martin White! I. Frommholz Evaluating Search Performance Exploring Information Retrieval 28 / 48
  • 36. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 29 / 48
  • 37. Where is the User? • Only little user involvement in the Cranfield paradigm • The client, customer or user should be at the centre. • Not a single stakeholder: • People are different (user diversity) • People search differently • People have different information needs (objective diversity) • People hold different professional roles (role diversity) • People differ with respect to skills experiences (skill diversity) • Relevance is subjective and situational and there is more than topical relevance [Miz98] • Usefulness, appropriateness, accessibility, comprehensibility, novelty, etc. • The user also cares about the user experience (UX) I. Frommholz Evaluating Search Performance Exploring Information Retrieval 30 / 48
  • 38. Search as “Berrypicking” Marcia Bates [Bat89] • Search is an iterative process • The information need cannot be satisfied by a single result set • Instead: a sequence of selections and the collection of pieces of information during search I. Frommholz Evaluating Search Performance Exploring Information Retrieval 31 / 48
  • 39. Information Seeking and Searching in Context Ingwersen/Järvelin: The Turn – Integration of Information Seeking and Retrieval in Context [IJ05]: “... these sources/systems and tools needs to take their joint usability, quality of information and process into account. One may ask what is the contribution of an IR system at the end of a seeking process – over time, over seeking tasks, and over seekers. Since the knowledge sources, systems and tools are not used in isolation they should not be designed nor evaluated in isolation. They affect each other’s utility in context.” [Figure: Fig. 7.2, Nested contexts and evaluation criteria for task-based ISR – IR context: recall, precision, efficiency, quality of information/process; seeking context: usability, quality of information/process; work task context: quality of info work process/result; socio-organizational and cultural context: socio-cognitive relevance, quality of work task result] I. Frommholz Evaluating Search Performance Exploring Information Retrieval 32 / 48
  • 40. Interactive IR Research Continuum Diane Kelly [Kel07] [Figure: Fig. 2.1, Research continuum for conceptualizing IIR research – ranging from a system focus to a human focus: TREC-style studies; filtering and SDI; log analysis; studies where “users” make relevance assessments; TREC interactive studies; the archetypical IIR study; experimental information behavior; information-seeking behavior with IR systems; information-seeking behavior in context] I. Frommholz Evaluating Search Performance Exploring Information Retrieval 33 / 48
  • 41. Simulated Tasks Pia Borlund [Bor03] • Use of realistic scenarios • Simulated work tasks: Simulated work task situation: After your graduation you will be looking for a job in industry. You want information to help you focus your future job seeking. You know it pays to know the market. You would like to find some information about employment patterns in industry and what kind of qualifications employers will be looking for from future employees. Indicative request: Find, for instance, something about future employment trends in industry, i.e., areas of growth and decline. • Simulated work tasks should be realistic (e.g., not “imagine you’re the first human on Mars...”) • However, a less relatable task may be outweighed by a topically very interesting situation • Taking into account situational factors (situational relevance) I. Frommholz Evaluating Search Performance Exploring Information Retrieval 34 / 48
  • 42. Living Labs [BS21; JBD18] • Offline evaluation (Cranfield): reproducible, but does not take users into account • Online evaluation: involves live users, but lacks reproducibility • Living labs: aim at making online evaluation fully reproducible • TREC OpenSearch, CLEF LL4IR, CLEF NewsREEL, NTCIR-13 OpenLiveQ [Architecture figure from [BS21]; result lists carry qid, docid, rank, score and a system identifier – see the sketch below] I. Frommholz Evaluating Search Performance Exploring Information Retrieval 35 / 48
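As an illustration of how systems exchange rankings in shared-task and living-lab style experimentation, runs are commonly written in the TREC run-file layout (query id, the literal "Q0", document id, rank, score, run identifier). A minimal sketch with invented query and document IDs; the exact upload mechanism of any particular living-lab API will differ:

    # Write a ranked result list in the standard TREC run format:
    # qid  Q0  docid  rank  score  run_id
    ranking = {
        "q1": [("doc42", 12.3), ("doc7", 9.8), ("doc13", 4.1)],
        "q2": [("doc99", 7.7), ("doc3", 5.0)],
    }

    with open("my_system.run", "w") as out:
        for qid, docs in ranking.items():
            for rank, (docid, score) in enumerate(docs, start=1):
                out.write(f"{qid} Q0 {docid} {rank} {score} my_system\n")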
  • 43. What to Evaluate? • Relevance (retrieval effectiveness) ← most IR academics focus here • Coverage (e.g. percentage of actual user queries answerable) • Speed (throughput, responsiveness) • User interface quality (UX) • Cost/time to build • Cost to operate • Task completion time • Scalability (order of magnitude of the number of docs) • Memory requirements (transient, persistent) • Index freshness (how long before I can retrieve it) • User’s subjective satisfaction I. Frommholz Evaluating Search Performance Exploring Information Retrieval 36 / 48
  • 44. Table of Contents 1 Evaluation: Why and What? 2 The Cranfield Paradigm 3 Common Evaluation Metrics 4 Selected Evaluation Initiatives 5 Beyond Cranfield 6 The Practitioner View I. Frommholz Evaluating Search Performance Exploring Information Retrieval 37 / 48
  • 45. Challenges to Practitioners • Legacy: old code-base with strange/non-standard/ineffective evaluation metrics in use, combined with a resistance to change; • Knowledge gap: lack of skills/expertise/experience in the core team; • Resources: no gold data available; • Planning: evaluation was not budgeted for; • Awareness: team consists of traditional managers and software engineers who do not realise quantitative evaluation is a thing; • Infrastructure: evaluation needs to be done “in vivo” as there is no second system instance available; and • Scaffolding: search component is deeply embedded in the overall system and cannot be run as a batch script. • Standard evaluation metrics do not assess domain peculiarities [LC14] I. Frommholz Evaluating Search Performance Exploring Information Retrieval 38 / 48
  • 46. What is Different in Industry? • Effectiveness is only one of many concerns (sometimes even forgotten!) • Functionality-oriented view (search seen as one of many functions, no or little awareness of effectiveness issues) • Purchase as success criterion! • Search is often (and wrongly) considered a “solved” problem • Frequently held developer sentiment: “Just use Elastic, Solr or Lucene, and we’re done.” • Developers unskilled in IR may integrate libraries with default settings inappropriate for a given use case • Evaluation often done online (on a running system) – A/B testing [KL17]; see the sketch below I. Frommholz Evaluating Search Performance Exploring Information Retrieval 39 / 48
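As a rough illustration of the online route, an A/B test might compare the click-through rates of a control and a candidate ranker with a two-proportion z-test. A minimal sketch with invented counts; a production experiment would also consider interleaving, guardrail metrics, sample-size planning and multiple testing [KL17]:

    # Compare click-through rates of control (A) and candidate (B) rankers.
    from math import sqrt
    from statistics import NormalDist

    clicks_a, sessions_a = 420, 10_000   # invented numbers: control
    clicks_b, sessions_b = 465, 10_000   # invented numbers: candidate

    p_a = clicks_a / sessions_a
    p_b = clicks_b / sessions_b
    p_pool = (clicks_a + clicks_b) / (sessions_a + sessions_b)

    se = sqrt(p_pool * (1 - p_pool) * (1 / sessions_a + 1 / sessions_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    print(f"CTR A = {p_a:.3%}, CTR B = {p_b:.3%}, z = {z:.2f}, p = {p_value:.3f}")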
  • 47. References and Further Reading • Introduction to Information Retrieval - Evaluation (slides) https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf • Online Controlled Experiments and A/B Testing https://www.researchgate.net/profile/Ron-Kohavi/publication/316116834_Online_Controlled_Experiments_and_AB_Testing • Enterprise Search – Evaluation (Chapter 4 of [KH]) https://www.flax.co.uk/wp-content/uploads/2017/11/ES_book_final_journal_version.pdf • TREC Conferences https://trec.nist.gov • CLEF Initiative http://www.clef-initiative.eu • NTCIR Workshops http://research.nii.ac.jp/ntcir/data/data-en.html I. Frommholz Evaluating Search Performance Exploring Information Retrieval 40 / 48
  • 48. Thank you! I. Frommholz Evaluating Search Performance Exploring Information Retrieval 41 / 48
  • 49. Bibliography I [Bat89] Marcia J. Bates. “The design of browsing and berrypicking techniques for the online search interface”. In: Online Review 13.5 (1989), pp. 407–424. [Bor03] Pia Borlund. “The IIR evaluation model: a framework for evaluation of interactive information retrieval systems”. In: Information Research 8.3 (2003). url: http://informationr.net/ir/8-3/paper152.html. [BS21] Timo Breuer and Philipp Schaer. “A Living Lab Architecture for Reproducible Shared Task Experimentation”. In: Information between Data and Knowledge. Vol. 74. Schriften zur Informationswissenschaft. Session 6: Emerging Technologies. Glückstadt: Werner Hülsbusch, 2021, pp. 348–362. doi: 10.5283/epub.44953. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 42 / 48
  • 50. Bibliography II [BV05] Chris Buckley and Ellen Voorhees. “Retrieval System Evaluation”. In: TREC: Experiment and Evaluation in Information Retrieval. Ed. by Ellen Voorhees and Donna Harman. MIT Press, 2005, Chapter 3. [FP19] Nicola Ferro and Carol Peters, eds. Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF. Cham, CH: Springer Nature, 2019. [Fuh17] Norbert Fuhr. “Some Common Mistakes In IR Evaluation, And How They Can Be Avoided”. In: SIGIR Forum 51.3 (2017). url: http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr_17b.pdf. [Fuh20] Norbert Fuhr. “SIGIR Keynote: Proof By Experimentation? Towards Better IR Research”. In: SIGIR Forum 54.2 (2020), pp. 1–4. url: http://sigir.org/wp-content/uploads/2020/12/p04.pdf. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 43 / 48
  • 51. Bibliography III [Har11] Donna Harman. Information Retrieval Evaluation. Synthesis Lectures on Information Concepts, Retrieval, and Services. San Francisco, CA, USA: Morgan & Claypool, 2011. doi: https://doi.org/10.2200/S00368ED1V01Y201105ICR019. [IJ05] Peter Ingwersen and Kalervo Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2005. isbn: 140203850X. [JBD18] Rolf Jagerman, Krisztian Balog, and Maarten de Rijke. “OpenSearch: Lessons learned from an online evaluation campaign”. In: J. Data Inf. Qual. 10.3 (2018). issn: 19361963. doi: 10.1145/3239575. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 44 / 48
  • 52. Bibliography IV [JK02] Kalervo Järvelin and Jaana Kekäläinen. “Cumulated Gain-Based Evaluation of IR Techniques”. In: ACM Trans. Inf. Syst. 20.4 (2002), pp. 422–446. url: https://www.cc.gatech.edu/~zha/CS8803WST/dcg.pdf. [Kel07] Diane Kelly. “Methods for Evaluating Interactive Information Retrieval Systems with Users”. In: Found. Trends Inf. Retr. 3.1–2 (2007), pp. 1–224. issn: 1554-0669. doi: 10.1561/1500000012. url: http://www.nowpublishers.com/article/Details/INR-012. [KH] Udo Kruschwitz and Charlie Hull. Searching the Enterprise. Foundations and Trends in Information Retrieval. Now Publishers. isbn: 978-1680833041. doi: http://dx.doi.org/10.1561/1500000053. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 45 / 48
  • 53. Bibliography V [KL17] Ron Kohavi and Roger Longbotham. “Online Controlled Experiments and A/B Testing”. In: Encyclopedia of Machine Learning and Data Mining. Ed. by Claude Sammut and Geoffrey I. Webb. Boston, MA, USA: Springer, 2017, pp. 922–929. doi: 10.1007/978-1-4899-7687-1_891. [Lar+17] Martha Larson et al. “The Benchmarking Initiative for Multimedia Evaluation: MediaEval 2016”. In: IEEE Multimed. 24.1 (2017), pp. 93–96. issn: 1070986X. doi: 10.1109/MMUL.2017.9. [LC14] Bo Long and Yi Chang. Relevance Ranking for Vertical Search Engines. Morgan Kaufmann, 2014. isbn: 978-0124071711. [LÖ09] Ling Liu and M. Tamer Özsu. “XML Information Retrieval”. In: Encyclopedia of Database Systems. Ed. by Ling Liu and M. Tamer Özsu. Springer, 2009. doi: https://doi.org/10.1007/978-0-387-39940-9_4084. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 46 / 48
  • 54. Bibliography VI [Miz98] Stefano Mizzaro. “How many relevances in information retrieval?” In: Interact. Comput. 10 (1998), pp. 303–320. [Sak20] Tetsuya Sakai. “On Fuhr’s Guideline for IR Evaluation”. In: SIGIR Forum 54.1 (2020). [SOK21] Tetsuya Sakai, Douglas W. Oard, and Noriko Kando, eds. Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact. The Information Retrieval Series, 43. Heidelberg, Germany: Springer, 2021. isbn: 978-9811555534. [VH05] Ellen M. Voorhees and Donna K. Harman, eds. TREC: Experiment and Evaluation in Information Retrieval. Cambridge, MA, USA: MIT Press, 2005. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 47 / 48
  • 55. Bibliography VII [VH99] Ellen M. Voorhees and Donna K. Harman. “The Text REtrieval Conference (TREC): history and plans for TREC-9”. In: SIGIR Forum 33.2 (1999), pp. 12–15. doi: 10.1145/344250.344252. [Voo+20] Ellen Voorhees et al. “TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection”. In: SIGIR Forum 54.1 (2020), pp. 1–12. arXiv: 2005.04474. url: http://arxiv.org/abs/2005.04474. [Wan+20] Lucy Lu Wang et al. “CORD-19: The COVID-19 Open Research Dataset”. In: ArXiv (2020). issn: 2331-8422. arXiv: 2004.10706. url: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC7251955. I. Frommholz Evaluating Search Performance Exploring Information Retrieval 48 / 48