Advanced Relevancy Ranking
Paul Nelson
Chief Architect / Search Technologies
Search Technologies Overview
• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent
Lucene Relevancy: Simple Operators
• term(A) → TF(A) * IDF(A)
  • Implemented with DefaultSimilarity / TermQuery
  • TF(A) = sqrt(termInDocCount)
  • IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0
• and(A, B) → A * B
  • Implemented with BooleanQuery()
• or(A, B) → A + B
  • Implemented with BooleanQuery()
• max(A, B) → max(A, B)
  • Implemented with DisjunctionMaxQuery()
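The TF and IDF formulas on this slide can be checked with a few lines of Python. This is a minimal sketch, not Lucene code: the function names and the document counts are made up for illustration, and the natural logarithm is an assumption (it is what Java's Math.log computes).

```python
import math

def tf(term_in_doc_count):
    # TF(A) = sqrt(termInDocCount)
    return math.sqrt(term_in_doc_count)

def idf(total_docs_in_collection, docs_with_term_count):
    # IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
    # Assumption: natural logarithm.
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0

def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    # term(A) -> TF(A) * IDF(A)
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)

# Hypothetical collection of 1000 documents; the term occurs 4 times in the document.
rare_term = term_score(4, 1000, 9)      # term found in only 9 documents
common_term = term_score(4, 1000, 499)  # term found in 499 documents
print(rare_term > common_term)
```

With the same in-document count, the rarer term scores higher, which is the effect the IDF factor is there to produce.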
Simple Operators - Example
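and(or(george, martha), max(washington, custis))
• Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
• or(george, martha): 0.10 + 0.20 = 0.30
• max(washington, custis): max(0.60, 0.90) = 0.90
• and: 0.30 * 0.90 = 0.27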
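The arithmetic of this example can be reproduced with a small Python sketch. It uses the simplified operator model from the previous slide (and = product, or = sum, max = maximum); the function names are illustrative and this is not how Lucene itself is invoked.

```python
import math

def and_(*scores):
    # and(A, B) -> A * B; a non-matching child (score 0.0) zeroes the result
    return math.prod(scores)

def or_(*scores):
    # or(A, B) -> A + B
    return sum(scores)

def max_(*scores):
    # max(A, B) -> max(A, B)
    return max(scores)

# Term scores taken from the slide
term_scores = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

left = or_(term_scores["george"], term_scores["martha"])
right = max_(term_scores["washington"], term_scores["custis"])
total = and_(left, right)

print(round(left, 2), round(right, 2), round(total, 2))
```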
Less Used Operators
• boost(f, A) → (A * f)
  • Implemented with Query.setBoost(f)
• constant(f, A) → if(A) then f else 0.0
  • Implemented with ConstantScoreQuery()
• boostPlus(A, B) → if(A) then (A + B) else 0.0
  • Implemented with BooleanQuery()
• boostMul(f, A, B) → if(B) then (A * f) else A
  • Implemented with BoostingQuery()
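These four definitions translate directly into score functions. The sketch below is a Python model of the formulas as written on the slide, assuming that "A matches" means "A has a score greater than 0.0"; the function names are illustrative and no Lucene classes are involved.

```python
def boost(f, a):
    # boost(f, A) -> A * f
    return a * f

def constant(f, a):
    # constant(f, A) -> f if A matches, else 0.0
    return f if a > 0.0 else 0.0

def boost_plus(a, b):
    # boostPlus(A, B) -> A + B if A matches, else 0.0 (B alone never matches)
    return a + b if a > 0.0 else 0.0

def boost_mul(f, a, b):
    # boostMul(f, A, B) -> A * f if B matches, else A unchanged
    return a * f if b > 0.0 else a

print(constant(0.5, 0.27))       # any match of A collapses to the constant
print(boost_plus(0.0, 0.3))      # B cannot produce a hit without A
print(boost_mul(0.5, 0.4, 0.3))  # a match of B scales A by f
```

In this model, boostPlus is what lets an optional clause raise a score without widening the result set, and boostMul with f below 1.0 demotes documents that also match B.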
Problem: Need for More Flexibility
• Difficult / impossible to use all operators
  • Many not available in standard query parsers
• Complex expressions = string manipulation
  • This is messy
• Query construction is in the application layer
  • Your UI programmer is creating query expressions?
  • Seriously?
• Hard to create and use new operators
  • Requires modifying query parsers - yuck
Query Processing Language
[Diagram: User Interface → Solr (QPL Engine, driven by a QPL Script) → Search]
Introducing: QPL
• Query Processing Language
  • Domain Specific Language for Constructing Queries
  • Built on Groovy
  • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
  • Query Parser
  • Search Component
• “The 4GL for Text Search Query Expressions”
• Server-side Solr Access
  • Cores, Analyzers, Embedded Search, Results XML
Solr Plug-Ins
QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl"
    class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>
Search Component Configuration:
<searchComponent name="qplSearchFirst"
    class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>
QPL Example #1
Tokenize:
myTerms = solr.tokenize(query);
Phrase Query:
phraseQ = phrase(myTerms);
And Query:
andQ = and(myTerms);
Or Query:
orQ = (myTerms.size() <= 2) ? null :
    orMin( (myTerms.size()+1)/2, myTerms);
Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
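To see how this script layers its three clauses, here is a toy Python model of the ranking. It is a sketch under stated assumptions, not QPL: documents are token lists, every matching clause contributes a raw score of 1.0 before its boost, `|` is treated as the sum-style or from the simple-operators slide, and the orMin threshold uses integer division. All names and documents are made up for illustration.

```python
def phrase_match(doc, terms):
    # True if the terms appear in the document consecutively and in order
    n = len(terms)
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))

def and_match(doc, terms):
    # True if every term appears somewhere in the document
    return all(t in doc for t in terms)

def or_min_match(doc, terms, minimum):
    # True if at least `minimum` of the terms appear in the document
    return sum(1 for t in terms if t in doc) >= minimum

def score(doc, terms):
    total = 0.0
    if phrase_match(doc, terms):
        total += 3.0  # phraseQ^3.0
    if and_match(doc, terms):
        total += 2.0  # andQ^2.0
    # orQ is only built for queries of more than two terms
    if len(terms) > 2 and or_min_match(doc, terms, (len(terms) + 1) // 2):
        total += 1.0
    return total

query_terms = ["george", "washington", "carver"]
exact = ["dr", "george", "washington", "carver", "biography"]
scattered = ["carver", "met", "george", "in", "washington"]
partial = ["george", "washington", "bridge"]
single = ["george", "orwell"]

print([score(d, query_terms) for d in (exact, scattered, partial, single)])
```

Under these assumptions an exact phrase outranks a document that merely contains all terms, which in turn outranks one that contains only half of them.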
Thesaurus Example #2
Tokenize:
myTerms = solr.tokenize(query);
Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")
Thesaurus Expansion:
thesQ = thes.expand(0.8f,
    solr.tokenizer("text"), myTerms);
Put It All Together:
return and(thesQ);
Original Query: bathroom humor
After Expansion: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
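The expansion step can be illustrated with a short Python sketch. The dictionary below is a hypothetical stand-in for thesaurus.xml, and `expand` is an illustrative function that only mimics the output format shown on the slide; it is not the QPL Thesaurus API.

```python
# Hypothetical thesaurus contents, standing in for thesaurus.xml
THESAURUS = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}

def expand(weight, terms, thesaurus):
    """Replace each term that has synonyms with an or(...) group.

    The original term keeps full weight; each synonym is down-weighted.
    """
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            alternatives = [term] + [f"{s}^{weight}" for s in synonyms]
            expanded.append("or(" + ", ".join(alternatives) + ")")
        else:
            expanded.append(term)  # no synonyms: term passes through unchanged
    return expanded

result = expand(0.8, ["bathroom", "humor"], THESAURUS)
print(result)
```

Weighting synonyms below 1.0 keeps documents that use the user's own wording ahead of documents that only match a synonym.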
More Operators
Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)
Composite Queries:
compQ = and(compositeMax(
["title":1.5, "body":0.8],
"george", "washington"))
News Feed Use Case
Order  Documents           Date
1      markets+terms       Today
2      markets             Today
3      terms               Today
4      companies           Today
5      markets+terms       Yesterday
6      markets             Yesterday
7      terms               Yesterday
8      companies           Yesterday
9      markets, companies  older
News Feed Use Case – Step 1
Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));
Terms:
terms = solr.tokenize(query);
termsQ = field("body",
    or(thesaurus.expand(0.9f, terms)))
Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))
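The split calls above take a semicolon-separated request parameter and tolerate whitespace around each semicolon. A minimal Python sketch of that splitting, assuming the intended pattern is the regular expression `\s*;\s*` (the function name and sample value are illustrative):

```python
import re

def split_list(raw):
    # Split on semicolons, swallowing any whitespace on either side,
    # and drop empty entries left by leading or trailing separators.
    return [part for part in re.split(r"\s*;\s*", raw.strip()) if part]

markets = split_list("energy ; metals;  fx ")
print(markets)
```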
News Feed Use Case – Step 2
// java.text is not among Groovy's default imports
import java.text.SimpleDateFormat

sdf = new SimpleDateFormat("yyyy-MM-dd")
c = Calendar.getInstance()
Today:
todayDate = sdf.format(c.getTime())
todayQ = field("date_s",todayDate)
Yesterday:
c.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(c.getTime())
yesterdayQ = field("date_s",yesterdayDate)
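The date handling in this step amounts to formatting today's date and the date one day earlier as yyyy-MM-dd strings. A small Python sketch of the same computation, with the current date passed in so the result is reproducible (the function name is illustrative):

```python
from datetime import date, timedelta

def today_and_yesterday(today):
    # Subtracting one day handles month and year boundaries correctly
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")

today_str, yesterday_str = today_and_yesterday(date(2024, 3, 1))
print(today_str, yesterday_str)
```

The two strings play the role of the values matched against the date_s field.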
News Feed Use Case – Step 3
Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)
Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ,compIdsQ)^0.01)
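A quick way to check that these weights produce the ordering required by the use case is to evaluate them for each document category. The Python sketch below assumes the simplified operator model from the earlier slides (and = product, or = sum, max = maximum, constant = fixed score on match) and, as a further assumption, a raw score of 1.0 for any matching markets or companies clause in the fallback. The row names and function names are illustrative.

```python
def constant(f, matched):
    # constant(f, A) -> f if A matches, else 0.0
    return f if matched else 0.0

def score(markets, terms, companies, day):
    # Weighted subject queries
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    # Weighted time queries
    time = max(constant(10.0, day == "today"), constant(1.0, day == "yesterday"))
    recent = subject * time  # and(subjectQ, timeQ)
    # or(marketsQ, compIdsQ)^0.01, assuming a raw score of 1.0 per matching clause
    fallback = 0.01 * ((1.0 if markets else 0.0) + (1.0 if companies else 0.0))
    return max(recent, fallback)

# (label, markets, terms, companies, day) in the order required by the use case
ROWS = [
    ("markets+terms today", True, True, False, "today"),
    ("markets today", True, False, False, "today"),
    ("terms today", False, True, False, "today"),
    ("companies today", False, False, True, "today"),
    ("markets+terms yesterday", True, True, False, "yesterday"),
    ("markets yesterday", True, False, False, "yesterday"),
    ("terms yesterday", False, True, False, "yesterday"),
    ("companies yesterday", False, False, True, "yesterday"),
    ("markets, companies older", True, True, True, "older"),
]

scores = [score(*row[1:]) for row in ROWS]
for (label, *_), s in zip(ROWS, scores):
    print(f"{label}: {s}")
```

Because today's weight (10.0) exceeds the largest subject weight (4.0), every document from today outranks every document from yesterday, and the 0.01 boost keeps older documents below both groups.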
Embedded Search Example #1
Tokenize:
qTerms = solr.tokenize(query);
Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
subjectsQ = or(results*.subjectId)
Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;
Embedded Search Example #2
Tokenize:
qTerms = solr.tokenize(query);
Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);
Add it to the XML response:
solr.addResponse(myList)
Other Features
• Embedded Grouping Queries
  • Oh yes they did!
• Proximity operators
  • ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
  • Prefers exact matches over variants
• Transformer
  • Applies transformations recursively to query trees
Query Processing Language
[Diagram: User Interface → (data as entered by user) → Solr: QPL Engine with QPL Script → (Boolean query expression) → Search. The Application Dev Team owns the User Interface; the Search Team owns the QPL Script.]
QPL: Using External Sources to Build Queries
[Diagram: User Interface → Solr: QPL Engine with QPL Script → Search. The QPL Script also draws on external sources: RDBMS, Other Indexes, Thesaurus.]
CONTACT
Paul Nelson
pnelson@searchtechnologies.com

More Related Content

What's hot

Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerVitor Hirota Makiyama
 
Deletion of an element at a given particular position
Deletion of an  element  at a given particular positionDeletion of an  element  at a given particular position
Deletion of an element at a given particular positionKavya Shree
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
JDD2014: Real life lambdas - Peter Lawrey
JDD2014: Real life lambdas - Peter LawreyJDD2014: Real life lambdas - Peter Lawrey
JDD2014: Real life lambdas - Peter LawreyPROIDEA
 
Educational slides by venay magen
Educational slides by venay magenEducational slides by venay magen
Educational slides by venay magenvenaymagen19
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Problem details
Problem detailsProblem details
Problem detailsMohsen B
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMorgan Tocker
 
Schema Design by Chad Tindel, Solution Architect, 10gen
Schema Design  by Chad Tindel, Solution Architect, 10genSchema Design  by Chad Tindel, Solution Architect, 10gen
Schema Design by Chad Tindel, Solution Architect, 10genMongoDB
 
The World Cup Graph 2018
The World Cup Graph 2018The World Cup Graph 2018
The World Cup Graph 2018Neo4j
 
L'ingénierie dans les nuages
L'ingénierie dans les nuagesL'ingénierie dans les nuages
L'ingénierie dans les nuagesAndrew Forward
 
OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityCurtis Mosters
 
Review of some successes
Review of some successesReview of some successes
Review of some successesAndrea Zaliani
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 

What's hot (17)

HDF Explorer
HDF ExplorerHDF Explorer
HDF Explorer
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
 
Deletion of an element at a given particular position
Deletion of an  element  at a given particular positionDeletion of an  element  at a given particular position
Deletion of an element at a given particular position
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
JDD2014: Real life lambdas - Peter Lawrey
JDD2014: Real life lambdas - Peter LawreyJDD2014: Real life lambdas - Peter Lawrey
JDD2014: Real life lambdas - Peter Lawrey
 
Educational slides by venay magen
Educational slides by venay magenEducational slides by venay magen
Educational slides by venay magen
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Problem details
Problem detailsProblem details
Problem details
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer Guide
 
Schema Design by Chad Tindel, Solution Architect, 10gen
Schema Design  by Chad Tindel, Solution Architect, 10genSchema Design  by Chad Tindel, Solution Architect, 10gen
Schema Design by Chad Tindel, Solution Architect, 10gen
 
The World Cup Graph 2018
The World Cup Graph 2018The World Cup Graph 2018
The World Cup Graph 2018
 
L'ingénierie dans les nuages
L'ingénierie dans les nuagesL'ingénierie dans les nuages
L'ingénierie dans les nuages
 
OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionality
 
Review of some successes
Review of some successesReview of some successes
Review of some successes
 
Influxdb and time series data
Influxdb and time series dataInfluxdb and time series data
Influxdb and time series data
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 

Viewers also liked

The things you need to know about SharePoint 2013 Search
The things you need to know about SharePoint 2013 SearchThe things you need to know about SharePoint 2013 Search
The things you need to know about SharePoint 2013 SearchSearch Technologies
 
Advanced Query Parsing Techniques
Advanced Query Parsing TechniquesAdvanced Query Parsing Techniques
Advanced Query Parsing TechniquesSearch Technologies
 
Enterprise Search Best Practices Webinar 4.2013
Enterprise Search Best Practices Webinar 4.2013Enterprise Search Best Practices Webinar 4.2013
Enterprise Search Best Practices Webinar 4.2013Search Technologies
 
The Evolution of Search and Big Data
The Evolution of Search and Big DataThe Evolution of Search and Big Data
The Evolution of Search and Big DataSearch Technologies
 
Enterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchEnterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchSearch Technologies
 

Viewers also liked (6)

The things you need to know about SharePoint 2013 Search
The things you need to know about SharePoint 2013 SearchThe things you need to know about SharePoint 2013 Search
The things you need to know about SharePoint 2013 Search
 
Advanced Query Parsing Techniques
Advanced Query Parsing TechniquesAdvanced Query Parsing Techniques
Advanced Query Parsing Techniques
 
Enterprise Search Best Practices Webinar 4.2013
Enterprise Search Best Practices Webinar 4.2013Enterprise Search Best Practices Webinar 4.2013
Enterprise Search Best Practices Webinar 4.2013
 
The Evolution of Search and Big Data
The Evolution of Search and Big DataThe Evolution of Search and Big Data
The Evolution of Search and Big Data
 
Wikipedia Cloud Search Webinar
Wikipedia Cloud Search WebinarWikipedia Cloud Search Webinar
Wikipedia Cloud Search Webinar
 
Enterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for SearchEnterprise Search Summit Keynote: A Big Data Architecture for Search
Enterprise Search Summit Keynote: A Big Data Architecture for Search
 

Similar to Advanced Relevancy Ranking

Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationMongoDB
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects Explained#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects ExplainedAtul Gupta(8X)
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Chester Chen
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享Chengjen Lee
 
SQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API ConsumersSQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API ConsumersJerod Johnson
 
Alternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builderAlternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builderKadharBashaJ
 
Oracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google AppsOracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google AppsSergei Martens
 
SFDC Advanced Apex
SFDC Advanced Apex SFDC Advanced Apex
SFDC Advanced Apex Sujit Kumar
 
Salesforce Summer 14 Release
Salesforce Summer 14 ReleaseSalesforce Summer 14 Release
Salesforce Summer 14 ReleaseJyothylakshmy P.U
 
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Mikael Svenson
 
Nu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.comNu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.comDay Software
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull lucenerevolution
 

Similar to Advanced Relevancy Ranking (20)

Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and Evaluation
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects Explained#SalesforceSaturday : Salesforce BIG Objects Explained
#SalesforceSaturday : Salesforce BIG Objects Explained
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
React inter3
React inter3React inter3
React inter3
 
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享
 
SQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API ConsumersSQL for Web APIs - Simplifying Data Access for API Consumers
SQL for Web APIs - Simplifying Data Access for API Consumers
 
Alternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builderAlternate for scheduled apex using flow builder
Alternate for scheduled apex using flow builder
 
Oracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google AppsOracle Application Express as add-on for Google Apps
Oracle Application Express as add-on for Google Apps
 
70433 Dumps DB
70433 Dumps DB70433 Dumps DB
70433 Dumps DB
 
Javascript
JavascriptJavascript
Javascript
 
SFDC Advanced Apex
SFDC Advanced Apex SFDC Advanced Apex
SFDC Advanced Apex
 
Salesforce Summer 14 Release
Salesforce Summer 14 ReleaseSalesforce Summer 14 Release
Salesforce Summer 14 Release
 
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
Search Queries Explained – A Deep Dive into Query Rules, Query Variables and ...
 
Nu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.comNu Skin: Integrating the Day CMS with Translation.com
Nu Skin: Integrating the Day CMS with Translation.com
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
 

Recently uploaded

Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 

Recently uploaded (20)

Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 

Advanced Relevancy Ranking

  • 1. Advanced Relevancy Ranking Paul Nelson Chief Architect / Search Technologies
  • 2. 2 Search Technologies Overview • Formed June 2005 • Over 100 employees and growing • Over 400 customers worldwide • Presence in US, Latin America, UK & Germany • Deep enterprise search expertise • Consistent revenue growth and profitability • Search Engine Independent
  • 3. 3 Lucene Relevancy: Simple Operators • term(A)  TF(A) * IDF(A) • Implemented with DefaultSimilarity / TermQuery • TF(A) = sqrt(termInDocCount) • IDF(A) = log(totalDocsInCollection/(docsWithTermCount+1)) + 1.0 • and(A,B)  A * B • Implemented with BooleanQuery() • or(A, B)  A + B • Implemented with BooleanQuery() • max(A, B)  max(A, B) • Implemented with DisjunctionMaxQuery() 3
  • 4. Simple Operators - Example
      • Query tree: and(or(george, martha), max(washington, custis))
      • Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
      • or: 0.1 + 0.2 = 0.30
      • max: max(0.6, 0.9) = 0.90
      • and: 0.3 * 0.9 = 0.27
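The worked example can be reproduced by treating each operator as a function over scores. The sketch below is plain Python using the operator definitions from the deck (and = product, or = sum, max = maximum); the helper names are illustrative, not a Lucene or QPL API.

```python
from math import prod

def and_(*scores: float) -> float:
    """and(A, B) scores as the product of its clauses."""
    return prod(scores)

def or_(*scores: float) -> float:
    """or(A, B) scores as the sum of its clauses."""
    return sum(scores)

def max_(*scores: float) -> float:
    """max(A, B) scores as the best single clause."""
    return max(scores)

# Leaf scores from the slide.
george, martha, washington, custis = 0.10, 0.20, 0.60, 0.90

first_names = or_(george, martha)        # 0.1 + 0.2
last_names = max_(washington, custis)    # best of 0.6 and 0.9
total = and_(first_names, last_names)    # product of the two branches

print(round(total, 2))  # prints 0.27
```

Because or adds scores, a document matching both first names outranks one matching only one of them, while max ignores the weaker of the two surnames.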
  • 5. Less Used Operators
      • boost(f, A) → (A * f)
          • Implemented with Query.setBoost(f)
      • constant(f, A) → if(A) then f else 0.0
          • Implemented with ConstantScoreQuery()
      • boostPlus(A, B) → if(A) then (A + B) else 0.0
          • Implemented with BooleanQuery()
      • boostMul(f, A, B) → if(B) then (A * f) else A
          • Implemented with BoostingQuery()
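The four definitions above translate directly into small functions. This is a sketch of the scoring semantics only, in plain Python; it assumes a clause "matches" when its score is greater than zero, which is a modeling shortcut rather than how Lucene represents matches.

```python
def boost(f: float, a: float) -> float:
    """boost(f, A): scale A's score by f."""
    return a * f

def constant(f: float, a: float) -> float:
    """constant(f, A): a fixed score f when A matches, otherwise 0.0."""
    return f if a > 0 else 0.0

def boost_plus(a: float, b: float) -> float:
    """boostPlus(A, B): A is required; B only adds to the score."""
    return a + b if a > 0 else 0.0

def boost_mul(f: float, a: float, b: float) -> float:
    """boostMul(f, A, B): multiply A's score by f when B also matches."""
    return a * f if b > 0 else a

# B never causes a match on its own in boostPlus.
print(boost_plus(0.0, 0.2))      # prints 0.0
# With f < 1, boostMul demotes documents that also match B.
print(boost_mul(0.5, 0.4, 0.1))  # prints 0.2
```

The difference between the last two operators is who decides the match: in both cases only A selects documents, and B merely adjusts the score, additively or multiplicatively.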
  • 6. Problem: Need for More Flexibility
      • Difficult / impossible to use all operators
          • Many not available in standard query parsers
      • Complex expressions = string manipulation
          • This is messy
      • Query construction is in the application layer
          • Your UI programmer is creating query expressions?
          • Seriously?
      • Hard to create and use new operators
          • Requires modifying query parsers - yuck
  • 8. Introducing: QPL
      • Query Processing Language
          • Domain Specific Language for Constructing Queries
          • Built on Groovy
          • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
      • Solr Plug-Ins
          • Query Parser
          • Search Component
      • “The 4GL for Text Search Query Expressions”
      • Server-side Solr Access
          • Cores, Analyzers, Embedded Search, Results XML
  • 10. QPL Configuration – solrconfig.xml
      • Query Parser Configuration:
          <queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
            <str name="scriptFile">parser.qpl</str>
            <str name="defaultField">text</str>
          </queryParser>
      • Search Component Configuration:
          <searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
            <str name="scriptFile">search.qpl</str>
            <str name="defaultField">text</str>
            <str name="isProcessScript">false</str>
          </searchComponent>
  • 11. QPL Example #1
      • Tokenize:
          myTerms = solr.tokenize(query);
      • Phrase Query:
          phraseQ = phrase(myTerms);
      • And Query:
          andQ = and(myTerms);
      • Or Query:
          orQ = (myTerms.size() <= 2) ? null : orMin((myTerms.size()+1)/2, myTerms);
      • Put It All Together:
          return phraseQ^3.0 | andQ^2.0 | orQ;
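The "Or Query" step only builds a clause for queries of three or more terms, and then requires roughly half of them to match. The sketch below models that rule in plain Python. It assumes the threshold `(n + 1) / 2` is meant as integer division (as it would be in Java); Groovy's `/` on two integers can yield a decimal, and the slides do not say how QPL rounds it. The function names are illustrative and not part of QPL.

```python
from typing import List, Optional

def or_min_threshold(num_terms: int) -> Optional[int]:
    """Minimum number of terms that must match, or None when no
    orMin clause is built (queries of one or two terms)."""
    if num_terms <= 2:
        return None
    return (num_terms + 1) // 2  # assumption: integer division

def or_min_matches(k: int, terms: List[str], doc_tokens: List[str]) -> bool:
    """True when at least k distinct query terms occur in the document."""
    present = set(doc_tokens)
    return sum(1 for term in set(terms) if term in present) >= k

terms = ["george", "martha", "washington"]
k = or_min_threshold(len(terms))  # 2 of the 3 terms are required
doc = "martha washington custis".split()
print(or_min_matches(k, terms, doc))  # prints True
```

Skipping the clause for short queries avoids a loose or over one or two terms adding nothing beyond what the phrase and and clauses already cover.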
  • 12. Thesaurus Example #2
      • Original Query: bathroom humor
      • Tokenize:
          myTerms = solr.tokenize(query);
      • Load Thesaurus (cached):
          thes = Thesaurus.load("thesaurus.xml")
      • Thesaurus Expansion:
          thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
          Result: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
      • Put It All Together:
          return and(thesQ);
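The expansion step can be illustrated without Solr. The sketch below is plain Python that rebuilds the expanded clauses shown on the slide as strings; the dictionary-based thesaurus and the `expand` helper are assumptions made for illustration and do not reflect the real `Thesaurus` class or the format of thesaurus.xml.

```python
from typing import Dict, List

def expand(thesaurus: Dict[str, List[str]], weight: float, terms: List[str]) -> List[str]:
    """Return one clause per term: the term itself, or an or(...) of the
    term plus its synonyms, each synonym down-weighted with ^weight."""
    clauses = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            variants = [term] + [f"{syn}^{weight}" for syn in synonyms]
            clauses.append("or(" + ", ".join(variants) + ")")
        else:
            clauses.append(term)
    return clauses

# Hypothetical thesaurus content matching the slide's example.
thesaurus = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}

clauses = expand(thesaurus, 0.8, ["bathroom", "humor"])
query = "and(" + ", ".join(clauses) + ")"
print(query)
# prints and(or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8))
```

Weighting synonyms below 1.0 keeps documents containing the user's own words ahead of documents that match only through the thesaurus.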
  • 13. More Operators
      • Boolean Query Parser:
          pQ = parseQuery("(george or martha) near/5 washington")
      • Relevancy Ranking Operators:
          q1 = boostPlus(query, optionalQ)
          q2 = boostMul(0.5, query, optionalQ)
          q3 = constant(0.5, query)
      • Composite Queries:
          compQ = and(compositeMax(
              ["title":1.5, "body":0.8], "george", "washington"))
  • 14. News Feed Use Case
      Order  Documents           Date
      1      markets+terms       Today
      2      markets             Today
      3      terms               Today
      4      companies           Today
      5      markets+terms       Yesterday
      6      markets             Yesterday
      7      terms               Yesterday
      8      companies           Yesterday
      9      markets, companies  older
  • 15. News Feed Use Case – Step 1
      • Segments:
          markets = split(solr.markets, "\s*;\s*")
          marketsQ = field("markets", or(markets));
      • Terms:
          terms = solr.tokenize(query);
          termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
      • Companies:
          compIds = split(solr.compIds, "\s*;\s*")
          compIdsQ = field("companyIds", or(compIds))
  • 16. News Feed Use Case – Step 2
          import java.text.SimpleDateFormat
          sdf = new SimpleDateFormat("yyyy-MM-dd")
          cal = Calendar.getInstance()
      • Today:
          todayDate = sdf.format(cal.getTime())
          todayQ = field("date_s", todayDate)
      • Yesterday:
          cal.add(Calendar.DAY_OF_MONTH, -1)
          yesterdayDate = sdf.format(cal.getTime())
          yesterdayQ = field("date_s", yesterdayDate)
  • 17. News Feed Use Case
      Order  Documents           Date
      1      markets+terms       Today
      2      markets             Today
      3      terms               Today
      4      companies           Today
      5      markets+terms       Yesterday
      6      markets             Yesterday
      7      terms               Yesterday
      8      companies           Yesterday
      9      markets, companies  older
  • 18. News Feed Use Case – Step 3
      • Weighted Subject Queries:
          sq1 = constant(4.0, and(marketsQ, termsQ))
          sq2 = constant(3.0, marketsQ)
          sq3 = constant(2.0, termsQ)
          sq4 = constant(1.0, compIdsQ)
          subjectQ = max(sq1, sq2, sq3, sq4)
      • Weighted Time Queries:
          tq1 = constant(10.0, todayQ)
          tq2 = constant(1.0, yesterdayQ)
          timeQ = max(tq1, tq2)
      • Put it All Together:
          recentQ = and(subjectQ, timeQ)
          return max(recentQ, or(marketsQ, compIdsQ)^0.01)
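It is worth checking that these weights really produce the nine-tier ordering from the use-case table. The sketch below models the script in plain Python, using the deck's operator semantics (constant = fixed score on match, and = product, max = maximum). Each clause is reduced to a boolean match flag, and the raw score of the fallback `or(marketsQ, compIdsQ)` is assumed to be 1.0; any raw score below 100 keeps the fallback under the lowest "yesterday" tier.

```python
def constant(f: float, matched: bool) -> float:
    """constant(f, A): fixed score f when A matches, otherwise 0.0."""
    return f if matched else 0.0

def news_score(markets: bool, terms: bool, companies: bool,
               today: bool, yesterday: bool, fallback_raw: float = 1.0) -> float:
    subject = max(constant(4.0, markets and terms),
                  constant(3.0, markets),
                  constant(2.0, terms),
                  constant(1.0, companies))
    time = max(constant(10.0, today), constant(1.0, yesterday))
    recent = subject * time  # and(subjectQ, timeQ) scores as a product
    # or(marketsQ, compIdsQ)^0.01: raw score is an assumed placeholder.
    fallback = 0.01 * fallback_raw if (markets or companies) else 0.0
    return max(recent, fallback)

# One representative document per row of the table:
# (markets, terms, companies, today, yesterday)
rows = [
    (True, True, False, True, False),    # 1 markets+terms, today
    (True, False, False, True, False),   # 2 markets, today
    (False, True, False, True, False),   # 3 terms, today
    (False, False, True, True, False),   # 4 companies, today
    (True, True, False, False, True),    # 5 markets+terms, yesterday
    (True, False, False, False, True),   # 6 markets, yesterday
    (False, True, False, False, True),   # 7 terms, yesterday
    (False, False, True, False, True),   # 8 companies, yesterday
    (True, False, False, False, False),  # 9 markets or companies, older
]
scores = [news_score(*row) for row in rows]
print(scores)  # prints [40.0, 30.0, 20.0, 10.0, 4.0, 3.0, 2.0, 1.0, 0.01]
```

The factor of 10 between today and yesterday is what keeps the tiers from overlapping: the weakest subject match today (10.0) still beats the strongest one yesterday (4.0).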
  • 19. Embedded Search Example #1
          qTerms = solr.tokenize(qTerms);
      • Execute an Embedded Search:
          results = solr.search('subjectsCore', or(qTerms), 50)
      • Create a query from the results:
          subjectsQ = or(results*.subjectId)
      • Put it all together:
          return field("title", and(qTerms)) | subjectsQ^0.9;
  • 20. Embedded Search Example #2
          qTerms = solr.tokenize(qTerms);
      • Execute an Embedded Search:
          results = solr.search('categories', and(qTerms), 10)
      • Create a Solr named list:
          myList = solr.newList();
          myList.add("relatedCategories", results*.title);
      • Add it to the XML response:
          solr.addResponse(myList)
  • 21. Other Features
      • Embedded Grouping Queries
          • Oh yes they did!
      • Proximity operators
          • ADJ, NEAR/#, BEFORE/#
      • Reverse Lemmatizer
          • Prefers exact matches over variants
      • Transformer
          • Applies transformations recursively to query trees
  • 22. Solr Query Processing Language
      • [Diagram] User Interface → QPL Engine (driven by a QPL Script) → Search
      • Arrow labels: "Data as entered by user", "Boolean Query Expression"
      • Team labels: Application Dev Team, Search Team
  • 23. Solr QPL: Using External Sources to Build Queries
      • [Diagram] User Interface → QPL Engine (driven by a QPL Script) → Search
      • External sources shown: RDBMS, Other Indexes, Thesaurus