Lucene and Solr provide a number of options for query parsing, and these are valuable tools for building powerful search applications. This presentation, given at Lucene Revolution 2013, reviews the role that advanced query parsing can play in building search systems, including relevancy customization; taking input from user interface variables such as the position on a website or geographical indicators; choosing which sources are to be searched; and drawing on third-party data sources. Query parsing can also enhance data security. Best practices for building and maintaining complex query parsing rules are discussed and illustrated. Presented by Paul Nelson, Chief Architect.
Search Technologies provides relevancy tuning services for Solr. For further information, see http://www.searchtechnologies.com/solr-lucene-relevancy.html
http://www.searchtechnologies.com
2. 2
Search Technologies Overview
• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent
3. 3
Lucene Relevancy: Simple Operators
• term(A) → TF(A) * IDF(A)
  • Implemented with DefaultSimilarity / TermQuery
  • TF(A) = sqrt(termInDocCount)
  • IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
• and(A, B) → A * B
  • Implemented with BooleanQuery()
• or(A, B) → A + B
  • Implemented with BooleanQuery()
• max(A, B) → max(A, B)
  • Implemented with DisjunctionMaxQuery()
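As a rough illustration of the formulas above, the following Python sketch models the per-term score and the three combining operators as plain arithmetic. It is not Lucene code: the function names and the sample counts are assumptions made for illustration, the logarithm is taken to be the natural log, and real Lucene scoring applies further normalization factors that the slide leaves out.

```python
import math

def tf(term_in_doc_count):
    """TF(A) = sqrt(termInDocCount)."""
    return math.sqrt(term_in_doc_count)

def idf(total_docs, docs_with_term):
    """IDF(A) = log(totalDocs / (docsWithTerm + 1)) + 1.0 (natural log assumed)."""
    return math.log(total_docs / (docs_with_term + 1)) + 1.0

def term(term_in_doc_count, total_docs, docs_with_term):
    """Score of a single term: TF * IDF."""
    return tf(term_in_doc_count) * idf(total_docs, docs_with_term)

# The slide's simplified view of how sub-scores combine.
def and_(a, b):
    return a * b

def or_(a, b):
    return a + b

def max_(a, b):
    return max(a, b)

# Hypothetical counts: a rare term occurring 4 times vs. a common term occurring once.
rare = term(term_in_doc_count=4, total_docs=1000, docs_with_term=9)
common = term(term_in_doc_count=1, total_docs=1000, docs_with_term=99)
print(rare > common)  # the rarer, more frequent-in-document term scores higher
```

The point of the sketch is only that rarer terms (higher IDF) and terms repeated in a document (higher TF) raise the score, and that the operators differ in how they merge two sub-scores.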
4. 4
Simple Operators - Example
and                      0.3 * 0.9 = 0.27
  or                     0.1 + 0.2 = 0.30
    george               0.10
    martha               0.20
  max                    max(0.6, 0.9) = 0.90
    washington           0.60
    custis               0.90
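The same arithmetic can be checked with a small tree evaluator. This is a sketch in Python, using the leaf scores from the slide; the tuple-based tree representation and the `evaluate` function are assumptions made for illustration and are not part of Lucene or QPL.

```python
# Leaf scores taken from the slide.
SCORES = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

def evaluate(node):
    """Score a query tree given as a term string or an (operator, left, right) tuple."""
    if isinstance(node, str):
        return SCORES[node]
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    if op == "and":
        return a * b
    if op == "or":
        return a + b
    if op == "max":
        return max(a, b)
    raise ValueError(f"unknown operator: {op}")

# and(or(george, martha), max(washington, custis))
query = ("and", ("or", "george", "martha"), ("max", "washington", "custis"))
total = evaluate(query)
print(round(total, 2))  # prints 0.27
```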
5. 5
Less Used Operators
• boost(f, A) → A * f
  • Implemented with Query.setBoost(f)
• constant(f, A) → if (A) then f else 0.0
  • Implemented with ConstantScoreQuery()
• boostPlus(A, B) → if (A) then (A + B) else 0.0
  • Implemented with BooleanQuery()
• boostMul(f, A, B) → if (B) then (A * f) else A
  • Implemented with BoostingQuery()
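The four definitions above can be written out as plain functions. This Python sketch assumes that a score of 0.0 stands for "did not match", so `if (A)` becomes `a > 0.0`; the function names are illustrative and do not correspond to any Lucene API.

```python
def boost(f, a):
    """boost(f, A): scale the score of A by the factor f."""
    return a * f

def constant(f, a):
    """constant(f, A): every match of A scores exactly f."""
    return f if a > 0.0 else 0.0

def boost_plus(a, b):
    """boostPlus(A, B): A is required; B only adds to the score when A matches."""
    return a + b if a > 0.0 else 0.0

def boost_mul(f, a, b):
    """boostMul(f, A, B): multiply A's score by f only when B also matches."""
    return a * f if b > 0.0 else a

# Hypothetical scores: A matched with 0.8, B matched with 0.1.
print(boost_mul(0.5, 0.8, 0.1))  # A is demoted because B matched
print(boost_mul(0.5, 0.8, 0.0))  # A is left alone because B did not match
```

With f below 1.0, `boost_mul` behaves as a demotion: documents that match B are pushed down rather than excluded.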
6. 6
Problem: Need for More Flexibility
• Difficult / impossible to use all operators
  • Many not available in standard query parsers
• Complex expressions = string manipulation
  • This is messy
• Query construction is in the application layer
  • Your UI programmer is creating query expressions?
  • Seriously?
• Hard to create and use new operators
  • Requires modifying query parsers - yuck
8. 8
Introducing: QPL
• Query Processing Language
  • Domain Specific Language for Constructing Queries
  • Built on Groovy
  • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
  • Query Parser
  • Search Component
• “The 4GL for Text Search Query Expressions”
• Server-side Solr Access
  • Cores, Analyzers, Embedded Search, Results XML
19. 19
Embedded Search Example #1
    qTerms = solr.tokenize(qTerms);
Execute an Embedded Search:
    results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
    subjectsQ = or(results*.subjectId)
Put it all together:
    return field("title", and(qTerms)) | subjectsQ^0.9;
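To make the flow of the steps above concrete outside of QPL, here is a minimal Python sketch over in-memory data. Everything in it is a stand-in: the `SUBJECTS` data, the `tokenize` and `search_subjects` helpers, and the tuple-based query representation are hypothetical and are not the QPL or Solr API. It also assumes that `|` in the QPL snippet means OR and that `^0.9` is a boost.

```python
# Hypothetical stand-in for the 'subjectsCore' index: subject id -> descriptive text.
SUBJECTS = {
    "S1": "american presidents history",
    "S2": "colonial virginia families",
    "S3": "modern jazz music",
}

def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def search_subjects(terms, limit):
    """OR search: ids of subjects containing any of the terms, capped at limit."""
    hits = [sid for sid, text in SUBJECTS.items()
            if any(t in text.split() for t in terms)]
    return hits[:limit]

def build_query(user_input):
    q_terms = tokenize(user_input)
    # Step 1: run the embedded search against the subjects index.
    subject_ids = search_subjects(q_terms, 50)
    # Step 2: turn the matching subject ids into an OR query.
    subjects_q = ("or", *subject_ids)
    # Step 3: title must contain all terms, OR the document carries a matching subject (boosted 0.9).
    title_q = ("field", "title", ("and", *q_terms))
    return ("or", title_q, ("boost", 0.9, subjects_q))

query = build_query("Virginia presidents")
print(query)
```

The design idea being illustrated is that the first search is used only to expand the user's terms into subject identifiers, which then widen the main query at a slightly lower weight.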
20. 20
Embedded Search Example #2
    qTerms = solr.tokenize(qTerms);
Execute an Embedded Search:
    results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
    myList = solr.newList();
    myList.add("relatedCategories", results*.title);
Add it to the XML response:
    solr.addResponse(myList)
21. 21
Other Features
• Embedded Grouping Queries
  • Oh yes they did!
• Proximity operators
  • ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
  • Prefers exact matches over variants
• Transformer
  • Applies transformations recursively to query trees
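The idea of applying a transformation recursively to a query tree can be sketched in a few lines of Python. This is not the QPL Transformer: the tuple-based tree, the `transform` function and the two example rules are assumptions chosen only to show the recursion.

```python
def transform(node, rule):
    """Apply rule to every node of a query tree, children before parents."""
    if isinstance(node, tuple):
        op, *children = node
        node = (op, *(transform(child, rule) for child in children))
    return rule(node)

def lowercase_terms(node):
    """Example rule: lowercase leaf terms, leave operator nodes unchanged."""
    return node.lower() if isinstance(node, str) else node

def or_to_max(node):
    """Example rule: rewrite every or(...) node as max(...)."""
    if isinstance(node, tuple) and node[0] == "or":
        return ("max", *node[1:])
    return node

tree = ("and", "George", ("or", "Martha", "CUSTIS"))
print(transform(tree, lowercase_terms))
print(transform(tree, or_to_max))
```

Because each rule sees one node at a time, rules stay small and can be chained by calling `transform` repeatedly.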