Search Intelligence &
MarkLogic Search API
MarkLogic World 2012
Will Thompson
wthompson@jonesmcclure.com
Search API Resources
• 5-minute Guide to the Search API
• MarkLogic Search Developer's Guide
• developer.marklogic.com
• MarkMail.org
• MarkLogic Developer Listserv
Code
GitHub: https://github.com/wthoolihan/MLUC-2012-Examples
Search Intelligence
• Get the most out of our XML in search
– Approach 1: GUI
– Approach 2: Syntax
– Approach 3: Facets, constraints, filters
– Infer (Search Intelligence)
Enrich Your Query!
• Infer
– Use knowledge about the user
– Look for meaning in search terms
• Enrich
– Translate into more complex query
– Gain speed, accuracy
Enrich Your Query!
• Strategies
– Custom term handling
• Works well for single term transformations
• See: http://developer.marklogic.com/try/ninja/page13
– Roll your own parser
• A lot of work (see Michael Blakeley’s xqysp)
– Work between parse and search steps
Search API Overview
• The Search API is an XQuery library module designed to
simplify creating search applications:
o Parser
o Constraints
o Faceting
o Snippets
• High performance, scalability
• Extensible
Search API Extensibility
• Search API provides several points to hook in
• Hooks are defined in Search API options XML node
o Custom constraints
o Custom grammar
o Custom snippets
o Custom term handling
o Search operators
Search API Basics
• Search API module:
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";
• Main entry point: search:search()
– parses $qtext with given $options
– executes search
– returns <search:response>
o set of <search:result>s
o facets
o snippets
o metrics and other info
Search API Basics
• Search API options:
Search API Extensibility
• Snippet:
• Constraint:
Search API Extensibility
• Term handler:
• Parser:
(: my:parse() stands for your own parser; it must produce cts:query XML :)
let $custom-parser-output := my:parse($qtext)
return
  search:resolve(
    $custom-parser-output,
    $options
  )
Search API Basics
• Search API parser: search:parse()
– 1st half of search:search()
– returns annotated cts:query XML
• Execute search: search:resolve()
– 2nd half of search:search()
– accepts cts:query XML as input
search:parse() Strategy
1. Call search:parse()
2. Analyze and enrich the query XML
3. Call search:resolve()
Our Use Case
• O’Connor’s Online
– Search portal built on MarkLogic
– Legal rules and commentaries content
– Problem
• Users will enter citation numbers, abbreviations, etc. expecting
complete results
• Text editorial content follows different conventions
– Solution
• Detect special cases pre-search and enrich query
Example: detect year
• Content:
– MarkLogic database of news/op-ed articles
• Organized into year directories:
/content/1990
/content/1991
/content/1992
...
/content/2012
• Year is in directory structure, not article text
– But users will still include year in search terms
How to transform query?
• Recursive typeswitch
(function mapping on):
do-stuff-here($q)
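The recursive pattern can be sketched outside MarkLogic as well. The Python below is only an illustration of the idea, not the code from the slide: it walks a plain, un-namespaced stand-in for the cts:query XML (the tag names `and-query`, `or-query`, `word-query` are assumptions), copies every node through, and lets a hypothetical `rewrite` hook play the role of `do-stuff-here($q)`.

```python
import xml.etree.ElementTree as ET

def transform(node, rewrite):
    """Copy the tree top-down; `rewrite` returns a replacement element or None."""
    replacement = rewrite(node)
    if replacement is not None:
        return replacement              # handled case: do not descend further
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    for child in node:
        copy.append(transform(child, rewrite))
    return copy

def upper_words(node):
    # Hypothetical "do-stuff-here": upper-case the text of every word-query.
    if node.tag == "word-query":
        out = ET.Element("word-query")
        out.text = (node.text or "").upper()
        return out
    return None                         # anything else is copied through

query = ET.fromstring(
    "<and-query><word-query>united</word-query>"
    "<or-query><word-query>sprint</word-query>"
    "<word-query>marathon</word-query></or-query></and-query>")
result = transform(query, upper_words)
```

Everything the hook does not claim is rebuilt unchanged, so Boolean structure and grouping survive the rewrite.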
Example: detect year
let $terms := "1996 United States Olympics"
return local:detect-year(search:parse($terms))
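What a year detector might do to the parsed query can be sketched as follows. This is a guess at the behavior, written in Python over a plain stand-in for the cts:query XML; it is not the talk's local:detect-year(). The tag names, the `directory-query`/`uri` shape, the trailing slash on the directory URI, and the 1990 to 2012 range (taken from the directory listing above) are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

YEAR = re.compile(r"^(19|20)\d{2}$")

def detect_year(node, first=1990, last=2012):
    """Swap a word-query that looks like a year for a directory-query."""
    text = (node.text or "").strip()
    if node.tag == "word-query" and YEAR.match(text) and first <= int(text) <= last:
        out = ET.Element("directory-query")
        # Assumed URI layout: one directory per year under /content/.
        ET.SubElement(out, "uri").text = f"/content/{text}/"
        return out
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    for child in node:
        copy.append(detect_year(child, first, last))
    return copy

parsed = ET.fromstring(
    "<and-query><word-query>1996</word-query><word-query>United</word-query>"
    "<word-query>States</word-query><word-query>Olympics</word-query></and-query>")
enriched = detect_year(parsed)
```

Years outside the configured range stay ordinary word queries, which keeps a term like "2050" searchable as text.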
Example: detect year
• Strategy depends on your content model
• Other possibilities
– date detection
– date ranges
– locations
– etc.
search:parse() Strategy
• Weakness
– Limited to single-word tokens
• Similar to custom term handling
• What about multiple tokens?
– Analyze query string text directly using regex
• Dangerous
– Transform cts:query XML into intermediate form
• Preserve Boolean logic & grouping
• Preserve phrases
• Preserve constraints
Building Intermediate Query
• The hack
– Basically, undoing some of the parser's work
– Text "run" concept
• Similar to WordprocessingML
Building Intermediate Query
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
• Content:
– Same MarkLogic database of news/op-ed articles from
detect-year() example
• Query:
– Same as before: "1996 United States Olympics"
– Start with the search:parse() output
Example: multi-word thesaurus
1. Flatten query
– Remove implicit and-queries from search:parse() output
– XML should look more like cts:query string representation:
cts:and-query(
  (cts:word-query("1996", "lang=en", 1),
   cts:word-query("United", "lang=en", 1),
   cts:word-query("States", "lang=en", 1),
   cts:word-query("Olympics", "lang=en", 1)),
  ())
• Typeswitch on cts:and-query:
1. Check and-queries for parent and-query
2. Remove the nested ones, copy through anything else
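The flattening rule above can be sketched in Python over a plain stand-in for the cts:query XML. This is an illustration under assumed tag names, not the talk's local:unnest-ands(): an and-query whose parent is also an and-query is dropped and its children are spliced into the parent; everything else is copied through.

```python
import xml.etree.ElementTree as ET

def unnest_ands(node, parent_is_and=False):
    """Return the list of elements that should replace `node`."""
    children = []
    for child in node:
        children.extend(unnest_ands(child, node.tag == "and-query"))
    if node.tag == "and-query" and parent_is_and:
        return children                 # nested and-query: keep only its children
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    copy.extend(children)
    return [copy]

nested = ET.fromstring(
    "<and-query><word-query>1996</word-query>"
    "<and-query><word-query>United</word-query>"
    "<and-query><word-query>States</word-query>"
    "<word-query>Olympics</word-query></and-query></and-query></and-query>")
flat = unnest_ands(nested)[0]
```

An and-query nested inside an or-query is left alone, because removing it would change the Boolean meaning.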
1. Flatten query
– Typeswitch function output:
Example: multi-word thesaurus
2. Join sibling words in <run>:
• Typeswitch on cts:word-query:
1. Ignore phrases
2. Delete the word-query if it is not the first in its sequence
3. Take the first word-query in the sequence and join it with its following siblings into a <run>
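A rough Python sketch of the joining step, again over a plain stand-in for the cts:query XML rather than the talk's local:create-runs(). Two choices here are assumptions: a phrase is recognized as a word-query whose text contains a space, and words are only joined when they sit directly under an and-query, so that the alternatives of an or-query are never merged.

```python
import xml.etree.ElementTree as ET

def is_single_word(el):
    # A phrase keeps its spaces inside one word-query; leave those alone.
    return el.tag == "word-query" and bool(el.text) and " " not in el.text.strip()

def create_runs(node):
    """Join consecutive single-word siblings of an and-query into one <run>."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    pending = []                        # texts of consecutive single-word siblings

    def flush():
        if pending:
            ET.SubElement(copy, "run").text = " ".join(pending)
            pending.clear()

    for child in node:
        if node.tag == "and-query" and is_single_word(child):
            pending.append(child.text.strip())
        else:
            flush()                     # a non-word sibling ends the current run
            copy.append(create_runs(child))
    flush()
    return copy

flat = ET.fromstring(
    "<and-query><word-query>1996</word-query>"
    "<or-query><word-query>sprint</word-query><word-query>marathon</word-query></or-query>"
    "<word-query>United</word-query><word-query>States</word-query>"
    "<word-query>Olympics</word-query></and-query>")
runs = create_runs(flat)
```

The or-query in the middle splits the words into two runs, so grouping is preserved while "United States Olympics" becomes one piece of text that later steps can scan for multi-word terms.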
2. Join sibling words in <run>:
• Input:
– search:parse("1996 United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
2. Join sibling words in <run>:
• Input:
– search:parse("1996 (sprint OR marathon) United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
Example: multi-word thesaurus
3. Transform <run>s:
1. Store terms in thesaurus
2. Build cts:or-query of thesaurus terms
3. Using cts:or-query of terms, cts:highlight() <run>s, and replace matches with thesaurus synonyms
– cts:highlight(): powerful cts:query-based find/replace
3. Transform <run>s:
Input:
(: one word-query per thesaurus term, OR-ed together :)
let $q-thsr :=
  cts:or-query(
    doc("thesaurus.xml")
      //thsr:entry/thsr:term/cts:word-query(string(.))
  )
(: parse, flatten, then join sibling words into runs :)
let $runs :=
  search:parse("1996 United States Olympics")
    /local:unnest-ands(.)/local:create-runs(.)
return local:thsr-expand($runs, $q-thsr)
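The expansion step can be approximated outside MarkLogic. The Python below is a sketch only: it swaps cts:highlight() for a plain regular expression, works on the text of a single run, and uses a hypothetical in-memory thesaurus (lower-cased term mapped to synonyms) in place of thesaurus.xml. The output shape, a run split around an or-query of phrases, is also an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical thesaurus: lower-cased term -> list of synonyms.
THESAURUS = {"united states": ["USA", "America"]}

def make(tag, text):
    el = ET.Element(tag)
    el.text = text
    return el

def expand_run(run_text, thesaurus):
    """Split a run around thesaurus terms; each term becomes an or-query."""
    if not thesaurus:
        return [make("run", run_text)]
    terms = sorted(thesaurus, key=len, reverse=True)   # prefer the longest match
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    parts, pos = [], 0
    for m in pattern.finditer(run_text):
        before = run_text[pos:m.start()].strip()
        if before:
            parts.append(make("run", before))
        group = ET.Element("or-query")
        for phrase in [m.group(1)] + thesaurus[m.group(1).lower()]:
            group.append(make("word-query", phrase))    # original term plus synonyms
        parts.append(group)
        pos = m.end()
    rest = run_text[pos:].strip()
    if rest:
        parts.append(make("run", rest))
    return parts

parts = expand_run("1996 United States Olympics", THESAURUS)
```

Text that matches no thesaurus term stays in a run, so the final step can still turn it back into ordinary word queries.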
3. Transform <run>s:
Output:
Example: multi-word thesaurus
4. Convert <run>s back to word queries
– Typeswitch:
4. Convert <run>s back to word queries
Input:
(: one word-query per thesaurus term, OR-ed together :)
let $q-thsr :=
  cts:or-query(
    doc("thesaurus.xml")
      //thsr:entry/thsr:term/cts:word-query(string(.))
  )
let $runs :=
  search:parse("1996 United States Olympics")
    /local:unnest-ands(.)/local:create-runs(.)
let $expanded := local:thsr-expand($runs, $q-thsr)
(: turn the remaining runs back into word queries :)
return local:resolve-runs($expanded)
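The last step, undoing the runs, can be sketched the same way. This Python version is an illustration over a plain stand-in for the query XML, not the talk's local:resolve-runs(); it assumes a run only ever appears as a child of another query element and that splitting on whitespace is enough to recover the words.

```python
import xml.etree.ElementTree as ET

def resolve_runs(node):
    """Replace each <run> child with one word-query per whitespace-separated word."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    for child in node:
        if child.tag == "run":
            for word in (child.text or "").split():
                ET.SubElement(copy, "word-query").text = word
        else:
            copy.append(resolve_runs(child))   # phrases, or-queries, etc. pass through
    return copy

expanded = ET.fromstring(
    "<and-query><run>1996</run>"
    "<or-query><word-query>United States</word-query>"
    "<word-query>USA</word-query></or-query>"
    "<run>summer Olympics</run></and-query>")
resolved = resolve_runs(expanded)
```

After this pass the tree contains only query elements again, which is the form the search step expects.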
4. Convert <run>s back to word queries
Output:
Combining Examples
local:thsr-expand($runs, $q-thsr)
  /local:resolve-runs(.)/local:detect-year(.)
Enrich Your Query!
• Takeaway
1. No added GUI
2. Didn't ask the user for additional input
3. Able to build more robust query before
executing search
Search API Hacking
• Many potential applications:
– Ad-hoc weighting:
local:q-add-weights(
  search:parse("bananas"),
  (<element ns="{$ns}" name="p" weight="1"/>,
   <element ns="{$ns}" name="b" weight="2"/>,
   <element ns="{$ns}" name="title" weight="3.5"/>)
)
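One way such a weighting function could behave is sketched below in Python. This is a guess, not the talk's local:q-add-weights(): each word-query is replaced by an or-query holding one element-scoped query per weight spec, so a hit in a title counts for more than a hit in a paragraph. The `element-word-query` tag and its `element` and `weight` attributes are a simplified, hypothetical stand-in for the real cts:query XML.

```python
import xml.etree.ElementTree as ET

def add_weights(node, specs):
    """specs: list of (element_name, weight) pairs, e.g. [("title", 3.5)]."""
    if node.tag == "word-query":
        group = ET.Element("or-query")
        for name, weight in specs:
            q = ET.SubElement(group, "element-word-query",
                              {"element": name, "weight": str(weight)})
            q.text = node.text          # same term, scoped to one element
        return group
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    for child in node:
        copy.append(add_weights(child, specs))
    return copy

parsed = ET.fromstring("<word-query>bananas</word-query>")
weighted = add_weights(parsed, [("p", 1), ("b", 2), ("title", 3.5)])
```

Because the weights are added after parsing, the user's query string stays exactly as typed.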
– Automatic spell correction
– Detect entities
• Transform text into element-based query
• Fewer false positives and exclusions
• Leverage indexes:
"New York Times"
Search API Hacking
• Other ideas
– Regex unparsed query string
• Apply constraints, operators, etc. as configured in the Search API, based on keywords/patterns
– Custom term handler
• single-term transformations
– Combine with data enrichment on ingestion
• MarkLogic Entity Framework
• Linguistic processing
Hazards
• Chaos
– Daisy-chained transformations can have unintended
consequences
– Performance
• Pre-search transformations need to be fast
• Make sure to leverage indexes as much as possible
• Larger queries do take longer
Questions
