Search Intelligence &
MarkLogic Search API
MarkLogic World 2012
Will Thompson
wthompson@jonesmcclure.com
Search API Resources
• 5-minute Guide to the Search API
• MarkLogic Search Developer's Guide
• developer.marklogic.com
• MarkMail.org
• MarkLogic Developer Listserv
Code
Github:
https://github.com/wthoolihan/MLUC-2012-Examples
Search Intelligence
• Get the most out of our XML in search
– Approach 1: GUI
Search Intelligence
• Get the most out of our XML in search
– Approach 2: Syntax
Search Intelligence
• Get the most out of our XML in search
– Approach 3: Facets, constraints, filters
Search Intelligence
• Get the most out of our XML in search
– Infer (Search Intelligence)
Enrich Your Query!
• Infer
– Use knowledge about the user
– Look for meaning in search terms
• Enrich
– Translate into more complex query
– Gain speed, accuracy
Enrich Your Query!
• Strategies
– Custom term handling
• Works well for single term transformations
• See: http://developer.marklogic.com/try/ninja/page13
– Roll your own parser
• A lot of work (see Michael Blakeley’s xqysp)
– Work between parse and search steps
Search API Overview
• The Search API is an XQuery library module designed to
simplify creating search applications:
o Parser
o Constraints
o Faceting
o Snippets
• High performance, scalability
• Extensible
Search API Extensibility
• Search API provides several points to hook in
• Hooks are defined in Search API options XML node
o Custom constraints
o Custom grammar
o Custom snippets
o Custom term handling
o Search operators
Search API Basics
• Search API module:
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";
• Main entry point: search:search()
• parses $qtext with given $options
• executes search
• returns <search:response>
o set of <search:result>s
o facets
o snippets
o metrics and other info
Search API Basics
• Search API options:
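A minimal sketch of an options node passed to search:search(). The `title` element, the constraint name, and the query string are assumptions made for illustration; they are not taken from the talk.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Assumption: documents contain a <title> element in no namespace. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="title">
      <word>
        <element ns="" name="title"/>
      </word>
    </constraint>
    <transform-results apply="snippet"/>
    <return-metrics>true</return-metrics>
  </options>
(: "title:olympics" goes through the constraint; "1996" is a plain term. :)
return search:search("title:olympics 1996", $options)
```

The same options node is what search:parse() and search:resolve() take as their second argument.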
Search API Extensibility
• Snippet:
• Constraint:
Search API Extensibility
• Term handler:
• Parser:
(: my:parse() stands for your own parser; it must return cts:query XML :)
let $custom-parser-output := my:parse($qtext)
return
  search:resolve(
    $custom-parser-output,
    $options
  )
Search API Basics
• Search API parser: search:parse()
– 1st half of search:search()
– returns annotated cts:query XML
• Execute search: search:resolve()
– 2nd half of search:search()
– accepts cts:query XML as input
search:parse() Strategy
1. Call search:parse()
2. Analyze and enrich the query XML
3. Call search:resolve()
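The three steps can be sketched as a single main module. `local:enrich` is a hypothetical placeholder that returns the query unchanged, and the empty options node is an assumption; a real application would pass its own options and put its transformations in the enrichment step.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical enrichment step: returns the query XML unchanged. :)
declare function local:enrich($q as element()) as element()
{
  $q
};

let $qtext   := "1996 United States Olympics"
let $options := <options xmlns="http://marklogic.com/appservices/search"/>
(: 1. Parse the query text into annotated cts:query XML. :)
let $parsed := search:parse($qtext, $options)
(: 2. Analyze and enrich the query XML. :)
let $enriched := local:enrich($parsed)
(: 3. Execute the enriched query; returns a <search:response>. :)
return search:resolve($enriched, $options)
```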
Our Use Case
• O’Connor’s Online
– Search portal built on MarkLogic
– Legal rules and commentaries content
– Problem
• Users will enter citation numbers, abbreviations, etc. expecting
complete results
• Text editorial content follows different conventions
– Solution
• Detect special cases pre-search and enrich query
Example: detect year
• Content:
– MarkLogic database of news/op-ed articles
• Organized into year directories:
/content/1990
/content/1991
/content/1992
...
/content/2012
• Year is in directory structure, not article text
– But users will still include year in search terms
How to transform query?
• Recursive typeswitch (function mapping on):
do-stuff-here($q)
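A minimal sketch of the recursive typeswitch pattern, assuming the goal is to visit every word query and copy everything else through. `local:transform` and `local:do-stuff-here` are hypothetical names, and the hook here is a no-op. The recursion passes a sequence of children to a function that takes a single node, which relies on function mapping (on by default in 1.0-ml).

```xquery
xquery version "1.0-ml";

(: Hypothetical hook: replace the body with real per-term logic. :)
declare function local:do-stuff-here($q as element()) as element()
{
  $q
};

declare function local:transform($q as node()) as node()*
{
  typeswitch ($q)
    (: the node type of interest gets special handling :)
    case element(cts:word-query) return local:do-stuff-here($q)
    (: any other element is rebuilt and its children are visited;
       function mapping calls local:transform once per child :)
    case element() return
      element { fn:node-name($q) } { $q/@*, local:transform($q/node()) }
    (: text and other nodes are copied through :)
    default return $q
};

(: Hand-written stand-in for search:parse() output. :)
local:transform(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>united</cts:text></cts:word-query>
    <cts:word-query><cts:text>states</cts:text></cts:word-query>
  </cts:and-query>
)
```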
Example: detect year
let $terms := "1996 United States Olympics"
return local:detect-year(search:parse($terms))
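One way `local:detect-year` could be written is sketched below; it is not the function shown in the talk. It assumes a year is any four-digit term starting with 19 or 20, that directory URIs end in a slash (`/content/1996/`), and that the year term should be replaced by a directory query rather than kept alongside it.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

declare function local:detect-year($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      let $text := fn:string-join($q/cts:text, " ")
      return
        if (fn:matches($text, "^(19|20)[0-9]{2}$"))
        then
          (: Embedding a cts:query in an element serializes it to XML. :)
          (<x>{
            cts:directory-query(fn:concat("/content/", $text, "/"), "infinity")
          }</x>/*)
        else $q
    case element() return
      element { fn:node-name($q) } { $q/@*, local:detect-year($q/node()) }
    default return $q
};

let $terms := "1996 United States Olympics"
return local:detect-year(search:parse($terms))
```

The result is still cts:query XML, so it can be handed to search:resolve() as in the three-step strategy.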
Example: detect year
• Strategy depends on your content model
• Other possibilities
– date detection
– date ranges
– locations
– etc.
search:parse() Strategy
• Weakness
– Limited to single-word tokens
• Similar to custom term handling
• What about multiple tokens?
– Analyze querystring text directly using regex
• Dangerous
– Transform cts:query XML into intermediate form
• Preserve Boolean logic & grouping
• Preserve phrases
• Preserve constraints
Building Intermediate Query
• The hack
– Basically, undoing some of the parser's work
– Text "run" concept
• Similar to WordprocessingML
Building Intermediate Query
• Intermediate query strategy
1. Flatten query
2. Join sibling words in <run>
3. Transform <run>s
4. Convert <run>s back to word queries
Example: multi-word thesaurus
• Content:
– Same MarkLogic database of news/op-ed articles from
detect-year() example
• Query:
– Same as before: "1996 United States Olympics"
– Start with the search:parse() output
Example: multi-word thesaurus
1. Flatten query
– remove implicit and-queries from search:parse() output:
Example: multi-word thesaurus
1. Flatten query
– XML should look more like the cts:query string representation:
cts:and-query(
  (cts:word-query("1996", "lang=en", 1),
   cts:word-query("United", "lang=en", 1),
   cts:word-query("States", "lang=en", 1),
   cts:word-query("Olympics", "lang=en", 1)),
  ())
Example: multi-word thesaurus
1. Flatten query
• Typeswitch on cts:and-query:
1. Check and-queries for a parent and-query
2. Remove the nested ones, copy through anything else
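The two flattening rules could be sketched as follows. The function name `local:unnest-ands` comes from the slides, but this body is an assumption, and the input is a hand-written stand-in for nested search:parse() output. The sketch unwraps every and-query that sits directly inside another and-query and does not distinguish implicit from explicit ones.

```xquery
xquery version "1.0-ml";

declare function local:unnest-ands($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:and-query) return
      if (fn:exists($q/parent::cts:and-query))
      (: nested and-query: drop the wrapper, keep its flattened children :)
      then local:unnest-ands($q/node())
      else element { fn:node-name($q) } { $q/@*, local:unnest-ands($q/node()) }
    (: anything else is copied through, children visited recursively :)
    case element() return
      element { fn:node-name($q) } { $q/@*, local:unnest-ands($q/node()) }
    default return $q
};

local:unnest-ands(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:and-query>
      <cts:word-query><cts:text>United</cts:text></cts:word-query>
      <cts:and-query>
        <cts:word-query><cts:text>States</cts:text></cts:word-query>
        <cts:word-query><cts:text>Olympics</cts:text></cts:word-query>
      </cts:and-query>
    </cts:and-query>
  </cts:and-query>
)
```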
Example: multi-word thesaurus
1. Flatten query
– Typeswitch function output:
Example: multi-word thesaurus
2. Join sibling words in <run>:
• Typeswitch on cts:word-query:
1. Ignore phrases
2. Delete the word-query if it is not the first in its sequence
3. Take the first word-query in the sequence and join it with its following siblings into a <run>
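A sketch of the three rules above. `local:create-runs` is the name used on the slides, but this implementation is an assumption; `local:is-word` and `local:collect-run` are hypothetical helpers. The sketch treats a word-query as a phrase when its text contains whitespace, and only joins words whose parent is an and-query (or that have no parent), so the alternatives of an or-query are left alone. Attributes and options on the joined word queries are dropped.

```xquery
xquery version "1.0-ml";

(: True for a single-token word-query that is eligible to join a run. :)
declare function local:is-word($n as node()?) as xs:boolean?
{
  $n instance of element(cts:word-query)
    and fn:not(fn:matches(fn:string-join($n/cts:text, " "), "\s"))
    and (fn:empty($n/parent::*) or fn:exists($n/parent::cts:and-query))
};

(: Collects the text of $w and of the single words that directly follow it. :)
declare function local:collect-run($w as node()?) as xs:string*
{
  if (local:is-word($w))
  then (fn:string($w/cts:text), local:collect-run($w/following-sibling::*[1]))
  else ()
};

declare function local:create-runs($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      (: 1. phrases (and ineligible words) pass through untouched :)
      if (fn:not(local:is-word($q))) then $q
      (: 2. not the first word of its sequence: already absorbed into a run :)
      else if (local:is-word($q/preceding-sibling::*[1])) then ()
      (: 3. first word: join it with its following sibling words :)
      else <run>{ fn:string-join(local:collect-run($q), " ") }</run>
    case element() return
      element { fn:node-name($q) } { $q/@*, local:create-runs($q/node()) }
    default return $q
};

(: Hand-written stand-in for flattened search:parse() output. :)
local:create-runs(
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:word-query><cts:text>United</cts:text></cts:word-query>
    <cts:word-query><cts:text>States</cts:text></cts:word-query>
    <cts:word-query><cts:text>Olympics</cts:text></cts:word-query>
  </cts:and-query>
)
```

With this input the four sibling words end up in a single <run> inside the and-query.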
Example: multi-word thesaurus
2. Join sibling words in <run>:
• Input:
– search:parse("1996 United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
Example: multi-word thesaurus
2. Join sibling words in <run>:
• Input:
– search:parse("1996 (sprint OR marathon) United States Olympics")/local:unnest-ands(.)/local:create-runs(.)
• Output:
Example: multi-word thesaurus
3. Transform <run>s:
1. Store terms in thesaurus
2. Build cts:or-query of thesaurus terms
3. Using cts:or-query of terms, cts:highlight() <run>s,
and replace with thesaurus synonyms
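The three steps above could be sketched like this. It is not the talk's implementation: the thesaurus is an inline element rather than a stored document, `local:thsr-expand` takes the thesaurus as a hypothetical third argument, the sample entry ("United States" with synonym "USA") is invented, and each match is replaced by an or-query of the original phrase and its synonyms.

```xquery
xquery version "1.0-ml";

declare namespace thsr = "http://marklogic.com/xdmp/thesaurus";

(: Assumption: the thesaurus is passed in explicitly as a third argument. :)
declare function local:thsr-expand(
  $q as node(), $q-thsr as cts:query, $thesaurus as element(thsr:thesaurus)
) as node()*
{
  typeswitch ($q)
    case element(run) return
      (: 3. cts:highlight() replaces each matched span inside the run;
            $cts:text holds the matched text :)
      cts:highlight($q, $q-thsr,
        <cts:or-query xmlns:cts="http://marklogic.com/cts">{
          for $t in (
            $cts:text,
            $thesaurus/thsr:entry
              [fn:lower-case(thsr:term) eq fn:lower-case($cts:text)]
              /thsr:synonym/thsr:term/fn:string(.)
          )
          return <cts:word-query><cts:text>{ $t }</cts:text></cts:word-query>
        }</cts:or-query>)
    case element() return
      element { fn:node-name($q) } {
        $q/@*,
        for $n in $q/node() return local:thsr-expand($n, $q-thsr, $thesaurus)
      }
    default return $q
};

(: 1. Terms stored in a thesaurus (invented sample entry). :)
let $thesaurus :=
  <thesaurus xmlns="http://marklogic.com/xdmp/thesaurus">
    <entry>
      <term>United States</term>
      <synonym><term>USA</term></synonym>
    </entry>
  </thesaurus>
(: 2. One word-query per thesaurus term, combined into an or-query. :)
let $q-thsr :=
  cts:or-query($thesaurus/thsr:entry/thsr:term/cts:word-query(fn:string(.)))
let $runs :=
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <run>1996 United States Olympics</run>
  </cts:and-query>
return local:thsr-expand($runs, $q-thsr, $thesaurus)
```

Because the replacement happens inside the <run>, the text before and after a match stays in place as plain text, which the final step can turn back into word queries.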
Example: multi-word thesaurus
3. Transform <run>s:
1. Store terms in thesaurus
Example: multi-word thesaurus
3. Transform <run>s:
2. Build cts:or-query of thesaurus terms:
Example: multi-word thesaurus
3. Transform <run>s:
3. Replace matches with synonyms:
– cts:highlight(): powerful cts:query-based find/replace
Example: multi-word thesaurus
3. Transform <run>s:
3. Replace matches with synonyms:
Example: multi-word thesaurus
3. Transform <run>s:
Input:
let $q-thsr :=
  cts:or-query(
    doc("thesaurus.xml")
      //thsr:entry/thsr:term/cts:word-query(string(.))
  )
let $runs :=
  search:parse("1996 United States Olympics")
    /local:unnest-ands(.)/local:create-runs(.)
return local:thsr-expand($runs, $q-thsr)
Example: multi-word thesaurus
3. Transform <run>s:
Output:
Example: multi-word thesaurus
4. Convert <run>s back to word queries
– Typeswitch:
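A sketch of the final typeswitch. `local:resolve-runs` is the name used on the slides; the body is an assumption. It splits the remaining text of each <run> on whitespace into one word query per token and copies through any elements that an earlier step placed inside the run. The input is a hand-written stand-in for an expanded run.

```xquery
xquery version "1.0-ml";

declare function local:resolve-runs($q as node()) as node()*
{
  typeswitch ($q)
    case element(run) return
      (: the <run> wrapper disappears; its children are emitted in order :)
      for $n in $q/node()
      return
        typeswitch ($n)
          case text() return
            for $tok in fn:tokenize(fn:normalize-space($n), " ")[. ne ""]
            return
              <cts:word-query xmlns:cts="http://marklogic.com/cts">
                <cts:text>{ $tok }</cts:text>
              </cts:word-query>
          (: elements inserted by an earlier transform pass through :)
          default return $n
    case element() return
      element { fn:node-name($q) } { $q/@*, local:resolve-runs($q/node()) }
    default return $q
};

local:resolve-runs(
  <cts:and-query xmlns:cts="http://marklogic.com/cts"><run>1996 <cts:or-query><cts:word-query><cts:text>United States</cts:text></cts:word-query><cts:word-query><cts:text>USA</cts:text></cts:word-query></cts:or-query> Olympics</run></cts:and-query>
)
```

The result is plain cts:query XML again, which is the form search:resolve() accepts.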
Example: multi-word thesaurus
4. Convert <run>s back to word queries
Input:
let $q-thsr :=
  cts:or-query(
    doc("thesaurus.xml")
      //thsr:entry/thsr:term/cts:word-query(string(.))
  )
let $runs :=
  search:parse("1996 United States Olympics")
    /local:unnest-ands(.)/local:create-runs(.)
let $expanded := local:thsr-expand($runs, $q-thsr)
return local:resolve-runs($expanded)
Example: multi-word thesaurus
4. Convert <run>s back to word queries
Output:
Combining Examples
local:thsr-expand($runs, $q-thsr)
  /local:resolve-runs(.)/local:detect-year(.)
Enrich Your Query!
• Takeaway
1. No added GUI
2. Didn't ask the user for additional input
3. Able to build more robust query before
executing search
Search API Hacking
• Many potential applications:
– Ad-hoc weighting:
local:q-add-weights(
  search:parse("bananas"),
  (<element ns="{$ns}" name="p" weight="1"/>,
   <element ns="{$ns}" name="b" weight="2"/>,
   <element ns="{$ns}" name="title" weight="3.5"/>)
)
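One possible shape for `local:q-add-weights`, offered as an assumption rather than the talk's implementation: each word query is replaced by an or-query of weighted element-word queries, one per `<element>` spec. The sample uses elements in no namespace and a hand-written word query in place of search:parse() output.

```xquery
xquery version "1.0-ml";

declare function local:q-add-weights($q as node(), $specs as element()*)
  as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      let $text := fn:string-join($q/cts:text, " ")
      return
        (: Embedding a cts:query in an element serializes it to XML. :)
        (<x>{
          cts:or-query(
            for $s in $specs
            return
              cts:element-word-query(
                fn:QName(fn:string($s/@ns), fn:string($s/@name)),
                $text, (), xs:double($s/@weight))
          )
        }</x>/*)
    case element() return
      element { fn:node-name($q) } {
        $q/@*,
        for $n in $q/node() return local:q-add-weights($n, $specs)
      }
    default return $q
};

local:q-add-weights(
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text>bananas</cts:text>
  </cts:word-query>,
  (<element ns="" name="p" weight="1"/>,
   <element ns="" name="b" weight="2"/>,
   <element ns="" name="title" weight="3.5"/>)
)
```

A match in an element with a higher weight then contributes more to the relevance score than the same term in body text.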
Search API Hacking
• Many potential applications:
– Automatic spell correction:
Search API Hacking
• Many potential applications:
– Detect entities
• Transform text into element-based query
• Fewer false positives and exclusions
• Leverage indexes:
"New York Times"
Search API Hacking
• Other ideas
– Regex unparsed query string
• apply constraints, operators, etc as configured in Search API based on key
words/patterns
– Custom term handler
• single-term transformations
– Combine with data enrichment on ingestion
• MarkLogic Entity Framework
• Linguistic processing
Hazards
• Chaos
– Daisy chained transformations can have unintended
consequences
– Performance
• Pre-search transformations need to be fast
• make sure to leverage indexes as much as possible
• Larger queries do take longer
Questions

More Related Content

Similar to Search Intelligence & MarkLogic Search API

SURE_2014 Poster 2.0
SURE_2014 Poster 2.0SURE_2014 Poster 2.0
SURE_2014 Poster 2.0Alex Sumner
 
Siteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra Soni
Siteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra SoniSiteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra Soni
Siteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra SoniJitendra Soni
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformTrey Grainger
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?Andrii Soldatenko
 
SURE Research Report
SURE Research ReportSURE Research Report
SURE Research ReportAlex Sumner
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logginglucenerevolution
 
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic
 
Sumo Logic QuickStart Webinar
Sumo Logic QuickStart WebinarSumo Logic QuickStart Webinar
Sumo Logic QuickStart WebinarSumo Logic
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampKais Hassan, PhD
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 

Similar to Search Intelligence & MarkLogic Search API (20)

SURE_2014 Poster 2.0
SURE_2014 Poster 2.0SURE_2014 Poster 2.0
SURE_2014 Poster 2.0
 
Make Your Data Searchable With Solr in 25 Minutes
Make Your Data Searchable With Solr in 25 MinutesMake Your Data Searchable With Solr in 25 Minutes
Make Your Data Searchable With Solr in 25 Minutes
 
Siteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra Soni
Siteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra SoniSiteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra Soni
Siteocre Sxa and Solr - Sitecore User Group UAE Dubai- Jitendra Soni
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
3 google hacking
3 google hacking3 google hacking
3 google hacking
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?
 
SURE Research Report
SURE Research ReportSURE Research Report
SURE Research Report
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
 
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced Analytics
 
Sumo Logic QuickStart Webinar
Sumo Logic QuickStart WebinarSumo Logic QuickStart Webinar
Sumo Logic QuickStart Webinar
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016Sumo Logic QuickStart Webinar - Jan 2016
Sumo Logic QuickStart Webinar - Jan 2016
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 

Recently uploaded

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 

Recently uploaded (20)

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Search Intelligence & MarkLogic Search API

  • 1. Search Intelligence & MarkLogic Search API MarkLogic World 2012 Will Thompson wthompson@jonesmcclure.com
  • 2. Search API Resources • 5-minute Guide to the Search API • MarkLogic Search Developer's Guide • developer.marklogic.com • MarkMail.org • MarkLogic Developer Listserv
  • 6. Search Intelligence • Get the most out of our XML in search – Approach 1: GUI
  • 7. Search Intelligence • Get the most out of our XML in search – Approach 1: GUI
  • 8. Search Intelligence • Get the most out of our XML in search – Approach 2: Syntax
  • 9. Search Intelligence • Get the most out of our XML in search – Approach 2: Syntax
  • 10. Search Intelligence • Get the most out of our XML in search – Approach 3: Facets
  • 11. Search Intelligence • Get the most out of our XML in search – Approach 3: Facets, constraints, filters
  • 12. Search Intelligence • Get the most out of our XML in search – Infer (Search Intelligence)
  • 13. Enrich Your Query! • Infer – Use knowledge about the user – Look for meaning in search terms • Enrich – Translate into more complex query – Gain speed, accuracy
  • 14. Enrich Your Query! • Strategies – Custom term handling • Works well for single term transformations • See: http://developer.marklogic.com/try/ninja/page13 – Roll your own parser • A lot of work (see Michael Blakeley’s xqysp) – Work between parse and search steps
  • 15. Search API Overview • The Search API is an XQuery library module designed to simplify creating search applications: o Parser o Constraints o Faceting o Snippets • High performance, scalability • Extensible
  • 16. Search API Extensibility • Search API provides several points to hook in • Hooks are defined in Search API options XML node o Custom constraints o Custom grammar o Custom snippets o Custom term handling o Search operators
  • 17. Search API Basics • Search API module: • Main entry point: search:search() import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy"; • parses $qtext with given $options • executes search • returns <search:response> o set of <search:result>s o facets o snippets o metrics and other info
  • 18. Search API Basics • Search API options:
  • 19. Search API Extensibility • Snippet: • Constraint:
  • 20. Search API Extensibility • Term handler: • Parser: let $custom-parser-output := my:parse($qtext) search:resolve( $custom-parser-output, $options )
  • 21. Search API Basics • Search API parser: • Execute search: • 1st half of search:search() • returns annotated cts:query XML • 2nd half of search:search() • accepts cts:query XML as input
  • 22. search:parse() Strategy 1. Call search:parse() 2. Analyze and enrich the query XML 3. Call search:resolve()
  • 23. Our Use Case • O’Connor’s Online – Search portal built on MarkLogic – Legal rules and commentaries content – Problem • Users will enter citation numbers, abbreviations, etc. expecting complete results • Text editorial content follows different conventions – Solution • Detect special cases pre-search and enrich query
  • 24. Example: detect year • Content: – MarkLogic database of news/op-ed articles • Organized into year directories: /content/1990 /content/1991 /content/1992 ... /content/2012 • Year is in directory structure, not article text – But users will still include year in search terms
  • 25. How to transform query? • Recursive typeswitch (function mapping on): do-stuff-here($q)
  • 27. Example: detect year let $terms := "1996 United States Olympics" return local:detect-year(search:parse($terms))
  • 28. Example: detect year • Strategy depends on your content model • Other possibilities – date detection – date ranges – locations – etc.
  • 29. search:parse() Strategy • Weakness – Limited to single word token • Similar to custom term handling • What about multiple tokens? – Analyze querystring text directly using regex • Dangerous – Transform cts:query XML into intermediate form • Preserve Boolean logic & grouping • Preserve phrases • Preserve constraints
  • 30. Building Intermediate Query • The hack – Basically, undoing some of the parser's work – Text "run" concept • Similar to WordprocessingML
  • 31. Building Intermediate Query • Intermediate query strategy 1. Flatten query 2. Join sibling words in <run> 3. Transform <run>s 4. Convert <run>s back to word queries
  • 32. Example: multi-word thesaurus • Content: – Same MarkLogic database of news/op-ed articles from detect-year() example • Query: – Same as before: "1996 United States Olypmics" – Start with the search:parse()output
  • 33. Example: multi-word thesaurus • Intermediate query strategy 1. Flatten query 2. Join sibling words in <run> 3. Transform <run>s 4. Convert <run>s back to word queries
  • 34. Example: multi-word thesaurus 1. Flatten query – remove implicit and-queries from search:parse() output:
  • 35. 1. Flatten query – XML should look more like cts:query string representation: Example: multi-word thesaurus cts:and-query( (cts:word-query("1996", "lang=en", 1), cts:word-query("United", "lang=en", 1), cts:word-query("States", "lang=en", 1), cts:word-query("Olympics", "lang=en", 1)), ())
  • 36. 1. Flatten query • Typeswitch on cts:and-query: 1. Check and-queries for parent and-query 2. Remove the nested ones, copy through anything else Example: multi-word thesaurus
  • 37. Example: multi-word thesaurus 1. Flatten query – Typeswitch function output:
  • 39. Example: multi-word thesaurus 2. Join sibling words in <run>: • Typeswitch on cts:word-query: 1. Ignore phrases 2. Delete if query is not the first. 3. Take first word-query in sequence and join with its following siblings into a <run>
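A possible shape for the run-building step, written as a sketch rather than the presenter's implementation. Several things here are assumptions: a "phrase" is taken to be a word query whose text contains whitespace, a run is only built from word queries that sit directly under a `cts:and-query`, the `<run>` element holds the joined text separated by spaces, and per-word options are dropped. `local:is-word` and `local:run-words` are hypothetical helpers.

```xquery
xquery version "1.0-ml";

(: True for a single-token word query, i.e. not a phrase. :)
declare function local:is-word($n as node()*) as xs:boolean
{
  $n instance of element(cts:word-query)
    and fn:not(fn:matches(fn:string-join($n/cts:text/fn:string(.), " "), "\s"))
};

(: The given word query plus the unbroken chain of word queries after it. :)
declare function local:run-words($w as element(cts:word-query))
  as element(cts:word-query)+
{
  let $next := $w/following-sibling::*[1]
  return ($w, if (local:is-word($next)) then local:run-words($next) else ())
};

declare function local:create-runs($q as node()) as node()*
{
  typeswitch ($q)
    case element(cts:word-query) return
      (: 1. Ignore phrases, and words outside an and-query. :)
      if (fn:not(local:is-word($q)) or fn:empty($q/parent::cts:and-query))
      then $q
      (: 2. Not the first word of its run: already absorbed, so delete. :)
      else if (local:is-word($q/preceding-sibling::*[1])) then ()
      (: 3. First word: join it with its following siblings into a run. :)
      else <run>{
        fn:string-join(local:run-words($q)/cts:text/fn:string(.), " ")
      }</run>
    case element() return
      element { fn:node-name($q) } { $q/@*, local:create-runs($q/node()) }
    default return $q
};

(: Hand-written stand-in for flattened search:parse() output. :)
let $flat :=
  <cts:and-query>
    <cts:word-query><cts:text>1996</cts:text></cts:word-query>
    <cts:word-query><cts:text>United</cts:text></cts:word-query>
    <cts:word-query><cts:text>States</cts:text></cts:word-query>
    <cts:word-query><cts:text>Olympics</cts:text></cts:word-query>
  </cts:and-query>
return local:create-runs($flat)
```

Restricting runs to children of an and-query keeps OR'd alternatives such as `(sprint OR marathon)` from being merged into one piece of text, and a grouped sub-query between two words ends the current run.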
  • 40. Example: multi-word thesaurus 2. Join sibling words in <run>: • Input: – search:parse("1996 United States Olympics")/local:unnest-ands(.)/local:create-runs(.) • Output:
  • 41. Example: multi-word thesaurus 2. Join sibling words in <run>: • Input: – search:parse("1996 (sprint OR marathon) United States Olympics")/local:unnest-ands(.)/local:create-runs(.) • Output:
  • 43. Example: multi-word thesaurus 3. Transform <run>s: 1. Store terms in thesaurus 2. Build cts:or-query of thesaurus terms 3. Using cts:or-query of terms, cts:highlight() <run>s, and replace with thesaurus synonyms
  • 44. Example: multi-word thesaurus 3. Transform <run>s: 1. store terms in thesaurus
  • 45. Example: multi-word thesaurus 3. Transform <run>s: 2. build cts:or-query of thesaurus terms:
  • 46. Example: multi-word thesaurus 3. Transform <run>s: 3. replace matches with synonyms: – cts:highlight() - powerful cts:query-based find/replace
  • 47. Example: multi-word thesaurus 3. Transform <run>s: 3. replace matches with synonyms:
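The `cts:highlight()` find/replace step could look roughly like this. It is a sketch under stated assumptions, not the presenter's `local:thsr-expand`: it assumes each matched thesaurus term is replaced inside the `<run>` by the XML form of an or-query over the term and its synonyms, that the thesaurus lives at `thesaurus.xml`, and that `thsr:lookup` finds the entry for the matched text as typed. The `<x>` wrapper is a hypothetical throwaway element used to serialize the query.

```xquery
xquery version "1.0-ml";

import module namespace thsr = "http://marklogic.com/xdmp/thesaurus"
  at "/MarkLogic/thesaurus.xqy";

(: Sketch only: expands thesaurus terms found inside <run> elements. :)
declare function local:thsr-expand($q as node(), $q-thsr as cts:query)
  as node()*
{
  typeswitch ($q)
    case element(run) return
      (: cts:highlight binds $cts:text to each piece of matched text and
         substitutes the result of the third argument for it. :)
      cts:highlight($q, $q-thsr,
        <x>{
          cts:or-query((
            cts:word-query($cts:text),
            for $syn in thsr:lookup("thesaurus.xml", $cts:text)
                          /thsr:synonym/thsr:term
            return cts:word-query(fn:string($syn))
          ))
        }</x>/*)
    case element() return
      element { fn:node-name($q) } {
        $q/@*, local:thsr-expand($q/node(), $q-thsr)
      }
    default return $q
};

(: One word query per thesaurus term, combined into a single or-query. :)
let $q-thsr := cts:or-query(
  for $term in fn:doc("thesaurus.xml")//thsr:entry/thsr:term
  return cts:word-query(fn:string($term)))
(: Hand-written stand-in for the output of the run-building step. :)
let $runs := <cts:and-query><run>1996 United States Olympics</run></cts:and-query>
return local:thsr-expand($runs, $q-thsr)
```

Working on joined run text is what lets a multi-word term such as "United States" match, since the parser had split it into separate word queries.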
  • 48. Example: multi-word thesaurus 3. Transform <run>s: Input: let $q-thsr := cts:or-query( doc("thesaurus.xml") //thsr:entry/thsr:term/cts:word-query(string(.)) ) let $runs := search:parse("1996 United States Olympics") /local:unnest-ands(.)/local:create-runs(.) return local:thsr-expand($runs, $q-thsr)
  • 51. Example: multi-word thesaurus 4. Convert <run>s back to word queries – Typeswitch:
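A hedged sketch of the final typeswitch. It assumes a `<run>` holds plain text plus, possibly, query XML substituted by an earlier step: leftover text is split on whitespace into one word query per token, and embedded query elements are copied through as they are. Options from the original word queries (language, weight) are not restored, and the sample input, including the "USA" synonym, is made up for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only: unwraps <run> elements back into cts:query XML. :)
declare function local:resolve-runs($q as node()) as node()*
{
  typeswitch ($q)
    case element(run) return
      for $n in $q/node()
      return
        typeswitch ($n)
          (: Leftover text: one word query per whitespace-separated token. :)
          case text() return
            for $tok in fn:tokenize(fn:normalize-space($n), " ")[. ne ""]
            return <x>{ cts:word-query($tok) }</x>/*
          (: Query XML already inside the run is kept unchanged. :)
          default return $n
    case element() return
      element { fn:node-name($q) } { $q/@*, local:resolve-runs($q/node()) }
    default return $q
};

(: Hypothetical expanded input, written by hand. :)
let $expanded :=
  <cts:and-query>
    <run>1996 <cts:or-query>
      <cts:word-query><cts:text>United States</cts:text></cts:word-query>
      <cts:word-query><cts:text>USA</cts:text></cts:word-query>
    </cts:or-query> Olympics</run>
  </cts:and-query>
return local:resolve-runs($expanded)
```

Once no `<run>` elements remain, the XML can be passed to the `cts:query()` constructor and then to `cts:search()`.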
  • 52. Example: multi-word thesaurus 4. Convert <run>s back to word queries Input: let $q-thsr := cts:or-query( doc("thesaurus.xml") //thsr:entry/thsr:term/cts:word-query(string(.)) ) let $runs := search:parse("1996 United States Olympics") /local:unnest-ands(.)/local:create-runs(.) let $expanded := local:thsr-expand($runs, $q-thsr) return local:resolve-runs($expanded)
  • 53. Example: multi-word thesaurus 4. Convert <run>s back to word queries Output:
  • 55. Enrich Your Query! • Takeaway 1. No added GUI 2. Didn't ask the user for additional input 3. Able to build more robust query before executing search
  • 56. Search API Hacking • Many potential applications: – Ad-hoc weighting: local:q-add-weights( search:parse("bananas"), (<element ns="{$ns}" name="p" weight="1"/>, <element ns="{$ns}" name="b" weight="2"/>, <element ns="{$ns}" name="title" weight="3.5"/>) )
  • 57. Search API Hacking • Many potential applications: – Automatic spell correction:
  • 58. Search API Hacking • Many potential applications: – Detect entities • Transform text into element-based query • Fewer false positives and exclusions • Leverage indexes: "New York Times"
  • 59. Search API Hacking • Other ideas – Regex unparsed query string • apply constraints, operators, etc as configured in Search API based on key words/patterns – Custom term handler • single-term transformations – Combine with data enrichment on ingestion • MarkLogic Entity Framework • Linguistic processing
  • 60. Hazards • Chaos – Daisy-chained transformations can have unintended consequences – Performance • Pre-search transformations need to be fast • Make sure to leverage indexes as much as possible • Larger queries do take longer