Just the Job – Employing Apache Solr for Recruitment Search
Charlie Hull, Flax
charlie@flax.co.uk   @FlaxSearch   19th October 2011
What I Will Cover
 Who are Flax?
 The Project & The Solution
 How we did it
  •   A flexible pipeline in two parts
  •   Transforming the UI
  •   Performance
  •   Issues
  •   Results & benefits
 Conclusions & Lessons Learned
  • Learning to love open source search

Who are Flax?
 Search engine specialists with decades of
  experience
 Based in Cambridge, U.K.
 Customers include Financial Times, Durrants
  Ltd., Accenture, University of Cambridge
 UK Authorised Partner of Lucid Imagination
We also run a Search Meetup:



Start your own - add it to www.searchmeetups.com!
The Project
 The client: Reed Specialist Recruitment
 The data
  • Hundreds of millions of items to search
  • Hundreds of fields in the database schema
    (which will change in the future)
  • CVs (résumés) in Word and PDF formats
  • Multiple languages
 The problem
  • Search takes several minutes
  • 3000+ users familiar with the old system
  • No foundation for innovation

The Solution – Apache Solr

 Flexible and extendable
    • This is only the first wave of development
    • A need for complex business rules to drive the
      search – Boosts & FunctionQueries
 Economically scalable
    • Much more data to come
    • Too hard to predict future cost of commercial,
      closed source alternatives

    Great support available - from [logo] and [logo]
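
The "Boosts & FunctionQueries" point above can be illustrated with a recency boost. This is a minimal sketch, not the boost used in this project: it assumes the edismax query parser and borrows the `boost_date` field name that appears later in this deck. The `recip(ms(NOW,...),3.16e-11,1,1)` form is the commonly documented Solr recipe for favouring recent documents; `recency_boosted_params` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def recency_boosted_params(user_query: str, date_field: str = "boost_date") -> str:
    """Build a Solr query string that multiplies relevance by a recency factor."""
    # recip(x, m, a, b) computes a / (m*x + b). With m = 3.16e-11 (roughly
    # 1 / milliseconds-per-year), a one-year-old document scores about half
    # the boost of a brand-new one.
    recency = f"recip(ms(NOW/HOUR,{date_field}),3.16e-11,1,1)"
    params = {
        "defType": "edismax",  # assumption: the extended DisMax parser is in use
        "q": user_query,
        "boost": recency,      # multiplicative boost applied to the text score
    }
    return urlencode(params)
```

Rounding `NOW` to the hour keeps the function value stable between requests, which tends to help Solr's caches.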


A flexible pipeline - in two parts

1. Indexer
   •   Reads an XML settings file
   •   Extracts data from Oracle
   •   Processes if necessary
   •   Adds to a Solr index
2. Config tool
   • Creates a Solr schema from the Indexer settings
   • Verifies types and checks for conflicts
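
The "verifies types and checks for conflicts" step of the Config tool might look something like the sketch below. Everything here is an assumption made for illustration: the `FieldDef` record, the `KNOWN_TYPES` set and the `verify` function are hypothetical and are not taken from the Flax code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    name: str
    type: str
    indexed: bool
    stored: bool


# Assumption: the field types that the target Solr schema defines.
KNOWN_TYPES = {"string", "text", "tdate", "int"}


def verify(fields: list[FieldDef]) -> list[str]:
    """Return human-readable problems; an empty list means the settings agree."""
    problems = []
    seen: dict[str, FieldDef] = {}
    for f in fields:
        if f.type not in KNOWN_TYPES:
            problems.append(f"{f.name}: unknown type '{f.type}'")
        # Remember the first definition of each field and compare later ones to it.
        previous = seen.setdefault(f.name, f)
        if previous != f:
            problems.append(f"{f.name}: conflicting definitions {previous} and {f}")
    return problems
```

Two settings entries that write to the same field with identical attributes pass silently; only differing definitions are reported.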
The Indexer

 [Diagram: CVs and the Oracle DB feed Actions, then Processes, then the
  Solr index; the pipeline is configured by an XML settings file.]
 Actions shown in the build-up slides:
  • CopyAction (Oracle DB data)
  • CVAction, with CVTikaSource and CVSolrSource (CVs)
 Processes shown in the build-up slides:
  • MostRecentDateProcess
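
The Indexer flow shown above (extract rows, apply Actions, apply Processes, add to the index) can be sketched in a few lines. This is only an illustration: `copy_action` and `run_pipeline` are hypothetical names, rows are plain dictionaries standing in for Oracle records, and the resulting documents are returned rather than sent to Solr.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]  # one record extracted from the database
Doc = dict[str, object]  # one document destined for the index
Step = Callable[[Row, Doc], None]


def copy_action(column: str, field: str) -> Step:
    """Hypothetical stand-in for a copy action: copy one column into one field."""
    def apply(row: Row, doc: Doc) -> None:
        if row.get(column) is not None:  # skip NULL columns
            doc[field] = row[column]
    return apply


def run_pipeline(rows: Iterable[Row], actions: list[Step],
                 processes: list[Step]) -> Iterator[Doc]:
    """Apply every action, then every process, to each row and yield documents."""
    for row in rows:
        doc: Doc = {}
        for action in actions:
            action(row, doc)
        for process in processes:
            process(row, doc)
        yield doc
```

Keeping Actions and Processes as interchangeable callables is one way to get the flexibility the slides describe: new behaviour is added by configuration rather than by changing the loop.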
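The Indexer & The Config Tool

 [Diagram: the Indexer pipeline as above (CVs and Oracle DB, Actions,
  Processes, Solr index), plus the Config tool, which reads the same XML
  settings to verify them and generate the Solr schema.xml.]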
The pipeline in code...
Actions

<action ref="copyAction" column="EMAIL" field="email"
        type="string" indexed="true" stored="true"/>

Processes

<process-map>
  <process field="boost_date" type="tdate"
           indexed="true" stored="false">
    <beans:bean class="...MostRecentDateProcess">
      ...
        <beans:value>updateddate</beans:value>
        <beans:value>createddate</beans:value>
      ...
  </process>
</process-map>
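
The slides do not show what MostRecentDateProcess does internally. Judging by its name and its two configured columns, a plausible reading is "take the latest non-null date among the listed columns". The sketch below implements that assumption; `most_recent_date` and `to_solr_date` are hypothetical names, and the datetimes are assumed to be naive UTC values.

```python
from datetime import datetime
from typing import Optional


def most_recent_date(row: dict,
                     columns=("updateddate", "createddate")) -> Optional[datetime]:
    """Return the latest non-null date among the configured columns, or None."""
    dates = [row[c] for c in columns if row.get(c) is not None]
    return max(dates) if dates else None


def to_solr_date(value: datetime) -> str:
    """Format a naive UTC datetime in the ISO 8601 form Solr date fields accept."""
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For a record that has never been updated, the created date is used, so every document with at least one date gets a usable `boost_date`.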


...and a Solr schema
<?xml version="1.0" encoding="UTF-8" ?>
<schema>
  <fields>
    <field name="email" type="string" indexed="true" stored="true"/>
    <field name="boost_date" type="tdate" indexed="true" stored="false"/>
  </fields>
</schema>
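
Because the settings carry `type`, `indexed` and `stored` on each action and process, the schema fields can be derived mechanically. The sketch below shows the idea on a simplified settings fragment: the `<settings>` wrapper and the flattened `<process>` element (without its Spring bean) are assumptions made to keep the example self-contained, and `generate_schema` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical settings fragment based on the snippets above.
SETTINGS = """
<settings>
  <action ref="copyAction" column="EMAIL" field="email"
          type="string" indexed="true" stored="true"/>
  <process field="boost_date" type="tdate" indexed="true" stored="false"/>
</settings>
"""


def generate_schema(settings_xml: str) -> str:
    """Emit a <schema><fields> fragment with one <field> per action or process."""
    root = ET.fromstring(settings_xml)
    schema = ET.Element("schema")
    fields = ET.SubElement(schema, "fields")
    for node in root.iter():
        if node.tag in ("action", "process") and "field" in node.attrib:
            # Assumes type, indexed and stored are all present on the node.
            ET.SubElement(fields, "field", {
                "name": node.get("field"),
                "type": node.get("type"),
                "indexed": node.get("indexed"),
                "stored": node.get("stored"),
            })
    return ET.tostring(schema, encoding="unicode")
```

Generating the schema from the same file the Indexer reads means a field cannot be indexed without also being declared, which is the kind of drift a hand-maintained schema tends to suffer from.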




Transforming the UI
Performance

 Many factors can affect search performance...
 ...so we built a test framework
  • Randomly generated queries based on terms in
    the index
  • Average query times & number of results
    recorded
  • Allows for direct comparison of boost functions,
    for example
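
A test framework along these lines can be quite small. The sketch below is not the Flax framework: `random_queries` and `benchmark` are hypothetical names, the vocabulary is passed in as a plain list rather than read from the index, and `search` is any callable that runs a query and returns its hit count.

```python
import random
import statistics
import time
from typing import Callable, Sequence


def random_queries(terms: Sequence[str], n: int,
                   max_terms: int = 3, seed: int = 0) -> list[str]:
    """Build n queries, each of 1..max_terms distinct terms from the vocabulary."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    upper = min(max_terms, len(terms))
    return [" ".join(rng.sample(list(terms), rng.randint(1, upper)))
            for _ in range(n)]


def benchmark(search: Callable[[str], int], queries: Sequence[str]) -> dict:
    """Run each query through search() and record mean latency and mean hits."""
    timings, hits = [], []
    for q in queries:
        start = time.perf_counter()
        hits.append(search(q))
        timings.append(time.perf_counter() - start)
    return {
        "queries": len(queries),
        "mean_seconds": statistics.mean(timings),
        "mean_hits": statistics.mean(hits),
    }
```

Running `benchmark` twice over the same seeded query list, once per candidate boost function, gives the direct comparison mentioned above.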




Performance...much improved!

 Sub-second searches
 Only a single server required
 So fast that the thin client hardware had to be
  upgraded as it became a bottleneck!
 Still work to be done on improving indexing
  speed




Issues
 Users don't always understand their new
  freedoms
  • Training can be required on free text search,
    faceting...
  • Any issues reduce user confidence in new
    systems
 Solr features can conflict with each other
  • Make sure you understand how features interact,
    e.g. recency over relevance, synonyms,
    stopwords
  • Get the basics working first

Results & benefits

 Project delivered on time and under budget
 Now live across 350 offices in the UK & worldwide
 24/7/365 support provided by Lucid Imagination
 A very happy client!




Conclusions & Lessons Learned
 What we learned
  • A flexible pipeline is essential
  • Get the basics working first - watch out for
    feature conflict
 What Reed learned
  • User training is important - even if the new
    system is “simpler”
  • To love Open Source Search...




Conclusions & Lessons Learned

"The transition to Solr was the latest step in
our strategy to develop a truly worldclass
search application. We believe it provides a
robust architecture that meets our future
aims, it will scale economically and is a
welcome addition to our existing suite of
Open Source systems."




The End

 Thanks for listening!
 For more information please contact me:

  Charlie Hull, Managing Director, Flax
  charlie@flax.co.uk
  http://www.flax.co.uk/blog
  @FlaxSearch




                            41

More Related Content

What's hot

Stardog 1.1: An Easier, Smarter, Faster RDF Database
Stardog 1.1: An Easier, Smarter, Faster RDF DatabaseStardog 1.1: An Easier, Smarter, Faster RDF Database
Stardog 1.1: An Easier, Smarter, Faster RDF Databasekendallclark
 
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...Lucas Jellema
 
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesOracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesMichael Hichwa
 
Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...
Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...
Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...VibrantGroup
 
REST Enabling Your Oracle Database
REST Enabling Your Oracle DatabaseREST Enabling Your Oracle Database
REST Enabling Your Oracle DatabaseJeff Smith
 
Zero to Sixty with Oracle ApEx
Zero to Sixty with Oracle ApExZero to Sixty with Oracle ApEx
Zero to Sixty with Oracle ApExBradley Brown
 
Best Practices for Upgrading your JD Edwards Software from Oracle
Best Practices for Upgrading your JD Edwards Software from OracleBest Practices for Upgrading your JD Edwards Software from Oracle
Best Practices for Upgrading your JD Edwards Software from OracleUBC Corporation
 
Take a peek at Dell's smart EPM global environment
Take a peek at Dell's smart EPM global environmentTake a peek at Dell's smart EPM global environment
Take a peek at Dell's smart EPM global environmentRodrigo Radtke de Souza
 
Replacing Oracle Database at an International Bank
Replacing Oracle Database at an International BankReplacing Oracle Database at an International Bank
Replacing Oracle Database at an International BankMariaDB plc
 
WordPress Filters and Actions
WordPress Filters and ActionsWordPress Filters and Actions
WordPress Filters and ActionsGlenn Ansley
 
Oracle Database Management REST API
Oracle Database Management REST APIOracle Database Management REST API
Oracle Database Management REST APIJeff Smith
 
Change Management for Oracle Database with SQLcl
Change Management for Oracle Database with SQLcl Change Management for Oracle Database with SQLcl
Change Management for Oracle Database with SQLcl Jeff Smith
 
JD Edwards Archiving and Upgrades - a Case Study from DBG
JD Edwards Archiving and Upgrades - a Case Study from DBGJD Edwards Archiving and Upgrades - a Case Study from DBG
JD Edwards Archiving and Upgrades - a Case Study from DBGNERUG
 

What's hot (15)

Stardog 1.1: An Easier, Smarter, Faster RDF Database
Stardog 1.1: An Easier, Smarter, Faster RDF DatabaseStardog 1.1: An Easier, Smarter, Faster RDF Database
Stardog 1.1: An Easier, Smarter, Faster RDF Database
 
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
AMIS Oracle OpenWorld & CodeOne Review - Pillar 2 - Custom Application Develo...
 
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud ServicesOracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
Oracle APEX, Oracle Autonomous Database, Always Free Oracle Cloud Services
 
Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...
Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...
Jboss Application Server training-course-navi-mumbai-jboss-course-provider-na...
 
Epita pres
Epita presEpita pres
Epita pres
 
REST Enabling Your Oracle Database
REST Enabling Your Oracle DatabaseREST Enabling Your Oracle Database
REST Enabling Your Oracle Database
 
Zero to Sixty with Oracle ApEx
Zero to Sixty with Oracle ApExZero to Sixty with Oracle ApEx
Zero to Sixty with Oracle ApEx
 
Rosenblum Workflow Choices Introducing XML
Rosenblum Workflow Choices Introducing XMLRosenblum Workflow Choices Introducing XML
Rosenblum Workflow Choices Introducing XML
 
Best Practices for Upgrading your JD Edwards Software from Oracle
Best Practices for Upgrading your JD Edwards Software from OracleBest Practices for Upgrading your JD Edwards Software from Oracle
Best Practices for Upgrading your JD Edwards Software from Oracle
 
Take a peek at Dell's smart EPM global environment
Take a peek at Dell's smart EPM global environmentTake a peek at Dell's smart EPM global environment
Take a peek at Dell's smart EPM global environment
 
Replacing Oracle Database at an International Bank
Replacing Oracle Database at an International BankReplacing Oracle Database at an International Bank
Replacing Oracle Database at an International Bank
 
WordPress Filters and Actions
WordPress Filters and ActionsWordPress Filters and Actions
WordPress Filters and Actions
 
Oracle Database Management REST API
Oracle Database Management REST APIOracle Database Management REST API
Oracle Database Management REST API
 
Change Management for Oracle Database with SQLcl
Change Management for Oracle Database with SQLcl Change Management for Oracle Database with SQLcl
Change Management for Oracle Database with SQLcl
 
JD Edwards Archiving and Upgrades - a Case Study from DBG
JD Edwards Archiving and Upgrades - a Case Study from DBGJD Edwards Archiving and Upgrades - a Case Study from DBG
JD Edwards Archiving and Upgrades - a Case Study from DBG
 

Viewers also liked

Practical Search in the Cloud - By Marc Krellenstein
Practical Search in the Cloud - By Marc KrellensteinPractical Search in the Cloud - By Marc Krellenstein
Practical Search in the Cloud - By Marc Krellensteinlucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Semantic Search for Sourcing and Recruiting
Semantic Search for Sourcing and RecruitingSemantic Search for Sourcing and Recruiting
Semantic Search for Sourcing and RecruitingGlen Cathey
 
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...Glen Cathey
 

Viewers also liked (8)

Practical Search in the Cloud - By Marc Krellenstein
Practical Search in the Cloud - By Marc KrellensteinPractical Search in the Cloud - By Marc Krellenstein
Practical Search in the Cloud - By Marc Krellenstein
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Semantic Search for Sourcing and Recruiting
Semantic Search for Sourcing and RecruitingSemantic Search for Sourcing and Recruiting
Semantic Search for Sourcing and Recruiting
 
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
 

Similar to Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719BingWang77
 
Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?C4Media
 
Introduction to SoapUI day 1
Introduction to SoapUI day 1Introduction to SoapUI day 1
Introduction to SoapUI day 1Qualitest
 
Soap UI - Getting started
Soap UI - Getting startedSoap UI - Getting started
Soap UI - Getting startedQualitest
 
AD1545 - Extending the XPages Extension Library
AD1545 - Extending the XPages Extension LibraryAD1545 - Extending the XPages Extension Library
AD1545 - Extending the XPages Extension Librarypaidi_ed
 
SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013
SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013
SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013jtreague
 
Aai 3228-dev ops-tools-websphere-sl
Aai 3228-dev ops-tools-websphere-slAai 3228-dev ops-tools-websphere-sl
Aai 3228-dev ops-tools-websphere-slsflynn073
 
Done in 60 seconds - Creating Web 2.0 applications made easy
Done in 60 seconds - Creating Web 2.0 applications made easyDone in 60 seconds - Creating Web 2.0 applications made easy
Done in 60 seconds - Creating Web 2.0 applications made easyRoel Hartman
 
Presentation online application upgrade of oracle's bug db with edition-ba...
Presentation    online application upgrade of oracle's bug db with edition-ba...Presentation    online application upgrade of oracle's bug db with edition-ba...
Presentation online application upgrade of oracle's bug db with edition-ba...xKinAnx
 
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Lucidworks
 
5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous DeliveryXebiaLabs
 
Informatica power center online training
Informatica power center online trainingInformatica power center online training
Informatica power center online trainingSmartittrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inidaQualitytrainings
 

Similar to Just the Job: Employing Solr for Recruitment Search -Charlie Hull (20)

Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
 
Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?Beyond DevOps: How Netflix Bridges the Gap?
Beyond DevOps: How Netflix Bridges the Gap?
 
Introduction to SoapUI day 1
Introduction to SoapUI day 1Introduction to SoapUI day 1
Introduction to SoapUI day 1
 
Soap UI - Getting started
Soap UI - Getting startedSoap UI - Getting started
Soap UI - Getting started
 
AD1545 - Extending the XPages Extension Library
AD1545 - Extending the XPages Extension LibraryAD1545 - Extending the XPages Extension Library
AD1545 - Extending the XPages Extension Library
 
SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013
SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013
SOA Suite 11g Project Experience - FDUG Meeting - November 14 2013
 
Aai 3228-dev ops-tools-websphere-sl
Aai 3228-dev ops-tools-websphere-slAai 3228-dev ops-tools-websphere-sl
Aai 3228-dev ops-tools-websphere-sl
 
Done in 60 seconds - Creating Web 2.0 applications made easy
Done in 60 seconds - Creating Web 2.0 applications made easyDone in 60 seconds - Creating Web 2.0 applications made easy
Done in 60 seconds - Creating Web 2.0 applications made easy
 
Presentation online application upgrade of oracle's bug db with edition-ba...
Presentation    online application upgrade of oracle's bug db with edition-ba...Presentation    online application upgrade of oracle's bug db with edition-ba...
Presentation online application upgrade of oracle's bug db with edition-ba...
 
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
Building a Vibrant Search Ecosystem @ Bloomberg: Presented by Steven Bower & ...
 
5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery
 
Informatica power center online training
Informatica power center online trainingInformatica power center online training
Informatica power center online training
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 
Informatica online training from inida
Informatica online training from inidaInformatica online training from inida
Informatica online training from inida
 

More from lucenerevolution

Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...lucenerevolution
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platformlucenerevolution
 
Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucenelucenerevolution
 

Just the Job: Employing Solr for Recruitment Search - Charlie Hull

  • 1. Just the Job – Employing Apache Solr for Recruitment Search. Charlie Hull, Flax (charlie@flax.co.uk, @FlaxSearch), 19th October 2011
  • 2. What I Will Cover
      - Who are Flax?
  • 3. What I Will Cover
      - Who are Flax?
      - The Project & The Solution
  • 4. What I Will Cover
      - Who are Flax?
      - The Project & The Solution
      - How we did it
          • A flexible pipeline in two parts
          • Transforming the UI
          • Performance
          • Issues
          • Results & benefits
  • 5. What I Will Cover
      - Who are Flax?
      - The Project & The Solution
      - How we did it
          • A flexible pipeline in two parts
          • Transforming the UI
          • Performance
          • Issues
          • Results & benefits
      - Conclusions & Lessons Learned
          • Learning to love open source search
  • 6. Who are Flax?
      - Search engine specialists with decades of experience
      - Based in Cambridge, U.K.
      - Customers include Financial Times, Durrants Ltd., Accenture, University of Cambridge
      - UK Authorised Partner of Lucid Imagination
      We also run a Search Meetup. Start your own - add to www.searchmeetups.com!
  • 7. The Project
      - The client: Reed Specialist Recruitment
  • 8. The Project
      - The client: Reed Specialist Recruitment
      - The data
          • Hundreds of millions of items to search
          • Hundreds of fields in the database schema (which will change in the future)
          • CVs (resumés) in Word, PDF formats
          • Multiple languages
  • 9. The Project
      - The client: Reed Specialist Recruitment
      - The data
          • Hundreds of millions of items to search
          • Hundreds of fields in the database schema (which will change in the future)
          • CVs (resumés) in Word, PDF formats
          • Multiple languages
      - The problem
          • Search takes several minutes
          • 3000+ users familiar with the old system
          • No foundation for innovation
  • 10. The Solution – Apache Solr
      - Flexible and extendable
          • This is only the first wave of development
          • A need for complex business rules to drive the search
              – Boosts & FunctionQueries
  • 11. The Solution – Apache Solr
      - Flexible and extendable
          • This is only the first wave of development
          • A need for complex business rules to drive the search
              – Boosts & FunctionQueries
      - Economically scalable
          • Much more data to come
          • Too hard to predict future cost of commercial, closed source alternatives
  • 12. The Solution – Apache Solr
      - Flexible and extendable
          • This is only the first wave of development
          • A need for complex business rules to drive the search
              – Boosts & FunctionQueries
      - Economically scalable
          • Much more data to come
          • Too hard to predict future cost of commercial, closed source alternatives
      - Great support available - from [logo] and [logo]
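The slides name boosts and FunctionQueries as the mechanism for business rules but do not show a query. As a rough illustration only, the sketch below builds request parameters for a recency boost using Solr's `recip` and `ms` functions with the edismax `boost` parameter. The choice of function and its constants is an assumption (it follows the widely used date-boost pattern, not anything stated in the talk); `boost_date` is the field name that appears later in the deck.

```python
from urllib.parse import urlencode


def recency_boosted_params(user_query, date_field="boost_date"):
    """Build Solr request parameters that multiply relevance by a recency factor.

    Assumption: the edismax parser with a multiplicative `boost` function.
    The talk does not say which function or constants were actually used.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # recip(x, m, a, b) = a / (m * x + b), with x = document age in ms
        "boost": f"recip(ms(NOW,{date_field}),3.16e-11,1,1)",
    }


def recip(x, m, a, b):
    """Plain-Python version of the recip function, to show how the boost decays."""
    return a / (m * x + b)


MS_PER_YEAR = 365 * 24 * 3600 * 1000

params = recency_boosted_params("java developer")
print(urlencode(params))
# A brand-new document gets a factor of 1; a one-year-old one roughly half that.
print(recip(0, 3.16e-11, 1, 1), recip(MS_PER_YEAR, 3.16e-11, 1, 1))
```

Because the boost is multiplicative, an old but highly relevant CV can still outrank a recent weak match, which is one way to balance recency against relevance.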
  • 13. A flexible pipeline - in two parts
  • 14. A flexible pipeline - in two parts
      1. Indexer
          • Reads an XML settings file
          • Extracts data from Oracle
          • Processes if necessary
          • Adds to a Solr index
  • 15. A flexible pipeline - in two parts
      1. Indexer
          • Reads an XML settings file
          • Extracts data from Oracle
          • Processes if necessary
          • Adds to a Solr index
      2. Config tool
          • Creates a Solr schema from the Indexer settings
          • Verifies types and checks for conflicts
  • 16. The Indexer [diagram labels: CV, Oracle DB, xml, Actions, Processes, Solr Index]
  • 17. The Indexer [diagram labels: CV, Oracle DB, xml, CopyAction, Solr Index]
  • 18. The Indexer [diagram labels: CV, Oracle DB, xml, CVAction, CVTikaSource, CVSolrSource, Solr Index]
  • 19. The Indexer [diagram labels: CV, Oracle DB, xml, MostRecentDateProcess, Solr Index]
  • 20. The Indexer [diagram labels: CV, Oracle DB, xml, Actions, Processes, Solr Index]
  • 21. The Indexer & The Config Tool [diagram labels: CV, Oracle DB, xml, Actions, Processes, Solr Index, Solr schema .xml, Verify & Generate]
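The indexer stages listed above (extract a row, apply actions, apply processes, add to the index) can be sketched roughly as follows. This is a minimal illustration, not Flax's implementation: the function names are hypothetical, the `UPDATED` and `CREATED` column names are invented for the example, and the Oracle read and Solr write are replaced by plain dictionaries.

```python
from datetime import datetime


def copy_action(column, field):
    """Return an action that copies one source column into one index field."""
    def apply(row, doc):
        if row.get(column) is not None:
            doc[field] = row[column]
    return apply


def most_recent_date_process(field, source_fields):
    """Return a process that sets `field` to the latest date among `source_fields`."""
    def apply(doc):
        dates = [doc[name] for name in source_fields if doc.get(name) is not None]
        if dates:
            doc[field] = max(dates)
    return apply


def build_document(row, actions, processes):
    """Run every action on the database row, then every process on the result."""
    doc = {}
    for action in actions:
        action(row, doc)
    for process in processes:
        process(doc)
    return doc


# Hypothetical settings; in the talk these come from the XML settings file.
actions = [
    copy_action("EMAIL", "email"),
    copy_action("UPDATED", "updateddate"),
    copy_action("CREATED", "createddate"),
]
processes = [most_recent_date_process("boost_date", ["updateddate", "createddate"])]

# A stand-in for one row extracted from the database.
row = {
    "EMAIL": "jane@example.com",
    "UPDATED": datetime(2011, 9, 1),
    "CREATED": datetime(2010, 3, 15),
}
doc = build_document(row, actions, processes)
print(doc["email"], doc["boost_date"])
```

Keeping actions and processes as small independent callables is what makes such a pipeline easy to extend when the database schema changes: a new column needs only a new entry in the settings.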
  • 22. The pipeline in code...
      Actions
        <action ref="copyAction" column="EMAIL" field="email" />
      Processes
        <process-map>
          <process field="boost_date">
            <beans:bean class="...MostRecentDateProcess">
              ...
              <beans:value>updateddate</beans:value>
              <beans:value>createddate</beans:value>
              ...
          </process>
        </process-map>
  • 23. The pipeline in code...
      Actions
        <action ref="copyAction" column="EMAIL" field="email"
                type="string" indexed="true" stored="true"/>
      Processes
        <process-map>
          <process field="boost_date" type="tdate" indexed="true" stored="false">
            <beans:bean class="...MostRecentDateProcess">
              ...
              <beans:value>updateddate</beans:value>
              <beans:value>createddate</beans:value>
              ...
          </process>
        </process-map>
  • 24. ...and a Solr schema
        <?xml version="1.0" encoding="UTF-8" ?>
        <schema>
          <fields>
            <field name="email" type="string" indexed="true" stored="true" />
            <field name="boost_date" type="tdate" indexed="true" stored="false"/>
          </fields>
        </schema>
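The config tool's two jobs, generating a schema from the indexer settings and checking for conflicts, might look something like the sketch below. It assumes a simplified settings file in which every element carrying a `field` attribute also carries `type`, `indexed` and `stored`; the `<indexer>` root element and both function names are hypothetical, and the real settings format shown on the slides is richer.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical settings modelled on the attributes shown on the slides.
SETTINGS = """
<indexer>
  <action ref="copyAction" column="EMAIL" field="email"
          type="string" indexed="true" stored="true"/>
  <process field="boost_date" type="tdate" indexed="true" stored="false"/>
</indexer>
"""


def collect_fields(settings_xml):
    """Gather field definitions, rejecting incomplete or conflicting ones."""
    fields = {}
    for element in ET.fromstring(settings_xml.strip()).iter():
        name = element.get("field")
        if name is None:
            continue
        spec = {key: element.get(key) for key in ("type", "indexed", "stored")}
        if None in spec.values():
            raise ValueError(f"field {name!r} is missing type, indexed or stored")
        if name in fields and fields[name] != spec:
            raise ValueError(f"conflicting definitions for field {name!r}")
        fields[name] = spec
    return fields


def generate_schema(fields):
    """Serialise the collected fields as a minimal Solr-style schema document."""
    schema = ET.Element("schema")
    container = ET.SubElement(schema, "fields")
    for name, spec in fields.items():
        ET.SubElement(container, "field", {"name": name, **spec})
    return ET.tostring(schema, encoding="unicode")


fields = collect_fields(SETTINGS)
print(generate_schema(fields))
```

Deriving the schema from the same file that drives the indexer means the two cannot drift apart, which matters when the source database has hundreds of fields.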
  • 31. Performance
      - Many factors can affect search performance...
  • 32. Performance
      - Many factors can affect search performance...
      - ...so we built a test framework
          • Randomly generated queries based on terms in the index
          • Average query times & number of results recorded
          • Allows for direct comparison of boost functions, for example
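A test framework of the kind described on this slide could be sketched as below: queries are assembled at random from known index terms, and the average query time and result count are recorded. Everything here is illustrative. The function names are hypothetical, and `toy_search` is a stand-in for a real call to the search server so that the example runs on its own.

```python
import random
import time
from statistics import mean


def random_queries(index_terms, count, max_terms=3, seed=0):
    """Yield `count` queries, each made of 1..max_terms terms drawn from the index."""
    rng = random.Random(seed)
    for _ in range(count):
        size = rng.randint(1, min(max_terms, len(index_terms)))
        yield " ".join(rng.sample(index_terms, size))


def benchmark(search, queries):
    """Run each query through `search`, recording elapsed time and hit count."""
    timings, hits = [], []
    for query in queries:
        start = time.perf_counter()
        results = search(query)
        timings.append(time.perf_counter() - start)
        hits.append(len(results))
    return {
        "queries": len(timings),
        "avg_seconds": mean(timings),
        "avg_results": mean(hits),
    }


# Stand-in corpus and search function; a real run would query the search server.
DOCS = ["java developer london", "python developer cambridge", "sales manager london"]


def toy_search(query):
    terms = query.split()
    return [doc for doc in DOCS if any(term in doc.split() for term in terms)]


terms = sorted({term for doc in DOCS for term in doc.split()})
report = benchmark(toy_search, random_queries(terms, count=50))
print(report)
```

Running the same seeded query set against two configurations (for example, two different boost functions) gives directly comparable averages.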
  • 33. Performance...much improved!
      - Sub-second searches
      - Only a single server required
      - So fast that the thin client hardware had to be upgraded as it became a bottleneck!
      - Still work to be done on improving indexing speed
  • 34. Issues
      - Users don't always understand their new freedoms
          • Training can be required on free text search, faceting...
          • Any issues reduce user confidence in new systems
  • 35. Issues
      - Users don't always understand their new freedoms
          • Training can be required on free text search, faceting...
          • Any issues reduce user confidence in new systems
      - Solr features can conflict with each other
          • Make sure you understand how features interact
              – e.g. recency over relevance, synonyms, stopwords
          • Get the basics working first
  • 36. Results & benefits
      - Project delivered on time and under budget
      - Now live across 350 offices in the UK & worldwide
      - 24/7/365 support provided by Lucid Imagination
  • 37. Results & benefits
      - Project delivered on time and under budget
      - Now live across 350 offices in the UK & worldwide
      - 24/7/365 support provided by Lucid Imagination
      - A very happy client!
  • 38. Conclusions & Lessons Learned
      - What we learned
          • A flexible pipeline is essential
          • Get the basics working first - watch out for feature conflict
  • 39. Conclusions & Lessons Learned
      - What we learned
          • A flexible pipeline is essential
          • Get the basics working first - watch out for feature conflict
      - What Reed learned
          • User training is important - even if the new system is “simpler”
          • To love Open Source Search...
  • 40. Conclusions & Lessons Learned
      "The transition to Solr was the latest step in our strategy to develop a truly world-class search application. We believe it provides a robust architecture that meets our future aims, it will scale economically and is a welcome addition to our existing suite of Open Source systems."
  • 41. The End
      - Thanks for listening!
      - For more information please contact me: Charlie Hull, Managing Director, Flax, charlie@flax.co.uk, http://www.flax.co.uk/blog, @FlaxSearch