Can you be dynamic and fast? "Miss Marple and the case of the Missing MIPS"
Zoë Slattery
Agenda
Index and search

1. Lagoze, C., Singhal, A. Information Discovery: Needles and Haystacks. IEEE Internet Computing. Volume 9(3), 16-18, 2005.
Options for information retrieval

                          Egothor     Xapian                         Lucene
Implementation language   Java        C++                            Java
Language bindings         None        Perl, Python, PHP, Java, TCL   None
Language ports            None        None                           C++, Perl, PHP, C#
License                   BSD like    GPL                            Apache 2
Lucene [2]

[Figure: a Lucene-based application. The application gathers data (DB, Web, file system), indexes documents into the Lucene index, gets the user query, searches the index, and presents search results to the user.]

2. Gospodnetic, O., Hatcher, E. Lucene in Action. Manning Publications Co., Greenwich. 2005.
Lucene indexing

start -> 1. Documents -> Analysis -> 2. Token stream -> Index creation -> 3. Inverted index -> Optimise -> 4. Optimised inverted index -> end

Example document text: "Oh for a muse of fire that would ascend the brightest heaven of invention....."
Token stream: [fire] [ascend] [bright] [heaven]
Inverted index (terms -> documents):
    fire   -> Henry V, Scouting for boys...
    ascend -> Aerospace, Henry V...
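The analysis and index-creation steps can be sketched in a few lines of PHP. This is only an illustration of the inverted-index idea, not Lucene's or Zend_Search_Lucene's implementation: the function name `buildInvertedIndex` and the sample texts are assumptions, and stemming (which turns "brightest" into "bright" on the slide), stop-word removal, and the optimise step are left out.

```php
<?php
// Hypothetical helper for illustration; not the Lucene or Zend_Search_Lucene API.

/**
 * Build an inverted index: term => list of document names containing it.
 *
 * @param array $documents document name => text
 * @return array term => array of document names
 */
function buildInvertedIndex(array $documents)
{
    $index = array();
    foreach ($documents as $name => $text) {
        // Analysis: lower-case the text and split it into runs of letters.
        $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        // Index creation: record each distinct term once per document.
        foreach (array_unique($tokens) as $term) {
            $index[$term][] = $name;
        }
    }
    ksort($index); // keep the terms in sorted order
    return $index;
}

// Sample texts are made up for this sketch.
$index = buildInvertedIndex(array(
    'Henry V'   => 'O for a Muse of fire, that would ascend the brightest heaven of invention',
    'Aerospace' => 'Rockets ascend quickly',
));

print_r($index['ascend']); // the documents that contain the term "ascend"
```

Looking up a term is then a single array access, which is why search is cheap once the (comparatively expensive) indexing has been done.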
Indexing speed

                            Java + JIT   Java   PHP
Time to index /seconds      4            32     167
Time to optimise /seconds   0.3          3      43
Total time /seconds         4.3          35     210

Ouch! Java with the JIT is nearly 50 times as fast as PHP.
Why is the performance so bad?
Analysis - Java

Analyzing "A Quick Brown Fox jumped over the Lazy Dog"
StandardAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
SimpleAnalyzer: [a] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
StopAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]

Analyzing "XY&Z Corporation - xyz@example.com"
StandardAnalyzer: [xy&z] [corporation] [xyz@example.com]
SimpleAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
StopAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
Analysis - PHP

Analysing "A Quick Brown Fox jumped over the Lazy Dog"
Default (lower case) filter: [a] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
Stop words filter: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
Short words filter: [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]

Analysing "XY&Z Corporation - xyz@example.com"
Default (lower case) filter: [xy] [z] [corporation] [xyz] [example] [com]
Stop words filter: [xy] [z] [corporation] [xyz] [example] [com]
Short words filter: [xy] [corporation] [xyz] [example] [com]
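The three PHP filter outputs above can be reproduced with a small sketch. The function names, the two-word stop list, and the minimum length of 2 are assumptions chosen to match the token lists on the slide; they are not the Zend_Search_Lucene analyser classes.

```php
<?php
// Hypothetical helpers for illustration; not the Zend_Search_Lucene API.

/** Default filter: lower-case the text and split it into runs of letters. */
function lowerCaseTokens($text)
{
    return preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Stop words filter: drop tokens that appear in the stop list (assumed list). */
function removeStopWords(array $tokens, array $stopWords = array('a', 'the'))
{
    return array_values(array_diff($tokens, $stopWords));
}

/** Short words filter: drop tokens shorter than $minLength characters (assumed 2). */
function removeShortWords(array $tokens, $minLength = 2)
{
    $kept = array();
    foreach ($tokens as $token) {
        if (strlen($token) >= $minLength) {
            $kept[] = $token;
        }
    }
    return $kept;
}

$tokens = lowerCaseTokens("XY&Z Corporation - xyz@example.com");
echo implode(' ', removeShortWords($tokens)), "\n"; // tokens with the one-letter "z" dropped
```

Unlike Java's StandardAnalyzer, this letter-only tokeniser splits "XY&Z" and the e-mail address into separate terms, which matches the behaviour shown for the PHP filters.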
Compare indexes

[Figure: comparison of the Java and PHP indexes; both contain the same 663 terms.]
Execution profiles
Java profile
Small problems with TPTP...

                            Java   Java + profile
Time to index /seconds      2.3    687258
Time to optimise /seconds   0.3    673851
% time in indexing          88     50
PHP profile
No problems with this tool

                            PHP   PHP + profile
Time to index /seconds      5     70
Time to optimise /seconds   3     55
% time in indexing          63    56
look at the normalize() code

    public function normalize(Token $srcToken)
    {
        $newToken = new Token(
            strtolower($srcToken->getTermText()),
            $srcToken->getStartOffset(),
            $srcToken->getEndOffset()
        );
        $newToken->setPositionIncrement($srcToken->getPositionIncrement());
        return $newToken;
    }
The normalize() function

Sum( ) = 2.92; 18.99 – 2.92 = 16.07
Micro benchmark

    <?php
    require_once "Token.php";
    require_once "LowerCase.php";

    $token = new Token("GO", 105, 107);
    $filter = new LowerCase();

    // Call normalize() ten million times on the same token.
    for ($i = 0; $i < 10000000; $i++) {
        $norm_token = $filter->normalize($token);
    }
    ?>
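The micro benchmark depends on Token.php and LowerCase.php, which the slides do not show. A self-contained variant with timing might look like the sketch below. The `Token` and `LowerCase` classes here are minimal stand-ins (assumptions), `timeNormalize` is a hypothetical helper, the position-increment handling is omitted, and the iteration count is reduced so it finishes quickly.

```php
<?php
// Stand-in classes: the slides do not show Token.php or LowerCase.php,
// so these minimal versions are assumptions made for illustration.
class Token
{
    private $termText;
    private $startOffset;
    private $endOffset;

    public function __construct($termText, $startOffset, $endOffset)
    {
        $this->termText = $termText;
        $this->startOffset = $startOffset;
        $this->endOffset = $endOffset;
    }

    public function getTermText() { return $this->termText; }
    public function getStartOffset() { return $this->startOffset; }
    public function getEndOffset() { return $this->endOffset; }
}

class LowerCase
{
    // Same shape as the slide's normalize(): build a new lower-cased Token.
    public function normalize(Token $srcToken)
    {
        return new Token(
            strtolower($srcToken->getTermText()),
            $srcToken->getStartOffset(),
            $srcToken->getEndOffset()
        );
    }
}

/** Time $iterations calls of $filter->normalize($token); returns seconds. */
function timeNormalize(LowerCase $filter, Token $token, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $normToken = $filter->normalize($token);
    }
    return microtime(true) - $start;
}

$elapsed = timeNormalize(new LowerCase(), new Token("GO", 105, 107), 100000);
printf("100000 normalize() calls took %.3f s\n", $elapsed);
```

Wall-clock timing like this measures the whole call, including call setup, so it complements a profiler run rather than replacing it.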
normalize() opcodes

compiled vars: !0 = $srcToken, !1 = $newToken

line  #   op (followed by ext, return, operands)
----------------------------------------------------------------------------
11    0   RECV 1
13    1   ZEND_FETCH_CLASS :0 'Token'
      2   NEW $1 :0
      3   ZEND_INIT_METHOD_CALL !0, 'getTermText'
      4   DO_FCALL_BY_NAME 0
      5   SEND_VAR_NO_REF $3
      6   DO_FCALL 1 'strtolower'
      7   SEND_VAR_NO_REF $4
14    8   ZEND_INIT_METHOD_CALL !0, 'getStartOffset'
      9   DO_FCALL_BY_NAME 0
      10  SEND_VAR_NO_REF $6
15    11  ZEND_INIT_METHOD_CALL !0, 'getEndOffset'
      12  DO_FCALL_BY_NAME 0
      13  SEND_VAR_NO_REF $8
      14  DO_FCALL_BY_NAME 3
      15  ASSIGN !1, $1
      16  ......
System profile

1. Convert to lower case
2. Look up opcodes
How Xdebug works

[Figure: script execution. ZEND_INIT_METHOD_CALL is followed by DO_FCALL_BY_NAME, around which Xdebug places its hooks: call out to profiler (start time), execute function, call out to profiler (end time).]
The normalize() function

Sum( ) = 2.92; 18.99 – 2.92 = 16.07

The remaining 16.07 is consumed in setting up functions to be run.
Why is function calling faster in Java?
PHP profile
look at the call to normalize()

    $token = $this->normalize(
        new Zend_Search_Lucene_Analysis_Token($str, $pos, $endpos));

    public function normalize(Token $srcToken)
    {
        $newToken = new Token(
            strtolower($srcToken->getTermText()),
            $srcToken->getStartOffset(),
            $srcToken->getEndOffset()
        );
        $newToken->setPositionIncrement($srcToken->getPositionIncrement());
        return $newToken;
    }
normalize() recoded....

    $token = $this->normalize(
        new Zend_Search_Lucene_Analysis_Token($str, $pos, $endpos));

    public function normalize(Token $srcToken)
    {
        $srcToken->setTermText(strtolower($srcToken->getTermText()));
        return $srcToken;
    }
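The difference between the two versions of normalize() is whether a second token object is allocated per token. The sketch below shows that contrast with a minimal stand-in `Token` class and two free functions, `normalizeCopy` and `normalizeInPlace`; all three names are assumptions for illustration and not the Zend Framework source.

```php
<?php
// Minimal stand-in for the token class; an assumption, not the Zend Framework source.
class Token
{
    private $termText;

    public function __construct($termText) { $this->termText = $termText; }
    public function getTermText() { return $this->termText; }
    public function setTermText($termText) { $this->termText = $termText; }
}

// Original shape: allocate a second Token for every token processed.
function normalizeCopy(Token $srcToken)
{
    return new Token(strtolower($srcToken->getTermText()));
}

// Recoded shape: lower-case the term text in the token that was passed in.
function normalizeInPlace(Token $srcToken)
{
    $srcToken->setTermText(strtolower($srcToken->getTermText()));
    return $srcToken;
}

$a = new Token("Fire");
$b = new Token("Fire");
$copy = normalizeCopy($a);
$same = normalizeInPlace($b);

var_dump($copy === $a); // bool(false): a new object was created
var_dump($same === $b); // bool(true): the caller's object was reused
```

The in-place version mutates its argument, which is safe at the call site shown because the token passed in has just been constructed and is not referenced anywhere else.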
After fix
Performance improvement?

                            PHP + fix   PHP   Java   Java + JIT
Time to index /seconds      151         167   32     4
Time to optimise /seconds   43          43    3      0.3
Total time /seconds         194         210   35     4.3

9.5 % improvement in PHP indexing time (167 to 151 seconds)
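The figures in the table can be checked with a few lines of arithmetic. The helper name `percentImprovement` is an assumption; the inputs are the timings from the table.

```php
<?php
/** Percentage reduction going from $before to $after. */
function percentImprovement($before, $after)
{
    return ($before - $after) / $before * 100;
}

$indexing = percentImprovement(167, 151); // time to index, PHP vs PHP + fix
$total    = percentImprovement(210, 194); // total time, PHP vs PHP + fix
$speedup  = 210 / 4.3;                    // PHP total vs Java + JIT total

printf(
    "indexing: %.1f %%, total: %.1f %%, Java + JIT speedup: %.1fx\n",
    $indexing,
    $total,
    $speedup
);
```

The improvement is about 9.6 % when measured on indexing time alone and about 7.6 % on total time, because the 43 seconds spent optimising are unchanged by the fix.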
Conclusions

3. http://framework.zend.com/issues/browse/ZF-3683
4. Prechelt, L. An empirical comparison of seven programming languages. Computer. Volume 33(10), 23-29, 2000.
Options for PHP

[Figure: decision flowchart with yes/no branches. Questions: Do you care about speed? Only need basic features? Can support Java environment? Use a Web Service? Outcomes: Use Zend Search Lucene; Use Lucene via a Java bridge; Use SOLR as web service; No Lucene solution today [5].]

5. http://pecl.php.net/package/clucene

GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Search Lucene

  • 1. Can you be dynamic and fast? “Miss Marple and the case of the Missing MIPS” Zoë Slattery
  • 2.
  • 3.
  • 4.
  • 5. Lucene [2]
        Diagram labels: data sources (DB, Web, File system); Application (Gather data, Get user query, Present search results); Lucene (Index documents, Index, Search index); User
        2. Gospodnetic, O., Hatcher, E. Lucene in Action. Manning Publications Co., Greenwich. 2005.
  • 6. Lucene indexing
        start
        1. Documents: "Oh for a muse of fire that would ascend the brightest heaven of invention....."
        Analysis
        2. Token stream: [fire] [ascend] [bright] [heaven]
        Index creation
        3. Inverted index (Terms -> Documents): fire -> Henry V, Scouting for boys...; ascend -> Aerospace, Henry V...; ...
        Optimise
        4. Optimised inverted index
        end
  • 7.
  • 8.
  • 9.
  • 10. Analysis - Java
        Analyzing "A Quick Brown Fox jumped over the Lazy Dog"
          StandardAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
          SimpleAnalyzer: [a] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
          StopAnalyzer: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
        Analyzing "XY&Z Corporation - xyz@example.com"
          StandardAnalyzer: [xy&z] [corporation] [xyz@example.com]
          SimpleAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
          StopAnalyzer: [xy] [z] [corporation] [xyz] [example] [com]
  • 11. Analysis - PHP
        Analysing "A Quick Brown Fox jumped over the Lazy Dog"
          Default (lower case) filter: [a] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
          Stop words filter: [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
          Short words filter: [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
        Analysing "XY&Z Corporation - xyz@example.com"
          Default (lower case) filter: [xy] [z] [corporation] [xyz] [example] [com]
          Stop words filter: [xy] [z] [corporation] [xyz] [example] [com]
          Short words filter: [xy] [corporation] [xyz] [example] [com]
  • 12. Compare indexes: Java and PHP, same 663 terms
  • 13.
  • 14.
  • 16.
  • 18.
  • 19. look at the normalize() code
        public function normalize(Token $srcToken)
        {
            $newToken = new Token(
                strtolower($srcToken->getTermText()),
                $srcToken->getStartOffset(),
                $srcToken->getEndOffset());
            $newToken->setPositionIncrement($srcToken->getPositionIncrement());
            return $newToken;
        }
  • 20. The normalize() function Sum( ) = 2.92; 18.99 – 2.92 = 16.07
  • 21. Micro benchmark
        <?php
        require_once "Token.php";
        require_once "LowerCase.php";
        $token = new Token("GO", 105, 107);
        $filter = new LowerCase();
        // Call normalize() ten million times on the same token.
        for ($i = 0; $i < 10000000; $i++) {
            $norm_token = $filter->normalize($token);
        }
        ?>
  • 22. normalize() opcodes
        compiled vars: !0 = $srcToken, !1 = $newToken
        line  #   op                      ext return operands
        ----------------------------------------------------------------------------
        11    0   RECV                    1
        13    1   ZEND_FETCH_CLASS        :0 'Token'
              2   NEW                     $1 :0
              3   ZEND_INIT_METHOD_CALL   !0, 'getTermText'
              4   DO_FCALL_BY_NAME        0
              5   SEND_VAR_NO_REF         $3
              6   DO_FCALL                1 'strtolower'
              7   SEND_VAR_NO_REF         $4
        14    8   ZEND_INIT_METHOD_CALL   !0, 'getStartOffset'
              9   DO_FCALL_BY_NAME        0
              10  SEND_VAR_NO_REF         $6
        15    11  ZEND_INIT_METHOD_CALL   !0, 'getEndOffset'
              12  DO_FCALL_BY_NAME        0
              13  SEND_VAR_NO_REF         $8
              14  DO_FCALL_BY_NAME        3
              15  ASSIGN                  !1, $1
        16    ......
  • 23. System profile 1. Convert to lower case 2. Look up opcodes
  • 24.
  • 25. The normalize() function: Sum( ) = 2.92; 18.99 – 2.92 = 16.07 is consumed in setting up functions to be run
  • 26.
  • 27.
  • 29. look at the call to normalize()
        $token = $this->normalize(
            new Zend_Search_Lucene_Analysis_Token($str, $pos, $endpos));
        public function normalize(Token $srcToken)
        {
            $newToken = new Token(
                strtolower($srcToken->getTermText()),
                $srcToken->getStartOffset(),
                $srcToken->getEndOffset());
            $newToken->setPositionIncrement($srcToken->getPositionIncrement());
            return $newToken;
        }
  • 30. look at the call to normalize(): normalize() recoded
        $token = $this->normalize(
            new Zend_Search_Lucene_Analysis_Token($str, $pos, $endpos));
        public function normalize(Token $srcToken)
        {
            // Lower-case the term text in place instead of allocating a new Token.
            $srcToken->setTermText(strtolower($srcToken->getTermText()));
            return $srcToken;
        }
  • 32. Performance improvement?
                                     Java + JIT   Java   PHP   PHP + fix
        Time to index /seconds       4            32     167   151
        Time to optimise /seconds    0.3          3      43    43
        Total time                   4.3          35     210   194
        9.5 % improvement in time to index (167 to 151 seconds); about 7.6 % in total time (210 to 194 seconds)
  • 33.
  • 34.
  • 35. Options for PHP (decision tree with Y/N branches)
        Questions: Do you care about speed? Only need basic features? Can support Java environment? Use a Web Service?
        Outcomes: Use Zend Search Lucene; Use Lucene via a Java bridge; No Lucene solution today [5]; Use SOLR as web service
        5. http://pecl.php.net/package/clucene
  • 36.
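
The normalize() change shown on slides 29 and 30 can be tried out with a self-contained sketch along the lines of the slide 21 micro benchmark. This is only an illustration under stated assumptions: the Token class below is a hypothetical minimal stand-in (not the Zend_Search_Lucene_Analysis_Token source), the function names normalizeCopy, normalizeInPlace and timeIt are invented for this example, and it assumes PHP 7.4 or later for typed properties. Timings will vary by machine and PHP version, so no expected numbers are given.

```php
<?php
// Hypothetical minimal stand-in for the Token class used on the slides.
final class Token
{
    private string $termText;
    private int $startOffset;
    private int $endOffset;
    private int $positionIncrement = 1;

    public function __construct(string $termText, int $startOffset, int $endOffset)
    {
        $this->termText = $termText;
        $this->startOffset = $startOffset;
        $this->endOffset = $endOffset;
    }

    public function getTermText(): string { return $this->termText; }
    public function setTermText(string $text): void { $this->termText = $text; }
    public function getStartOffset(): int { return $this->startOffset; }
    public function getEndOffset(): int { return $this->endOffset; }
    public function getPositionIncrement(): int { return $this->positionIncrement; }
    public function setPositionIncrement(int $inc): void { $this->positionIncrement = $inc; }
}

// Slide 29 style: build a new Token on every call (one "new" plus five method calls).
function normalizeCopy(Token $src): Token
{
    $new = new Token(
        strtolower($src->getTermText()),
        $src->getStartOffset(),
        $src->getEndOffset()
    );
    $new->setPositionIncrement($src->getPositionIncrement());
    return $new;
}

// Slide 30 style: lower-case the term text in place and return the same object.
function normalizeInPlace(Token $src): Token
{
    $src->setTermText(strtolower($src->getTermText()));
    return $src;
}

// Run $fn on one token $iterations times and return the elapsed wall-clock seconds.
function timeIt(callable $fn, int $iterations): float
{
    $token = new Token('GO', 105, 107);
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($token);
    }
    return microtime(true) - $start;
}

$iterations = 100000; // far fewer than the slide's 10,000,000, to keep the run short
printf("copy:     %.4f s\n", timeIt('normalizeCopy', $iterations));
printf("in place: %.4f s\n", timeIt('normalizeInPlace', $iterations));
```

The in-place variant mutates the token it is given, so it is only a safe substitute when the caller does not need the original text afterwards. That holds for the call on slide 30, where the token is constructed directly in the argument list.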