This document gives an overview of the Elasticsearch search engine, which is designed for the cloud and the NoSQL generation. Elasticsearch is based on Apache Lucene and hides its complexity behind RESTful JSON interfaces. Key points: Elasticsearch is easy to get started with, scales horizontally by adding nodes, and draws its power from Lucene and parallel processing. The document also covers how data is stored as documents grouped into types and indexes, and how to interact with Elasticsearch through its REST API.
3. Abstract
• Why do we need a search engine?
• Elasticsearch: a complete, simple and performant solution
• What about indexing Twitter?
Make some noise on @DevoxxFR
with the #elasticsearch hashtag!
5. Typical use case with "old-school SQL"
A document persisted in a database:
• date attribute: 19/04/2012
• coded country attribute: FR
• Association table code/label
  • Code: FR
  • Label: France
• comment attribute: "There is a type error in the comment for this product. We should call David."
Engine Elasticsearch Rivers Facets Demo Architecture Community
8. Typical need with "old-school SQL"
• Find a document from December 2011 about France containing "error" and "david"
• SQL:
SELECT
    doc.*, pays.*
FROM
    doc, pays
WHERE
    doc.pays_code = pays.code AND
    doc.date_doc >= to_date('2011-12', 'yyyy-mm') AND
    doc.date_doc < to_date('2012-01', 'yyyy-mm') AND
    lower(pays.libelle) = 'france' AND
    lower(doc.commentaire) LIKE '%error%' AND
    lower(doc.commentaire) LIKE '%david%';
10. Performance impact of LIKE '%'
See also: http://www.cestpasdur.com/2012/04/01/elasticsearch-vs-mysql-recherche
14. What is a search engine?
• A search engine is:
  • an indexing engine for documents
  • a search engine over those indexes
• A search engine is more powerful for searching: it is designed for it!
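To make the "index, then search the index" idea concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under simplifying assumptions: the helper names (tokenize, build_index, search) and the sample documents are hypothetical, and Lucene's real analysis and index structures are far richer.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(docs):
    """Indexing step: map every token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, *terms):
    """Search step: return the ids of documents containing ALL the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample documents (the first one reuses the comment from the SQL example).
docs = {
    1: "There is a type error in the comment for this product. We should call David.",
    2: "No error here.",
    3: "David is on holiday.",
}
index = build_index(docs)
print(sorted(search(index, "error", "david")))  # prints [1]
```

The lookup touches only the posting sets of the requested terms, whereas a LIKE '%error%' predicate has to scan the text of every row.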
23. Elasticsearch
• Search engine for the NoSQL generation
• Based on the standard Apache Lucene library
• Hides the Java / Lucene complexity behind standard HTTP / RESTful / JSON services
• Can be used from any language or platform
• Adds the cloud layer that Lucene lacks
• It’s an engine, not a graphical user interface!
28. Key points
• Easy! In a few minutes (zero configuration), you get a full search engine ready to receive your documents and run your searches.
• Efficient! Just start new Elasticsearch nodes to scale horizontally, with replication and load balancing.
• Powerful! A Lucene-based product, with parallel processing to keep response times acceptable (mostly under 100 ms).
• Complete! Many features: analysis and facets, percolation, rivers, plugins, …
33. Storing your data
• Document: a full object containing all your data (in the NoSQL sense). To think "search", you have to forget the RDBMS and think "documents".
A tweet:
{
  "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
  "created_at": "2012-04-06T20:45:36.000Z",
  "source": "Twitter for iPad",
  "truncated": false,
  "retweet_count": 0,
  "hashtag": [ { "text": "elasticsearch", "start": 27, "end": 40 },
               { "text": "devoxxfr", "start": 47, "end": 55 } ],
  "user": { "id": 51172224, "name": "David Pilato",
            "screen_name": "dadoonet", "location": "France",
            "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !" }
}
• Type: groups all documents of the same type
• Index: logical storage for related document types
41. Playing with Elasticsearch
REST API: http://host:port/[index]/[type]/[_action/id]
HTTP methods: GET, POST, PUT, DELETE
Documents
• curl -XPUT http://localhost:9200/twitter/tweet/1
• curl -XGET http://localhost:9200/twitter/tweet/1
• curl -XDELETE http://localhost:9200/twitter/tweet/1
Search
• curl -XGET http://localhost:9200/twitter/tweet/_search
• curl -XGET http://localhost:9200/twitter/_search
• curl -XGET http://localhost:9200/_search
Elasticsearch metadata
• curl -XGET http://localhost:9200/twitter/_status
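The URL pattern above can be sketched as a small Python helper. The function name es_url and its parameters are hypothetical and only illustrate how the optional [index], [type] and [_action/id] segments compose; it builds strings and does not contact a server.

```python
def es_url(index=None, doc_type=None, tail=None, host="localhost", port=9200):
    """Compose http://host:port/[index]/[type]/[_action/id], skipping absent segments.

    `tail` is either a document id (e.g. "1") or an action (e.g. "_search").
    """
    segments = [str(s) for s in (index, doc_type, tail) if s is not None]
    return "http://%s:%d/%s" % (host, port, "/".join(segments))


# The same endpoints as the curl commands on this slide:
print(es_url("twitter", "tweet", 1))          # one document
print(es_url("twitter", "tweet", "_search"))  # search one type
print(es_url("twitter", tail="_search"))      # search one index
print(es_url(tail="_search"))                 # search everything
```

Dropping segments from the left-hand side of the path widens the scope: one type, one index, then the whole cluster.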
43. Let’s index a document
$ curl -XPUT localhost:9200/twitter/tweet/1 -d '
{
  "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
  "created_at": "2012-04-06T20:45:36.000Z",
  "source": "Twitter for iPad",
  "truncated": false,
  "retweet_count": 0,
  "hashtag": [ { "text": "elasticsearch", "start": 27, "end": 40 },
               { "text": "devoxxfr", "start": 47, "end": 55 } ],
  "user": { "id": 51172224, "name": "David Pilato",
            "screen_name": "dadoonet", "location": "France",
            "description": "Soft Architect, Project Manager, Senior Developper.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !" }
}'
Response:
{
  "ok":true,
  "_index":"twitter",
  "_type":"tweet",
  "_id":"1"
}
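For readers who prefer a script to curl, the same PUT request can be prepared with Python's standard library. This is a sketch under assumptions: the helper name build_index_request is hypothetical, the tweet is a trimmed version of the one above, and the Content-Type header is an addition that the curl command on the slide does not send. The request is only built here, not sent.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare (without sending) the PUT request that indexes one document."""
    url = "%s/%s/%s/%s" % (base_url.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumption, not on the slide
    )


# Trimmed version of the tweet shown on the slide.
tweet = {
    "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
    "created_at": "2012-04-06T20:45:36.000Z",
    "user": {"id": 51172224, "screen_name": "dadoonet"},
}

request = build_index_request("http://localhost:9200", "twitter", "tweet", 1, tweet)
print(request.get_method(), request.full_url)  # prints: PUT http://localhost:9200/twitter/tweet/1

# To actually send it, a node must be running on localhost:9200:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```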
44. Let’s search for documents
$ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch"
52. Search results
• Elasticsearch returns the first 10 results by default (even when millions of documents match): results are paginated
• You can move through the result set
$ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch&from=10&size=10"
• Scoring is computed from the frequency of a term in a document relative to the frequency of that term in the index
$ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch&explain=true"
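The from and size parameters used above follow directly from a page number. A minimal sketch, assuming 1-based page numbers; the helper name search_url is hypothetical and the default base URL is the one from the slide.

```python
from urllib.parse import urlencode


def search_url(query, page=1, size=10, base="localhost:9200/twitter/tweet/_search"):
    """Build the search URL for a 1-based page number.

    `from` is the offset of the first hit to return: (page - 1) * size.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    params = {"q": query, "from": (page - 1) * size, "size": size}
    return base + "?" + urlencode(params)


print(search_url("elasticsearch", page=2))
# prints: localhost:9200/twitter/tweet/_search?q=elasticsearch&from=10&size=10
```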
54. Searches
Query DSL for advanced searches
Type              Description
Match All         Search for everything (useful combined with filters)
QueryString       Search with term analysis and wildcards (Lucene syntax*: +, -, FROM, TO, ^)
Term              Search for an individual term without analysis
Text              Search for a text with analysis (OR is applied between tokens by default)
Wildcard          Wildcard search (*, ?)
Bool              Combine many criteria (MUST, MUST NOT, SHOULD)
Range             Range search (>, >=, <, <=)
Prefix            Useful for autocomplete requirements
Filtered          Filtering queries
Fuzzy like this   Useful to find documents that are "like" the provided text
More like this    Useful to find documents that are "like" the provided text, with a minimal constraint on found terms
* http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/queryparsersyntax.html
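As an illustration of how these query types combine, here is a sketch of a request body for the earlier need (a document from December 2011 about France containing "error" and "david"). The field names date, country and comment are hypothetical, and the query type names follow the table above as it stood in 2012 (later Elasticsearch releases call the Text query `match`). The body is only built and serialized, not sent.

```python
import json

# Bool query: every clause under "must" has to match.
search_body = {
    "query": {
        "bool": {
            "must": [
                # Range: December 2011, lower bound included, upper bound excluded.
                {"range": {"date": {"gte": "2011-12-01", "lt": "2012-01-01"}}},
                # Text: analyzed search, so "France" is matched regardless of case.
                {"text": {"country": "France"}},
                # Two separate clauses, because a single Text query applies OR
                # between its tokens by default.
                {"text": {"comment": "error"}},
                {"text": {"comment": "david"}},
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Since the country label is stored in the document itself, no join against a code/label table is needed.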
76. Rivers
• CouchDB River
• MongoDB River
• Wikipedia River
• Twitter River
• RabbitMQ River
• RSS River
• Dick Rivers
77. Looking at your data from different points of view
RESULT ANALYSIS (IN NEAR REAL TIME)
103. Near Real Time Data Visualization
• Perform a matchAll search on all data
• Update screen every x seconds
• While indexing new documents
[Figure panels: Date histogram, Term]
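A request body for such a dashboard could look like the sketch below: a match-all query with a date histogram facet and a terms facet. It assumes the facets API of the Elasticsearch versions of that era (later releases replaced facets with aggregations); the facet names, the chosen fields and the one-hour interval are hypothetical. A client would re-send this body every x seconds while documents are being indexed.

```python
import json


def dashboard_request(interval="hour", top=10):
    """Build a match-all search that returns only facet counts (no hits)."""
    return {
        "size": 0,  # the dashboard needs the facet counts, not the documents
        "query": {"match_all": {}},
        "facets": {
            # Number of tweets per time bucket.
            "tweets_over_time": {
                "date_histogram": {"field": "created_at", "interval": interval}
            },
            # Most frequent hashtags.
            "top_hashtags": {
                "terms": {"field": "hashtag.text", "size": top}
            },
        },
    }


print(json.dumps(dashboard_request(), indent=2))
```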
112. Let’s go further: sharding / replicas / scalability
ARCHITECTURE
119. Glossary
• Node: an Elasticsearch instance (~ a server?)
• Cluster: a set of nodes
• Shard: a partition of an index, across which documents are distributed
• Replica: one or more copies of a shard in the cluster
• Primary shard: the shard elected as primary in the cluster. Lucene indexes documents there.
• Secondary shard: stores a replica of a primary shard
121. Let’s create an index
$ curl -XPUT localhost:9200/twitter -d '{
  "index" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  }
}'
[Diagram: a client (curl) sends the request to a cluster with a single node, Node 1, which holds Shard 0 and Shard 1. The replication rule is not satisfied.]
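Why the replication rule is not satisfied on a single node can be shown with a toy allocation model. This is a deliberately simplified sketch, not Elasticsearch's actual allocator: the function name allocate and the least-loaded placement rule are assumptions. The only rule it models is that a replica is never placed on the node that already holds another copy of the same shard.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Toy model: place each shard copy on the least-loaded node that does not
    already hold a copy of the same shard; otherwise leave it unassigned."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(number_of_shards):
        holders = set()  # nodes already holding a copy of this shard
        for kind in ["primary"] + ["replica"] * number_of_replicas:
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                unassigned.append((shard, kind))
                continue
            node = min(candidates, key=lambda n: len(placement[n]))
            placement[node].append((shard, kind))
            holders.add(node)
    return placement, unassigned


# One node, 2 shards, 1 replica: both primaries are placed, both replicas wait.
print(allocate(["node1"], 2, 1))
# Starting a second node gives the replicas somewhere to go.
print(allocate(["node1", "node2"], 2, 1))
```

With a single node both replicas stay unassigned, which is the state shown on this slide; adding a second node lets every copy find a home.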
Topics covered:
What needs are we trying to address? What would a search engine be used for in my information system?
How Elasticsearch addresses these needs, and many others
Live demo: indexing Twitter messages! Make some noise by tweeting at @devoxxfr with #elasticsearch