1
Building distributed search applications
using Apache Solr
The Fifth Elephant - 2014
Saumitra Srivastav
saumitra.srivastav@glassbeam.com
@_saumitra_
2
Agenda
1. What is Solr? Architecture Overview
2. Solr schema, config, tokenizers and filters
3. Indexing data:
a. From disk using SolrJ
b. Importing from a database (MySQL) with the DataImportHandler
4. Querying Solr
a. Filters, faceting, highlighting, sorting, grouping, boosting, and range, function and fuzzy queries
b. Adding an 'Auto Suggest' component to autocomplete user queries
c. Using the 'Clustering' component to cluster similar results
5. SolrCloud
a. Architecture
b. Setting up a multinode cluster with Zookeeper
c. Creating a distributed index
d. Collections API
6. Solr Admin UI
7. Solr performance factors
3
Demo App
Demo app we will use for reference: http://saumitra.me/solrdemo/
4
Apache Lucene
• Apache Lucene is a high-performance, full-featured text search engine library
• Provides an API to add indexing and search to your applications
• Provides scalable, high-performance indexing
• 150GB/hour on modern hardware
• Small RAM requirements: only 1MB heap
• Powerful, accurate and efficient search algorithms
• Scoring
• Phrase queries, wildcard queries, proximity queries, range queries
• Sorting
• Allows simultaneous update and searching
• Flexible faceting, highlighting, joins and result grouping
• Fast, memory-efficient and typo-tolerant suggesters
• With Lucene, you have to write the code to do all of this yourself.
5
Apache Solr
• Search server built on top of Apache Lucene
• Provides an API to access Lucene over HTTP
• Adds more features on top of Lucene
• Most tasks that require programming in Lucene are configuration in Solr
• Provides SolrCloud, which adds
• Distributed search and indexing
• High scalability
• Replication
• Load balancing
• Fault tolerance
• Solr is NOT a database
• Can be used as a NoSQL store, as long as it is not abused
• Provides many other features, such as faceting, More Like This, clustering, the Data Import Handler, multiple language support and rich document support
6
Lucene Indexing and Querying Overview
7
Inverted Index
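The inverted index diagram did not survive extraction. As a rough illustration only, here is a toy inverted index in plain Java (no Lucene involved): every term maps to the sorted set of IDs of the documents that contain it, so a term query is a lookup and an AND query is an intersection of posting lists. The naive lowercase/whitespace tokenization is an assumption; a real Lucene index also stores term frequencies, positions and more in compressed segment files.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted set of IDs of the documents containing it. */
final class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Naive analysis (lowercase + whitespace split), then record docId for every term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // a leading space yields an empty first element
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Term query: the posting list for one term (empty if the term was never indexed). */
    Set<Integer> search(String term) {
        Set<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.emptySet() : Collections.unmodifiableSet(docs);
    }

    /** AND query: intersect the posting lists of all terms. */
    Set<Integer> searchAll(List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            if (result == null) {
                result = new TreeSet<>(search(term));
            } else {
                result.retainAll(search(term));
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

For example, after indexing doc 1 "Solr is built on Lucene" and doc 2 "Lucene is a library", search("lucene") returns [1, 2] and searchAll of "lucene" and "library" returns [2].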
8
Basic Concepts
• tf (t in d): term frequency in a document
• Measure of how often a term appears in the document
• The number of times term t appears in the currently scored document d
• idf (t): inverse document frequency
• Measure of whether the term is common or rare across all documents, i.e. how often the term appears across the index
• Obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient
• coord: coordinate-level matching
• Number of query terms that were found in the document
• E.g. terms 'x' and 'y' are both found in doc1, but only 'x' is found in doc2, so for the query 'x' OR 'y', doc1 receives a higher score
• boost (index): boost of the field at index time
• boost (query): boost of the field at query time
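To make these definitions concrete, the sketch below scores documents with a simplified formula built from the factors above: score(q, d) = coord(q, d) × Σ tf(t in d) × idf(t), with raw term counts and idf = log(N / df) exactly as described. This is an assumption for illustration, not Lucene's actual formula: Lucene 4.x's DefaultSimilarity uses sqrt(freq) for tf and 1 + log(numDocs / (docFreq + 1)) for idf, squares idf, and also applies query normalization, boosts and field norms.

```java
import java.util.List;

/** Simplified tf-idf scoring with a coord factor, following the slide's definitions. */
final class ToyScorer {
    /** tf(t in d): number of times term t appears in document d. */
    static int tf(String term, List<String> doc) {
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** idf(t) = log(totalDocs / docsContainingTerm); 0 if no document contains t. */
    static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    /** coord: fraction of the (non-empty) query's terms that occur in the document. */
    static double coord(List<String> query, List<String> doc) {
        long matched = query.stream().filter(doc::contains).count();
        return (double) matched / query.size();
    }

    /** score(q, d) = coord(q, d) * sum over query terms of tf(t in d) * idf(t). */
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double sum = 0.0;
        for (String term : query) {
            sum += tf(term, doc) * idf(term, corpus);
        }
        return coord(query, doc) * sum;
    }
}
```

With doc1 = [x, y], doc2 = [x, z] and a third unrelated document, the query [x, y] scores doc1 above doc2, matching the 'x' OR 'y' example: doc1 matches both terms (coord = 1), doc2 only one (coord = 0.5).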
9
Apache Solr architecture
10
Hands-On Activity 1
Objective:
1. Solr directories walkthrough
2. Start a single-node Solr instance
3. Index some sample documents
4. Admin UI overview
11
Solr Directory Structure - Base Dir
$ tree -L 1 solr-4.8.1/
solr-4.8.1/
├── CHANGES.txt
├── contrib
├── del
├── dist
├── docs
├── example
├── example-dih
├── licenses
├── LICENSE.txt
├── example-minimal
├── example-final
├── NOTICE.txt
├── README.txt
└── SYSTEM_REQUIREMENTS.txt
12
Solr Directory Structure - Example Dir
$ tree -L 1 solr-4.8.1/example/
solr-4.8.1/example/
├── contexts
├── etc
├── example-DIH
├── exampledocs
├── example-schemaless
├── lib
├── logs
├── multicore
├── README.txt
├── resources
├── scripts
├── solr
├── solr-webapp
├── start.jar
└── webapps
13
Solr Directory Structure - Cores Dir
$ tree -L 2 solr-4.8.1/example/solr/
solr-4.8.1/example/solr/
├── bin
├── collection1
│ ├── conf
│ ├── data
│ ├── core.properties
│ └── README.txt
├── README.txt
├── solr.xml
└── zoo.cfg
14
Solr Directory Structure - Conf Dir
$ tree -L 1 solr-4.8.1/example/solr/collection1/conf/
solr-4.8.1/example/solr/collection1/conf/
├── admin-extra.html
├── admin-extra.menu-bottom.html
├── admin-extra.menu-top.html
├── clustering
├── currency.xml
├── elevate.xml
├── lang
├── mapping-FoldToASCII.txt
├── mapping-ISOLatin1Accent.txt
├── protwords.txt
├── schema.xml
├── scripts.conf
├── solrconfig.xml
├── spellings.txt
├── stopwords.txt
├── synonyms.txt
├── update-script.js
├── velocity
└── xslt
15
Starting a solr node
• Go to the example-minimal directory and start a Solr instance.
• cd /home/solruser/work/solr-4.8.1/example-minimal
• java -jar start.jar
• This launches Jetty with the Solr WAR and the example configs.
• By default, Solr starts on port 8983. To use a custom port:
• java -Djetty.port=9000 -jar start.jar
• Open your browser and point it to http://localhost:8983/solr to see the Solr Admin UI
• You will see a default collection named collection1.
16
Solr Schema
• Before indexing documents, you need to define a schema. A schema serves multiple purposes:
• Field-related information
• Fields in your document
• Datatype of those fields
• Whether you want to index the field, store it, or both
• Other configuration for each field, like termVectors, termPositions, docValues, etc.
• Dynamic fields
• Copy fields
• Datatypes
• A datatype (field type) is a collection of tokenizers and filters which can be chained
• It tells Solr what operations to perform on the content of a field
• You can define different analyzers for indexing and querying
• Solr also provides a schemaless mode where it can auto-detect the datatypes of fields.
17
Analyzers
• Analyzers are components that pre-process input text at index time and/or at query time.
• You can define separate analyzers for indexing and querying
• Make sure that you define the indexing and querying analyzers in a compatible manner.
• An analyzer consists of:
• Char filters
• A tokenizer
• Token filters
18
Analyzers
• Text data:
<html> <body>
<h1>This is a sample HTML document.</h1>
</body></html>
• Char Filter (solr.HTMLStripCharFilterFactory):
This is a sample HTML document.
• Tokenizer (solr.WhitespaceTokenizerFactory):
[This] [is] [a] [sample] [HTML] [document.]
• Token Filters (solr.LowerCaseFilterFactory, then solr.StopFilterFactory):
[sample] [html] [document.]
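The same pipeline can be imitated in plain Java to see what each stage does. This is only a sketch under simplifying assumptions: a regular expression stands in for HTMLStripCharFilter (which also decodes entities and keeps character offsets), and the small stop word set is hypothetical (Solr reads its list from stopwords.txt).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java imitation of: HTML strip char filter -> whitespace tokenizer -> lowercase + stop filters. */
final class ToyAnalyzer {
    // Hypothetical stop word set for this example; Solr loads its list from stopwords.txt.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "is", "the", "this");

    /** Char filter stage: replace every tag with a space (crude stand-in for HTMLStripCharFilter). */
    static String stripHtml(String input) {
        return input.replaceAll("<[^>]*>", " ");
    }

    /** Tokenizer stage: split on whitespace only, so punctuation stays attached to words. */
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Token filter stage: lowercase each token first, then drop stop words. */
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    /** Runs the three stages in order, like an analyzer chain. */
    static List<String> analyze(String input) {
        return filter(whitespaceTokenize(stripHtml(input)));
    }
}
```

The period stays attached to "document." because WhitespaceTokenizer splits only on whitespace; a tokenizer such as StandardTokenizer would drop it.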
19
Analyzers - Example
20
Separate index and query analyzer
21
Char Filters
• A char filter is a component that pre-processes input characters (consuming and producing a character stream); it can add, change, or remove characters while preserving character position information.
• Char filters can be chained.
• Example:
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="([^a-z])"
            replacement=""/>
22
Tokenizers
• A Tokenizer splits a stream of characters (from each individual field value) into a series
of tokens.
• There can be only one Tokenizer in each Analyzer.
• Solr provides the following tokenizer factories:
• solr.KeywordTokenizerFactory
• solr.LetterTokenizerFactory
• solr.WhitespaceTokenizerFactory
• solr.LowerCaseTokenizerFactory
• solr.StandardTokenizerFactory
• solr.ClassicTokenizerFactory
• solr.UAX29URLEmailTokenizerFactory
• solr.PatternTokenizerFactory
• solr.PathHierarchyTokenizerFactory
• solr.ICUTokenizerFactory
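The splitting rules of a few of these tokenizers can be approximated in plain Java, which helps when choosing one. These are simplified stand-ins written for illustration, not the Lucene classes; the real tokenizers also track offsets and have limits such as a maximum token length.

```java
import java.util.ArrayList;
import java.util.List;

/** Rough plain-Java approximations of a few Solr tokenizers' splitting rules. */
final class ToyTokenizers {
    /** Like KeywordTokenizer: the whole input becomes a single token. */
    static List<String> keyword(String text) {
        return List.of(text);
    }

    /** Like WhitespaceTokenizer: split on whitespace only. */
    static List<String> whitespace(String text) {
        return splitOn(text, false);
    }

    /** Like LetterTokenizer: maximal runs of letters; any other character separates tokens. */
    static List<String> letter(String text) {
        return splitOn(text, true);
    }

    /** Like PathHierarchyTokenizer: one token per path prefix, e.g. /usr, /usr/local, ... */
    static List<String> pathHierarchy(String path) {
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= path.length(); i++) {
            if (i == path.length() || path.charAt(i) == '/') {
                tokens.add(path.substring(0, i));
            }
        }
        return tokens;
    }

    private static List<String> splitOn(String text, boolean lettersOnly) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean partOfToken = lettersOnly ? Character.isLetter(c) : !Character.isWhitespace(c);
            if (partOfToken) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For the input "Wi-Fi router's", the whitespace rule keeps [Wi-Fi] [router's], while the letter rule produces [Wi] [Fi] [router] [s].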
23
Token Filters
• Tokens produced by the Tokenizer are passed through a series of Token Filters
• TokenFilters can add, change, or remove tokens.
• The field is then indexed by the resulting token stream.
• Detailed information about analyzers can be obtained from
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
24
Dynamic Fields
• Dynamic fields allow Solr to index fields that you did not explicitly define in your
schema.
• A dynamic field is just like a regular field except it has a name with a wildcard in it.
• When you are indexing documents, a field that does not match any explicitly defined
fields can be matched with a dynamic field.
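A minimal sketch of how a field name could be resolved, assuming Solr's rules for dynamic field names: an explicitly defined field always wins, a pattern has a single * at its start or end, and when several patterns match, the longest one is tried first (schema order breaks ties). The field names and patterns in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Sketch of resolving a field name against explicit and dynamic field definitions. */
final class ToyFieldResolver {
    private final Set<String> explicitFields;
    private final List<String> dynamicPatterns;

    ToyFieldResolver(Set<String> explicitFields, List<String> dynamicPatterns) {
        this.explicitFields = explicitFields;
        List<String> sorted = new ArrayList<>(dynamicPatterns);
        // Longest pattern first; List.sort is stable, so equal lengths keep schema order.
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        this.dynamicPatterns = sorted;
    }

    /** Returns the definition that handles fieldName: itself if explicit, else a pattern. */
    Optional<String> resolve(String fieldName) {
        if (explicitFields.contains(fieldName)) {
            return Optional.of(fieldName);
        }
        for (String pattern : dynamicPatterns) {
            if (matches(pattern, fieldName)) {
                return Optional.of(pattern);
            }
        }
        return Optional.empty();
    }

    /** A pattern has a single '*' at its start ("*_s") or at its end ("st_*"). */
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }
}
```

With patterns *_s, st_* and *_txt_en, a field named st_tags_s matches both st_* and *_s; the longer st_* is chosen.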
25
Copy field
• The copyField directive can be used to copy the data of one (or more) fields into another field.
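A sketch of what copyField does to an incoming document, written in plain Java with hypothetical field names: the raw values of each source field are appended to the destination field before any analysis happens. The sketch reads sources from the original document, reflecting the fact that Solr does not chain copyField directives; a destination that receives values from several sources should be declared multiValued.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of copyField: raw source values are appended to the destination before analysis. */
final class ToyCopyField {
    /**
     * @param doc   incoming document: field name -> raw values
     * @param rules copyField directives as (source, dest) pairs
     */
    static Map<String, List<String>> apply(Map<String, List<String>> doc,
                                           List<Map.Entry<String, String>> rules) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        doc.forEach((field, values) -> out.put(field, new ArrayList<>(values)));
        for (Map.Entry<String, String> rule : rules) {
            // Read from the original document: copies never feed other copyField rules.
            List<String> sourceValues = doc.get(rule.getKey());
            if (sourceValues != null) {
                out.computeIfAbsent(rule.getValue(), f -> new ArrayList<>()).addAll(sourceValues);
            }
        }
        return out;
    }
}
```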
26
Fields Parameters
1. Indexed
2. Stored
3. Multivalued
4. DocValues
5. OmitNorms
6. TermVectors
7. TermPositions
8. TermOffsets
27
Hands-On Activity 2
Objective:
1. Create a new collection
2. Understand schema.xml contents
3. Create a custom datatype
4. Create a schema for the StackExchange data
5. Learn how to use the Admin UI to analyze and tune fieldTypes
28
Solr Schema-less mode
29
Indexing Data
• You can modify a Solr index by POSTing commands to Solr to add (or update)
documents, delete documents, and commit pending adds and deletes.
• Add:
• The id field is the uniqueKey (aka primary key). In some cases you don't need one, but you should always define it. IDs can be autogenerated.
http://wiki.apache.org/solr/UniqueKey
curl 'http://localhost:8983/solr/update?commit=true' \
  -H "Content-Type: text/xml" \
  --data-binary '<add><doc>
<field name="id">id1</field>
<field name="st_content">My First Doc</field>
</doc></add>'
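The same add message can be produced from Java without any Solr library. The sketch below builds the <add><doc> body with the XML special characters escaped and, optionally, POSTs it with the JDK's java.net.http client (Java 11+). The class and method names are made up for this example, and only build() is needed to see the message.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

/** Builds (and optionally sends) an XML <add> update message for one document. */
final class XmlAddMessage {
    /** Escapes characters that are special in XML text and attribute values ('&' first). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** <add><doc><field name="...">...</field>...</doc></add>, fields in map iteration order. */
    static String build(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        fields.forEach((name, value) -> sb.append("<field name=\"").append(escape(name))
                .append("\">").append(escape(value)).append("</field>"));
        return sb.append("</doc></add>").toString();
    }

    /** POSTs the message, e.g. to http://localhost:8983/solr/update?commit=true; returns the HTTP status. */
    static int post(String updateUrl, String xml) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```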
30
Indexing Data (cont...)
• Solr natively supports indexing structured documents in XML, CSV and JSON.
• Provides multiple request handlers, called index handlers, to add, delete and update documents in the index.
• There is a unified update request handler that supports XML, CSV, JSON, and javabin update requests:
• You can define new requestHandlers and register them in solrconfig.xml.
• https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
31
Atomic Updates
• Sending an update request with an existing ID will overwrite that document.
• Solr supports simple atomic updates where you can modify only parts of a single
document.
• Solr supports several modifiers that atomically update values of a document.
1. set – set or replace a particular value, or remove the value if null is specified as
the new value
2. add – adds an additional value to a list
3. inc – increments a numeric value by a specific amount
curl http://localhost:8983/solr/update \
  -H 'Content-type:application/json' \
  -d '[{
  "id"     : "message1",
  "source" : {"set":"error_log"},
  "count"  : {"inc":4},
  "tags"   : {"add":"apache"}
}]'
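The effect of the three modifiers on a stored document can be simulated on a plain Java map. This is a sketch of the semantics only, with a hypothetical class name; in Solr 4.x, atomic updates also require the update log to be enabled and the document's fields to be stored, because Solr rebuilds the whole document internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simulates how the set / add / inc modifiers change a stored document. */
final class ToyAtomicUpdate {
    /**
     * @param stored current document: field -> value (a List for multi-valued fields)
     * @param update field -> {modifier -> value}, mirroring {"inc": 4} in the JSON above
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> apply(Map<String, Object> stored, Map<String, Map<String, Object>> update) {
        Map<String, Object> doc = new HashMap<>(stored);
        update.forEach((field, modifier) -> modifier.forEach((op, value) -> {
            switch (op) {
                case "set": // replace the value, or remove the field when the new value is null
                    if (value == null) {
                        doc.remove(field);
                    } else {
                        doc.put(field, value);
                    }
                    break;
                case "add": { // append to a (possibly single-valued or missing) list
                    List<Object> values = new ArrayList<>();
                    Object existing = doc.get(field);
                    if (existing instanceof List) {
                        values.addAll((List<Object>) existing);
                    } else if (existing != null) {
                        values.add(existing);
                    }
                    values.add(value);
                    doc.put(field, values);
                    break;
                }
                case "inc": { // numeric increment; a missing field counts as 0
                    Object existing = doc.get(field);
                    long current = existing == null ? 0L : ((Number) existing).longValue();
                    doc.put(field, current + ((Number) value).longValue());
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown modifier: " + op);
            }
        }));
        return doc;
    }
}
```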
32
Solr Clients
• There are many clients for indexing and querying Solr.
http://wiki.apache.org/solr/IntegratingSolr
• Client languages
• Ruby
• PHP
• Java
• Scala
• Python
• .NET
• Perl
• JavaScript
33
Indexing with SolrJ
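• SolrJ is a Java client for Solr. It offers a Java API to add, update, and query the Solr index.
import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// HOST is a placeholder for your Solr host
SolrServer server = new HttpSolrServer("http://HOST:8983/solr/");
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "doc1");
doc1.addField("content", "This is first document");
SolrInputDocument doc2 = new SolrInputDocument();
// addField() returns void, so calls cannot be chained
doc2.addField("id", "doc2");
doc2.addField("content", "This is second document");
// Send both documents in a single request, then commit so they become searchable
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
docs.add(doc2);
server.add(docs);
server.commit();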
34
Indexing with SolrJ (cont…)
• SolrJ includes a client for SolrCloud, which is ZooKeeper-aware
• To interact with SolrCloud, you should use an instance of CloudSolrServer and pass it your ZooKeeper host(s).
• More on SolrCloud later.
import org.apache.solr.client.solrj.impl.CloudSolrServer;

// The client reads the cluster state from ZooKeeper instead of a fixed Solr URL
CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("mycollection");
SolrInputDocument doc = new SolrInputDocument();
....
....
server.commit();
35
Transaction Log and Commit
• Transaction log (tlog):
• File where the raw documents are written for recovery purposes
• On update, the entire document gets written to the tlog
• Commits:
• Hard commit
• Soft commit
• Soft commits are about visibility, hard commits are about durability.
• More on this when we discuss SolrCloud
36
Hands-On Activity 3
Objective:
1. Create a Java project and add the SolrJ dependency
2. Index a single doc using SolrJ
3. Index in batch mode
4. Understand commits
37
Data Import Handler
• DataImportHandler provides a configuration-driven way to import data from external sources into Solr
• External sources can be:
• Databases
• ftp, scp, etc
• XML, JSON, etc
• Provides options for full or delta imports
38
Data Import Handler (cont...)
• A DataImportHandler request handler must be defined in solrconfig.xml
• The data source can be defined inline in solrconfig.xml, or put directly into data-config.xml
• data-config.xml tells Solr:
1. How to fetch data (queries, URLs, etc.)
2. What to read (result set columns, XML fields, etc.)
3. How to process it (modify/add/remove fields)
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
39
Data Import Handler - Script Transformers
• You can apply different types of transformations to data read from an external source before indexing it in Solr
• Can be used to index dynamic fields using the data import handler
<dataConfig>
<script><![CDATA[
function WareAttributes(row){
row.put('attr_' + row.get('id'), row.get('raw_value') );
row.remove('id');
row.remove('raw_value');
return row;
}
]]></script>
...
<entity
name="attrs"
query="SELECT attribute_id as id, raw_value
FROM
ware_wareattribute WHERE ware_id = ${ware.id}"
transformer="script:WareAttributes"/>
</entity>
</document>
</dataConfig>
40
Hands-On Activity 4
Objective:
1. Create MySQL tables for storing StackExchange data
• Posts
• Users
• Comments
2. Load the StackExchange dumps into MySQL
3. Define a data import handler
• Add the dependency and request handler in solrconfig.xml
• Define a data-config.xml file for the Solr-to-MySQL field mapping
4. Index documents in Solr using the data import handler
41
Querying
• Solr supports multiple query syntaxes through query parser plugins
• A query parser is a component responsible for parsing the textual query and converting it into the corresponding Lucene Query objects.
• Solr provides many built-in parsers
• lucene - the default "lucene" parser
• dismax - allows querying across multiple fields with different weights
• edismax - builds on dismax but with more features
• func
• boost
and many more (https://wiki.apache.org/solr/QueryParser)
• There are multiple ways to select which query parser to use for a certain request
1. defType - the default type parameter selects which query parser to use by default for the main query.
Example: &q=foo bar&defType=lucene
2. LocalParams - inside the main q or fq parameter, you can select the query parser using the localParams syntax.
Example: &q={!dismax}foo bar
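When these parameters are sent programmatically rather than typed into a browser, the values must be URL-encoded: spaces and the {, ! and } of local params are not safe in a raw URL. A small JDK-only sketch (Java 10+ for the Charset overload of URLEncoder.encode); the class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Builds a URL-encoded Solr query string from request parameters. */
final class SolrQueryUrl {
    /** base + "?" + k1=v1&k2=v2..., with keys and values form-encoded (space becomes '+'). */
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        params.forEach((k, v) -> query.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return query.toString();
    }
}
```

For q={!dismax}foo bar this yields q=%7B%21dismax%7Dfoo+bar, which Solr decodes back to the local-params form.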
42
Defining a search handler
43
Querying (cont...)
• Simple text search
• http://localhost:8983/solr/collection1/stacksearch?q=data
• Change number of rows retrieved
• http://localhost:8983/solr/collection1/stacksearch?q=data&rows=20
• Pagination
• http://localhost:8983/solr/collection1/stacksearch?q=data&rows=20&start=50
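rows is a page size and start is a zero-based offset, so a page number converts to start = (page - 1) × rows. A tiny sketch with hypothetical helper names; as a side note, very large start values get expensive, and Solr 4.7 added cursorMark for deep paging.

```java
/** Converts between 1-based page numbers and Solr's zero-based start offset. */
final class Pagination {
    /** start parameter for a page: (page - 1) * rows. */
    static int start(int page, int rows) {
        if (page < 1 || rows < 1) {
            throw new IllegalArgumentException("page and rows must be >= 1");
        }
        return (page - 1) * rows;
    }

    /** Number of pages needed to show numFound results (numFound comes from the response). */
    static long pageCount(long numFound, int rows) {
        return (numFound + rows - 1) / rows;
    }
}
```

Page 3 with rows=20 is start=40; the start=50 example above simply skips the first 50 results.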
44
Querying (cont...)
• Searching on a field
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data
• http://localhost:8983/solr/collection1/stacksearch?q=st_posttype:data
• Specifying list of fields to be retrieved
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&fl=id,st_post,st_tags
• Delete all documents
• http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true
45
Querying (cont...)
• Searching multiple fields
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data AND st_posttype:QUESTION
• NOT query
• http://localhost:8983/solr/collection1/stacksearch?q=NOT st_post:data
• Boolean query
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:(data+sensor)
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:(data OR sensor)
• Sort Query
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&fl=id,st_post,st_score&sort=st_score desc
46
Querying - Faceting
47
Querying - Faceting
• Enable faceting on two fields
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&facet=true&facet.field=st_posttype&facet.field=st_tags
• Changing limit and mincount
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&facet=true&facet.field=st_posttype&facet.field=st_tags&facet.limit=1000&facet.mincount=1
• Changing facet method
• http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&facet=true&facet.field=st_posttype&facet.field=st_tags&facet.limit=1000&facet.mincount=1&facet.method=enum
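Conceptually, field faceting counts, for each term in the field, how many of the matching documents contain it, then applies facet.mincount, sorts by count and cuts the list at facet.limit (a negative limit means unlimited). The plain-Java sketch below follows those steps on in-memory values. It is not how Solr computes facets (facet.method=fc and enum work on index structures), and unlike Solr with mincount=0 it only sees terms present in the matching documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of field faceting: doc counts per term, mincount, sort by count, limit. */
final class ToyFacet {
    /**
     * @param matchingDocs field values of each document that matched the query
     * @param limit        maximum number of buckets to return; negative means unlimited
     * @param mincount     minimum document count a term needs to be returned
     */
    static List<Map.Entry<String, Integer>> facet(List<List<String>> matchingDocs, int limit, int mincount) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like terms in the index
        for (List<String> values : matchingDocs) {
            // Count each document once per term, even if the value is repeated in it.
            for (String value : new LinkedHashSet<>(values)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> buckets = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= mincount) {
                buckets.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        // Highest count first; List.sort is stable, so ties keep term order.
        buckets.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return limit < 0 ? buckets : buckets.subList(0, Math.min(limit, buckets.size()));
    }
}
```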
48
Stats query
• http://localhost:8983/solr/collection1/select?q=*:*&rows=0&stats=true&stats.field=st_creationdate
49
Facet Range Query
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.range=st_creationdate&facet.range.start=2011-03-22T01:33:06Z&facet.range.end=2014-03-22T01:33:06Z&facet.range.gap=%2B1YEAR
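The range facet above produces one bucket per year between facet.range.start and facet.range.end (the gap is written %2B1YEAR because a literal + in a URL decodes to a space). A java.time sketch of the bucketing, assuming Solr's default of including the lower bound and excluding the upper bound of each bucket (facet.range.include=lower); it ignores facet.range.hardend and the other/include options, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of facet.range on a date field with a +1YEAR gap. */
final class ToyDateRangeFacet {
    /** Returns bucket lower bound -> number of values in [lower, lower + 1 year). */
    static Map<Instant, Integer> facet(List<Instant> values, Instant start, Instant end) {
        Map<Instant, Integer> buckets = new LinkedHashMap<>();
        ZonedDateTime lower = start.atZone(ZoneOffset.UTC);
        while (lower.toInstant().isBefore(end)) {
            ZonedDateTime upper = lower.plusYears(1); // the "+1YEAR" gap, computed in UTC
            int count = 0;
            for (Instant v : values) {
                if (!v.isBefore(lower.toInstant()) && v.isBefore(upper.toInstant())) {
                    count++;
                }
            }
            buckets.put(lower.toInstant(), count);
            lower = upper;
        }
        return buckets;
    }
}
```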
50
Range, Boosting, Fuzzy, Proximity Query
• Range
• http://localhost:8983/solr/collection1/select?q=st_score:[1 TO 3]&fl=id,st_score
• Boosting on a field
• http://localhost:8983/solr/select/?defType=dismax&q=data&bq=st_posttype:QUESTION^5.0&qf=st_post
• Fuzzy Search (dismax escapes the ~ operator, so use edismax)
• http://localhost:8983/solr/collection1/select?defType=edismax&q=electromagnet~0.9&qf=st_post
• Proximity search
• http://localhost:8983/solr/collection1/stacksearch?q="calculating coordinates"~2
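Fuzzy queries match terms within a small edit distance of the query term. The sketch below computes plain Levenshtein distance in Java; treat it as an approximation, since Lucene's FuzzyQuery matches with Levenshtein automata, allows at most 2 edits, and by default also counts an adjacent transposition as a single edit. In Lucene 4.x the older similarity form such as ~0.9 is still accepted and converted to a number of edits. The class name is hypothetical.

```java
/** Levenshtein edit distance, the measure behind fuzzy term queries such as term~1. */
final class EditDistance {
    /** Minimum number of single-character insertions, deletions or substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Would `term` match a fuzzy query on `queryTerm` allowing maxEdits edits? */
    static boolean fuzzyMatches(String queryTerm, String term, int maxEdits) {
        return levenshtein(queryTerm, term) <= maxEdits;
    }
}
```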
51
Function Queries
• Function queries enable you to generate a relevancy score using the actual value of
one or more numeric fields.
• Examples:
1. http://localhost:8983/solr/collection1/select?q=*:*&fl=sum(st_score,st_favoritecount),st_score,st_favoritecount
2. http://localhost:8983/solr/collection1/select?q=*:*&fl=max(st_score,st_favoritecount),st_score,st_favoritecount
3. http://localhost:8983/solr/collection1/select?q=*:*&fl=ms(NOW,st_creationdate),st_creationdate
4. http://localhost:8983/solr/collection1/select?q=st_title:*&fl=norm(st_title),st_title
• https://cwiki.apache.org/confluence/display/solr/Function+Queries
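Function queries are evaluated per document against its field values. For reference, this JDK-only sketch (hypothetical class name) restates the arithmetic of the sum, max and ms functions used above, assuming numeric field values and dates represented as instants.

```java
import java.time.Instant;

/** Restates the arithmetic of the sum, max and ms function queries for one document. */
final class ToyFunctions {
    /** sum(x, y, ...): adds all arguments. */
    static double sum(double... values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** max(x, y): the larger of two values. */
    static double max(double x, double y) {
        return Math.max(x, y);
    }

    /** ms(a, b): milliseconds between two dates, computed as a - b (NOW - date gives an age). */
    static long ms(Instant a, Instant b) {
        return a.toEpochMilli() - b.toEpochMilli();
    }
}
```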
52
Group and Term Query
• Term
• http://localhost:8983/solr/collection1/terms?terms.fl=st_post&terms.prefix=data
• Group
• http://localhost:8983/solr/collection1/select?q=st_post:*&group=true&group.field=st_site
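Result grouping with group.field keeps, for each distinct value of the field, the top documents of that group (group.limit, default 1), and orders the groups by their best document. A small in-memory sketch of that behavior, assuming the input is already sorted by score; the class name, document IDs and st_site values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of group.field grouping over results that are already sorted by score. */
final class ToyGrouping {
    /**
     * @param rankedDocs (docId, groupFieldValue) pairs in descending score order
     * @param limit      documents to keep per group (like group.limit)
     * @return group value -> doc IDs, groups ordered by their best-scoring document
     */
    static Map<String, List<String>> group(List<Map.Entry<String, String>> rankedDocs, int limit) {
        Map<String, List<String>> groups = new LinkedHashMap<>(); // insertion order = best doc order
        for (Map.Entry<String, String> doc : rankedDocs) {
            List<String> members = groups.computeIfAbsent(doc.getValue(), g -> new ArrayList<>());
            if (members.size() < limit) {
                members.add(doc.getKey());
            }
        }
        return groups;
    }
}
```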
53
More Like This
• The MoreLikeThis search component enables users to query for documents similar to
a document in their result list.
• It uses terms from the original document to find similar documents in the index.
• Ways to use MLT:
1. Request handler
2. Search component
3. MoreLikeThisHandler - request handler with externally supplied text
http://localhost:8983/solr/collection1/select?q=id:robotics_1&mlt.count=5&mlt=true&mlt.fl=st_post
54
Clustering
• Solr uses the Carrot2 library for clustering search results and documents
• Clustering can be used to:
• summarize a whole bunch of results/documents
• group together semantically related results/documents
• To use clustering:
• Add the ClusteringComponent in solrconfig.xml
• Reference the clustering component in a request handler
• Supports three algorithms:
• Lingo
• STC
• BisectingKMeans
http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&clustering=true&clustering.results=true&carrot.title=st_post&rows=20
55
AutoComplete / Suggester
• Autocomplete can be achieved in multiple ways in Solr:
1. Faceting using the prefix parameter
2. TermsComponent
3. Suggester
• Based on SpellCheckComponent
• N-gram based
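The TermsComponent approach amounts to a prefix scan over the sorted term dictionary, returning terms by document frequency (terms.sort=count is the default). A TreeMap-based sketch of the idea with hypothetical names; Solr's Suggester uses dedicated lookup structures instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of prefix autocomplete over a sorted term dictionary, like terms.prefix. */
final class ToyPrefixSuggester {
    private final TreeMap<String, Integer> docFreqs = new TreeMap<>();

    void addTerm(String term, int docFreq) {
        docFreqs.merge(term, docFreq, Integer::sum);
    }

    /** Up to `limit` terms starting with `prefix`, highest document frequency first. */
    List<String> suggest(String prefix, int limit) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        // tailMap starts at the first term >= prefix; matching terms are contiguous from there.
        for (Map.Entry<String, Integer> e : docFreqs.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            matches.add(e);
        }
        matches.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) {
            suggestions.add(matches.get(i).getKey());
        }
        return suggestions;
    }
}
```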
56
Hands-On Activity 5
Objective:
1. Define a search handler named stacksearch and declare
1. defaults
2. appends
3. last-components
2. Try out different queries from the queries note and understand
the response format & results
3. Define a suggester component for 'autocomplete' using the 'post' field as the source
57
SolrCloud
• SolrCloud is NOT Solr deployed on cloud
• SolrCloud provides the ability to set up a cluster of Solr servers that combines fault tolerance and high availability with distributed indexing and search capabilities.
• A subset of optional features in Solr that enables and simplifies horizontal scaling of a search index using sharding and replication.
• SolrCloud provides
1. performance
2. scalability
3. high-availability
4. simplicity
5. elasticity
58
SolrCloud - High Level Setup
59
SolrCloud - High Level Architecture
60
SolrCloud - Terminology
• ZooKeeper: Distributed coordination service that provides centralized configuration,
cluster state management, and leader election
• Node: JVM process bound to a specific port on a machine; hosts the Solr web
application
• Collection: Search index distributed across multiple nodes; each collection has a name,
shard count, and replication factor
• Replication Factor: Number of copies of a document in a collection
• Shard: Logical slice of a collection; each shard has a name, hash range, leader, and
replication factor. Documents are assigned to one and only one shard per collection
using a hash-based document routing strategy
• Replica: Solr index that hosts a copy of a shard in a collection; behind the scenes, each
replica is implemented as a Solr core
• Leader: Replica in a shard that assumes special duties needed to support distributed
indexing in Solr; each shard has one and only one leader at any time and leaders are
elected using ZooKeeper
61
SolrCloud - Collections
• A collection is a distributed index defined by:
1. named configuration stored in ZooKeeper
2. number of shards: documents are distributed across N partitions of the
index
3. document routing strategy: how documents get assigned to shards
4. replication factor: how many copies of each document in the collection
62
SolrCloud - Sharding
• Collection has a fixed number of shards
• existing shards can be split
• When to shard?
• Large number of docs
• Large document sizes
• Parallelization during indexing and queries
• Data partitioning (custom hashing)
63
SolrCloud - Replication
• Why replicate?
• High-availability
• Load balancing
• How does it work in SolrCloud?
• Near-real-time, NOT master-slave
• The leader forwards updates to replicas in parallel
• and waits for their responses
• Error handling during indexing is tricky
64
SolrCloud - Document Routing
• Each shard covers a hash-range
• Default: hash the ID into a 32-bit integer and map it to a range
• leads to (roughly) balanced shards
• Custom-hashing
• Tri-level: app!user!doc
• Implicit: no hash-range set for shards
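A sketch of compositeId-style routing, to show how prefix!id keeps related documents together: the top 16 bits of the hash come from the prefix and the bottom 16 bits from the document ID, and the signed 32-bit hash space is divided into equal ranges, one per shard. Assumption: String.hashCode() stands in for the MurmurHash3 function Solr actually uses, so the shard numbers will not match a real cluster, and only two-level IDs are handled.

```java
/**
 * Sketch of compositeId-style routing. String.hashCode() is a stand-in for the
 * MurmurHash3 hash Solr really uses, so results will not match an actual cluster.
 */
final class ToyRouter {
    /** "doc" hashes the whole ID; "prefix!doc" takes 16 high bits from prefix, 16 low bits from doc. */
    static int hash(String id) {
        int bang = id.indexOf('!');
        if (bang < 0) {
            return id.hashCode();
        }
        int prefixBits = id.substring(0, bang).hashCode() & 0xFFFF0000;
        int docBits = id.substring(bang + 1).hashCode() & 0x0000FFFF;
        return prefixBits | docBits;
    }

    /** Splits the 32-bit hash space into numShards equal ranges and returns the owning shard. */
    static int shardFor(String id, int numShards) {
        long offset = (long) hash(id) - Integer.MIN_VALUE; // shift to 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);
    }
}
```

With 2 or 4 shards the range boundaries fall on 16-bit boundaries, so every ID sharing a prefix such as acme! lands on the same shard.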
65
SolrCloud - Distributed Indexing
66
SolrCloud - Distributed Querying
67
SolrCloud - Shard Splitting
• Can split shards into two sub-shards
• Live splitting. No downtime needed.
• Requests start being forwarded to sub-shards
automatically
• Expensive operation: use only as required, during low traffic
68
Collections API
• https://cwiki.apache.org/confluence/display/solr/Collections+API
• APIs to create and perform operations on collections:
1. CREATE: create a collection
2. RELOAD: reload a collection
3. SPLITSHARD: split a shard into two new shards
4. CREATESHARD: create a new shard
5. DELETESHARD: delete an inactive shard
6. CREATEALIAS: create or modify an alias for a collection
7. DELETEALIAS: delete an alias for a collection
8. DELETE: delete a collection
9. DELETEREPLICA: delete a replica of a shard
10. ADDREPLICA: add a replica of a shard
11. CLUSTERPROP: Add/edit/delete a cluster-wide property
12. MIGRATE: Migrate documents to another collection
13. ADDROLE: Add a specific role to a node in the cluster
14. REMOVEROLE: Remove an assigned role
15. OVERSEERSTATUS: Get status and statistics of the overseer
16. CLUSTERSTATUS: Get cluster status
17. REQUESTSTATUS: Get the status of a previous asynchronous request
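Collections API calls are plain HTTP GET requests to /admin/collections with an action parameter. A minimal JDK-only sketch that builds the CREATE URL for a 2-shard, 2-replica collection; the base URL, collection name and config name are placeholders, and the request itself can be sent with curl or any HTTP client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Builds Collections API request URLs (sending them is left to any HTTP client). */
final class CollectionsApi {
    /** solrBase/admin/collections?action=ACTION&k=v..., values URL-encoded. */
    static String url(String solrBase, String action, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", solrBase + "/admin/collections?", "");
        query.add("action=" + action);
        params.forEach((k, v) -> query.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return query.toString();
    }

    /** CREATE with an explicit shard count, replication factor and ZooKeeper config name. */
    static String create(String solrBase, String name, String configName, int numShards, int replicationFactor) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("name", name);
        p.put("numShards", Integer.toString(numShards));
        p.put("replicationFactor", Integer.toString(replicationFactor));
        p.put("collection.configName", configName);
        return url(solrBase, "CREATE", p);
    }
}
```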
69
Hands-On Activity 6
Objective:
1. Set up a 2-instance ZooKeeper quorum
2. Launch a 4-node Solr cluster
3. Upload a configSet to ZooKeeper
4. Create a 2-shard, 2-replica collection using the Collections API
5. Index documents with SolrJ using CloudSolrServer
70
Solr Performance Factors
• Schema Design
• # of Indexed Fields
• omitNorms
• Term-vectors
• Docvalues
• Configuration
• mergeFactor
• caches
• Indexing
• Bulk updates
• Commit strategy
• Optimize
• Querying
71
Thanks!
• Contact
• saumitra.srivastav7@gmail.com
• @_saumitra_
• Solr references
• https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
• https://www.youtube.com/user/LuceneSolrRevolution/videos
• Mailing List
• User - solr-user-subscribe@lucene.apache.org
• Dev - dev-subscribe@lucene.apache.org
• Attributions
• Shalin Mangar - @shalinmangar
• Erik Hatcher - @ErikHatcher
• Timothy Potter - @thelabdude
• Yonik Seeley - @lucene_solr

Friends of Solr - Nutch & HDFSFriends of Solr - Nutch & HDFS
Friends of Solr - Nutch & HDFS
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Solr on Cloud
Solr on CloudSolr on Cloud
Solr on Cloud
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Introduction to sentry
Introduction to sentryIntroduction to sentry
Introduction to sentry
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solr
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Solr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg DonovanSolr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg Donovan
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 

Similar to Apache Solr Workshop

Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr WorkshopJSGB
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampKais Hassan, PhD
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )'Moinuddin Ahmed
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdutionXuan-Chao Huang
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrJayesh Bhoyar
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6DEEPAK KHETAWAT
 

Similar to Apache Solr Workshop (20)

Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdution
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Solr a.b-ab
Solr a.b-abSolr a.b-ab
Solr a.b-ab
 
Apache solr liferay
Apache solr liferayApache solr liferay
Apache solr liferay
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6
 

Recently uploaded

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 

Recently uploaded (20)

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

Apache Solr Workshop

  • 1. Building distributed search applications using Apache Solr. The Fifth Elephant - 2014. Saumitra Srivastav, saumitra.srivastav@glassbeam.com, @_saumitra_
  • 2. Agenda: 1. What is Solr? Architecture overview 2. Solr schema, config, tokenizers and filters 3. Indexing data: a. from disk using SolrJ; b. importing from a database (MySQL) with the DataImportHandler 4. Querying Solr: a. filters, faceting, highlighting, sorting, grouping, boosting, and range, function and fuzzy queries; b. adding an 'Auto Suggest' component to autocomplete user queries; c. using the 'Clustering' component to cluster similar results 5. SolrCloud: a. architecture; b. setting up a multinode cluster with ZooKeeper; c. creating a distributed index; d. the Collections API 6. Solr Admin UI 7. Solr performance factors
  • 3. Demo App: the demo app used for reference throughout is at http://saumitra.me/solrdemo/
  • 4. Apache Lucene • Apache Lucene is a high-performance, full-featured text search engine library • Provides an API to add search and indexing to your applications • Provides scalable, high-performance indexing • 150GB/hour on modern hardware • small RAM requirements: only 1MB heap • Powerful, accurate and efficient search algorithms • scoring • phrase, wildcard, proximity and range queries • sorting • simultaneous updating and searching • flexible faceting, highlighting, joins and result grouping • fast, memory-efficient and typo-tolerant suggesters • With plain Lucene, you have to write code to do all of this.
  • 5. Apache Solr • Search server built on top of Apache Lucene • Provides an API to access Lucene over HTTP • Adds more features on top of Lucene • Most tasks that require programming in Lucene are a matter of configuration in Solr • Provides SolrCloud, which adds • distributed search and indexing • high scalability • replication • load balancing • fault tolerance • Solr is NOT a database • It can be used as a NoSQL store, as long as it is not abused • Provides many other features, such as faceting, More Like This, clustering, the Data Import Handler, multiple-language support and rich document support
  • 6. Lucene Indexing and Querying Overview
  • 8. Basic Concepts • tf(t in d): term frequency, a measure of how often a term appears in a document, i.e. the number of times term t appears in the currently scored document d • idf(t): inverse document frequency, a measure of whether the term is common or rare across the index, i.e. in how many documents the term appears • it is obtained by dividing the total number of documents N by the number of documents containing the term df(t) and taking the logarithm of that quotient: idf(t) = log(N / df(t)) • coord: coordinate-level matching, the number of query terms found in the document • e.g. terms 'x' and 'y' are both found in doc1 but only 'x' is found in doc2, so for the query 'x OR y' doc1 receives a higher score • boost (index): boost of the field at index time • boost (query): boost of the field at query time
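The definitions on this slide can be made concrete with a small sketch. The hypothetical class `ScoringTerms` below follows the slide's descriptions directly. It computes tf as a raw count and idf as log(N / df). For coord, it uses the fraction of query terms present, normalizing by the number of query terms as Lucene's classic coord factor does. This is an illustration only. Lucene 4.x's DefaultSimilarity, for example, uses sqrt(freq) for tf and 1 + log(numDocs / (docFreq + 1)) for idf, so real scores will differ.

```java
import java.util.List;
import java.util.Set;

// Simplified scoring terms as described on the slide (NOT Lucene's exact formulas).
final class ScoringTerms {

    // tf(t in d): number of times term t appears in document d (raw count).
    static int tf(String term, List<String> docTokens) {
        int count = 0;
        for (String token : docTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // idf(t) = log(N / df(t)): total docs divided by docs containing t, then the log.
    static double idf(String term, List<List<String>> corpus) {
        int docFreq = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                docFreq++;
            }
        }
        if (docFreq == 0) {
            return 0.0; // term absent from the index: contributes nothing in this sketch
        }
        return Math.log((double) corpus.size() / docFreq);
    }

    // coord: fraction of the query terms that occur in the document.
    static double coord(Set<String> queryTerms, List<String> docTokens) {
        int matched = 0;
        for (String q : queryTerms) {
            if (docTokens.contains(q)) {
                matched++;
            }
        }
        return (double) matched / queryTerms.size();
    }
}
```

Take doc1 = [x, y, x] and doc2 = [x, z]. For the query 'x OR y', coord is 1.0 for doc1 and 0.5 for doc2, which matches the slide's example. idf("x") is 0 because x appears in every document.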
  • 10. Hands-On Activity 1. Objectives: 1. Walk through the Solr directories 2. Start a single-node Solr instance 3. Index some sample documents 4. Admin UI overview
  • 11. Solr Directory Structure - Base Dir
      $ tree -L 1 solr-4.8.1/
      solr-4.8.1/
      ├── CHANGES.txt
      ├── contrib
      ├── del
      ├── dist
      ├── docs
      ├── example
      ├── example-dih
      ├── licenses
      ├── LICENSE.txt
      ├── example-minimal
      ├── example-final
      ├── NOTICE.txt
      ├── README.txt
      └── SYSTEM_REQUIREMENTS.txt
  • 12. Solr Directory Structure - Example Dir
      $ tree -L 1 solr-4.8.1/example/
      solr-4.8.1/example/
      ├── contexts
      ├── etc
      ├── example-DIH
      ├── exampledocs
      ├── example-schemaless
      ├── lib
      ├── logs
      ├── multicore
      ├── README.txt
      ├── resources
      ├── scripts
      ├── solr
      ├── solr-webapp
      ├── start.jar
      └── webapps
  • 13. Solr Directory Structure - Cores Dir
      $ tree -L 2 solr-4.8.1/example/solr/
      solr-4.8.1/example/solr/
      ├── bin
      ├── collection1
      │   ├── conf
      │   ├── data
      │   ├── core.properties
      │   └── README.txt
      ├── README.txt
      ├── solr.xml
      └── zoo.cfg
  • 14. Solr Directory Structure - Conf Dir
      $ tree -L 1 solr-4.8.1/example/solr/collection1/conf/
      solr-4.8.1/example/solr/collection1/conf/
      ├── admin-extra.html
      ├── admin-extra.menu-bottom.html
      ├── admin-extra.menu-top.html
      ├── clustering
      ├── currency.xml
      ├── elevate.xml
      ├── lang
      ├── mapping-FoldToASCII.txt
      ├── mapping-ISOLatin1Accent.txt
      ├── protwords.txt
      ├── schema.xml
      ├── scripts.conf
      ├── solrconfig.xml
      ├── spellings.txt
      ├── stopwords.txt
      ├── synonyms.txt
      ├── update-script.js
      ├── velocity
      └── xslt
  • 15. Starting a Solr node
      • Go to the example-minimal directory and start a Solr instance:
          cd /home/solruser/work/solr-4.8.1/example-minimal
          java -jar start.jar
      • This launches Jetty with the Solr WAR and the example configs.
      • By default Solr listens on port 8983. To use a custom port:
          java -Djetty.port=9000 -jar start.jar
      • Point your browser to http://localhost:8983/solr (or your custom port) to see the Solr Admin UI.
      • You will see a default collection named collection1.
  • 16. Solr Schema • Before indexing documents, you need to define a schema. A schema serves multiple purposes: • Field-related information • the fields in your documents • the datatype of each field • whether a field is indexed, stored, or both • other per-field options such as termVectors, termPositions, docValues, etc. • Dynamic fields • Copy fields • Datatypes • A datatype (fieldType) defines a chain of a tokenizer and filters • It tells Solr what operations to perform on the content of a field • You can define different analyzers for indexing and querying • Solr also provides a schemaless mode in which it auto-detects the datatypes of fields.
  • 17. Analyzers • Analyzers are components that pre-process input text at index time and/or at query time. • You can define separate analyzers for indexing and querying. • Make sure your indexing and querying analyzers are compatible, so that query terms are analyzed into the same form as the indexed terms. • An analyzer consists of: • char filters • a tokenizer • token filters
  • 18. Analyzers (example chain) • Input text: <html><body><h1>This is a sample HTML document.</h1></body></html> • Char filter (solr.HTMLStripCharFilterFactory): This is a sample HTML document. • Tokenizer (solr.WhitespaceTokenizerFactory): [This] [is] [a] [sample] [HTML] [document.] • Token filters (solr.StopFilterFactory & solr.LowerCaseFilterFactory): [sample] [html] [document]
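Here is a toy re-implementation of the chain on this slide in plain Java, to show it end to end. It uses no Lucene classes; the class name `ToyAnalyzer` and the three-word stop list are assumptions for illustration. Tags are stripped with a regex as a rough stand-in for solr.HTMLStripCharFilterFactory, and the text is then split on whitespace. Tokens are lowercased before the stop-word check so that "This" is removed. In Solr, the same effect comes from filter order or from StopFilterFactory's ignoreCase option. The sample input omits the trailing period because a whitespace tokenizer would leave it attached as "document.".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer chain mirroring the slide; not Solr's actual implementation.
final class ToyAnalyzer {

    // Assumed stop list, just large enough for the slide's example.
    private static final Set<String> STOP_WORDS = Set.of("this", "is", "a");

    // Char filter: replace HTML tags with spaces (rough stand-in for HTMLStripCharFilter).
    static String stripHtml(String input) {
        return input.replaceAll("<[^>]*>", " ");
    }

    // Tokenizer: split on runs of whitespace (like WhitespaceTokenizer).
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Token filters: lowercase each token, then drop stop words.
    static List<String> analyze(String html) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(stripHtml(html))) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }
}
```

`ToyAnalyzer.analyze("<html><body><h1>This is a sample HTML document</h1></body></html>")` returns [sample, html, document].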
  • 20. Separate index and query analyzers
  • 21. Char Filters
      • A char filter is a component that pre-processes input characters (consuming and producing a character stream). It can add, change, or remove characters while preserving character position information.
      • Char filters can be chained.
      • Example:
          <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-z])" replacement="" />
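It is worth checking what the example pattern on this slide actually does. A minimal sketch (the class `PatternReplaceDemo` is hypothetical) applies the same regex with an empty replacement using String.replaceAll. Unlike the real char filter, it does not track the character offset corrections that Solr keeps for features such as highlighting.

```java
// Applies the slide's pattern "([^a-z])" with an empty replacement, approximating
// what PatternReplaceCharFilterFactory does to the raw character stream
// (offset tracking is not modeled).
final class PatternReplaceDemo {
    static String apply(String input) {
        return input.replaceAll("([^a-z])", "");
    }
}
```

Because [^a-z] matches everything except lowercase ASCII letters, "Solr 4.8 rocks!" becomes "olrrocks". Uppercase letters, digits and punctuation are all deleted, and so are the spaces a later tokenizer would need. A real configuration would usually use a narrower pattern.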
  • 22. Tokenizers • A tokenizer splits a stream of characters (from each individual field value) into a series of tokens. • There can be only one tokenizer in each analyzer. • Solr provides the following tokenizer factories, among others: solr.KeywordTokenizerFactory, solr.LetterTokenizerFactory, solr.WhitespaceTokenizerFactory, solr.LowerCaseTokenizerFactory, solr.StandardTokenizerFactory, solr.ClassicTokenizerFactory, solr.UAX29URLEmailTokenizerFactory, solr.PatternTokenizerFactory, solr.PathHierarchyTokenizerFactory, solr.ICUTokenizerFactory
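Tokenizer output is easiest to understand by example. The sketch below uses a hypothetical `PathHierarchySketch` class, not the Lucene implementation. It emits the token stream that solr.PathHierarchyTokenizerFactory produces for a simple path with its default '/' delimiter: every prefix ending just before a delimiter, plus the full path. Options such as delimiter replacement, reverse mode and skip are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of PathHierarchyTokenizer-style output: each ancestor path, then the full path.
final class PathHierarchySketch {
    static List<String> tokenize(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        // Start at index 1 so a leading delimiter does not produce an empty prefix.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                tokens.add(path.substring(0, i));
            }
        }
        if (!path.isEmpty()) {
            tokens.add(path);
        }
        return tokens;
    }
}
```

`tokenize("/usr/local/bin", '/')` returns [/usr, /usr/local, /usr/local/bin]. This is what lets a query on /usr/local match documents stored anywhere beneath it, typically with a non-hierarchical analyzer on the query side.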
  • 23. Token Filters • Tokens produced by the tokenizer are passed through a series of token filters. • Token filters can add, change, or remove tokens. • The field is then indexed with the resulting token stream. • Detailed information about analyzers is available at https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
  • 24. Dynamic Fields • Dynamic fields allow Solr to index fields that you did not explicitly define in your schema. • A dynamic field is just like a regular field, except that its name contains a wildcard, e.g. *_i. • When you index documents, a field that does not match any explicitly defined field can be matched by a dynamic field.
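The matching rule can be sketched as follows. Solr allows the wildcard only at the start or end of a dynamic field name. An explicitly defined field always wins. Among matching dynamic fields, the longer pattern is preferred, with ties going to the one declared first in the schema. The class `DynamicFieldResolver` and the patterns used in the example are hypothetical, and the sketch only models name resolution, not field types.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy resolver for field names against explicit fields and dynamic-field patterns.
final class DynamicFieldResolver {
    private final Set<String> explicitFields;
    private final List<String> dynamicPatterns; // in schema declaration order

    DynamicFieldResolver(Set<String> explicitFields, List<String> dynamicPatterns) {
        this.explicitFields = explicitFields;
        this.dynamicPatterns = dynamicPatterns;
    }

    // A pattern has a single wildcard at either the start ("*_i") or the end ("attr_*").
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    // Explicit field first; otherwise the longest matching pattern
    // (Stream.max keeps the earlier element on ties, i.e. declaration order).
    Optional<String> resolve(String fieldName) {
        if (explicitFields.contains(fieldName)) {
            return Optional.of(fieldName);
        }
        return dynamicPatterns.stream()
                .filter(p -> matches(p, fieldName))
                .max(Comparator.comparingInt(String::length));
    }
}
```

With explicit fields {id, title} and patterns [*_i, *_s, attr_*], "attr_name_s" matches both *_s and attr_*, and the longer attr_* wins.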
  • 25. Copy Field • The copyField directive copies the data of one (or more) fields into another field.
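Here is a minimal sketch of the copy semantics, assuming a document represented as a map from field name to values (the class name `CopyFieldSketch` is hypothetical). copyField copies the raw input values before analysis, so the destination field applies its own analyzer. Because several sources can feed one destination, the destination is normally declared multiValued.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy copyField: append every raw value of each source field into the destination field.
final class CopyFieldSketch {
    static Map<String, List<String>> apply(Map<String, List<String>> doc,
                                           List<String> sources, String dest) {
        // Copy the input so the caller's document is left untouched.
        Map<String, List<String>> out = new LinkedHashMap<>();
        doc.forEach((field, values) -> out.put(field, new ArrayList<>(values)));
        List<String> destValues = out.computeIfAbsent(dest, k -> new ArrayList<>());
        for (String source : sources) {
            destValues.addAll(doc.getOrDefault(source, List.of()));
        }
        return out;
    }
}
```

A common use is a catch-all text field that receives both title and body, so that a single field can be searched by default.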
  • 26. Field Parameters: 1. indexed 2. stored 3. multiValued 4. docValues 5. omitNorms 6. termVectors 7. termPositions 8. termOffsets
  • 27. Hands-On Activity 2. Objectives: 1. Create a new collection 2. Understand the contents of schema.xml 3. Create a custom datatype 4. Create a schema for the StackExchange data 5. Learn how to use the Admin UI to analyze and tune fieldTypes
  • 29. Indexing Data
      • You can modify a Solr index by POSTing commands to Solr to add (or update) documents, delete documents, and commit pending adds and deletes.
      • Add:
      • The id field is the uniqueKey (aka primary key). In some cases you don't strictly need one, but you should always define one; the ID can be autogenerated. http://wiki.apache.org/solr/UniqueKey
          curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc>
            <field name="id">id1</field>
            <field name="st_content">My First Doc</field>
          </doc></add>'
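The same add-and-commit request can be built in Java without any Solr library, using the java.net.http client available since Java 11. This is a sketch: `XmlUpdateRequest` is a hypothetical name, and the base URL matches the curl example. Field values are not XML-escaped, so real input containing <, & or quotes would need escaping first.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) the XML add-and-commit request from the curl example.
final class XmlUpdateRequest {
    static HttpRequest build(String solrBase, String id, String content) {
        // NOTE: id and content are inserted verbatim; escape them for real data.
        String xml = "<add><doc>"
                + "<field name=\"id\">" + id + "</field>"
                + "<field name=\"st_content\">" + content + "</field>"
                + "</doc></add>";
        return HttpRequest.newBuilder()
                .uri(URI.create(solrBase + "/update?commit=true"))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
    }
}
```

To actually send it, call `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, which requires a running Solr instance.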
  • 30. Indexing Data (cont...)
      • Solr natively supports indexing structured documents in XML, CSV and JSON.
      • It provides multiple request handlers, called index handlers, to add, delete and update documents in the index.
      • A unified update request handler supports XML, CSV, JSON, and javabin update requests:
          <requestHandler name="/update" class="solr.UpdateRequestHandler" />
      • You can define new requestHandlers and register them in solrconfig.xml.
      • https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers
  • 31. Atomic Updates
      • Sending an update request with an existing ID overwrites that document.
      • Solr also supports simple atomic updates, which modify only parts of a single document.
      • Solr supports several modifiers that atomically update values of a document:
          1. set: set or replace a particular value, or remove the value if null is specified as the new value
          2. add: add an additional value to a list
          3. inc: increment a numeric value by a specific amount
          curl http://localhost:8983/solr/update -H 'Content-Type: application/json' -d '[{
            "id": "message1",
            "source": {"set": "error_log"},
            "count": {"inc": 4},
            "tags": {"add": "apache"}
          }]'
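The three modifiers can be modeled on an in-memory document to make their semantics explicit. This toy, a hypothetical `AtomicUpdateSketch`, is not how Solr applies updates internally. Solr rebuilds the whole document from its stored fields, which is why atomic updates depend on fields being stored. As an assumption of the sketch, a missing field is treated as 0 for inc.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the set / add / inc modifiers applied to a copy of a document.
final class AtomicUpdateSketch {

    @SuppressWarnings("unchecked")
    static Map<String, Object> apply(Map<String, Object> doc, String field, String op, Object value) {
        Map<String, Object> out = new HashMap<>(doc);
        switch (op) {
            case "set":
                if (value == null) {
                    out.remove(field); // set with null removes the field
                } else {
                    out.put(field, value);
                }
                break;
            case "add": {
                // Append to the existing value(s), turning a single value into a list.
                List<Object> values = new ArrayList<>();
                Object existing = out.get(field);
                if (existing instanceof List) {
                    values.addAll((List<Object>) existing);
                } else if (existing != null) {
                    values.add(existing);
                }
                values.add(value);
                out.put(field, values);
                break;
            }
            case "inc": {
                Object current = out.get(field);
                long base = current == null ? 0L : ((Number) current).longValue(); // sketch assumption
                out.put(field, base + ((Number) value).longValue());
                break;
            }
            default:
                throw new IllegalArgumentException("unsupported modifier: " + op);
        }
        return out;
    }
}
```

Applying the curl example's modifiers to {source: access_log, count: 1, tags: [solr]} yields source = error_log, count = 5 and tags = [solr, apache].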
  • 32. Solr Clients • There are many client libraries for indexing and querying Solr: http://wiki.apache.org/solr/IntegratingSolr • Client languages: Ruby, PHP, Java, Scala, Python, .NET, Perl, JavaScript
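  • 33. Indexing with SolrJ
      • SolrJ is a Java client for Solr. It offers a Java interface to add, update, and query the Solr index.
          import java.util.ArrayList;
          import java.util.Collection;
          import org.apache.solr.client.solrj.SolrServer;
          import org.apache.solr.client.solrj.impl.HttpSolrServer;
          import org.apache.solr.common.SolrInputDocument;

          SolrServer server = new HttpSolrServer("http://HOST:8983/solr/");
          SolrInputDocument doc1 = new SolrInputDocument();
          doc1.addField("id", "doc1");
          doc1.addField("content", "This is first document");
          SolrInputDocument doc2 = new SolrInputDocument();
          doc2.addField("id", "doc2");  // addField returns void, so calls cannot be chained
          doc2.addField("content", "This is second document");
          Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
          docs.add(doc1);
          docs.add(doc2);
          server.add(docs);   // add() and commit() may throw SolrServerException / IOException
          server.commit();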
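For larger data sets, "batch mode" means sending documents in fixed-size chunks. This avoids both one request per document and one huge request. A generic helper for the chunking, with no SolrJ dependency, might look like this. `Batches.of` and any batch size you pick are hypothetical choices, not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split items into consecutive batches of at most batchSize.
final class Batches {
    static <T> List<List<T>> of(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to))); // copy, not a view
        }
        return batches;
    }
}
```

Given a `List<SolrInputDocument> docs`, call `server.add(batch)` for each element of `Batches.of(docs, 1000)` and commit once at the end. The right batch size depends on document size and should be measured.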
  • 34. Indexing with SolrJ (cont...)
      • SolrJ includes a ZooKeeper-aware client for SolrCloud.
      • To interact with SolrCloud, use an instance of CloudSolrServer and pass it your ZooKeeper host(s).
      • More on SolrCloud later.
          CloudSolrServer server = new CloudSolrServer("localhost:2181");
          server.setDefaultCollection("mycollection");
          SolrInputDocument doc = new SolrInputDocument();
          // ....
          // ....
          server.commit();
  • 35. Transaction Log and Commit • Transaction log (tlog): • a file where the raw documents are written for recovery purposes • on update, the entire document is written to the tlog • Commits: • hard commit • soft commit • Soft commits are about visibility; hard commits are about durability. • More on this when we discuss SolrCloud.
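The visibility/durability distinction can be illustrated with a deliberately simplified model. `CommitModel` is hypothetical and ignores segments, searchers and autocommit settings. In this model:
- every update goes to the tlog;
- a soft commit only changes what searches can see;
- a hard commit makes the indexed data durable, and also visible when openSearcher is true;
- after a crash, anything not yet hard-committed is replayed from the tlog.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual toy only: models "soft commit = visibility, hard commit = durability".
final class CommitModel {
    private final List<String> tlog = new ArrayList<>();     // every update is logged here first
    private final List<String> indexed = new ArrayList<>();  // added to the index, not yet committed
    private List<String> visible = new ArrayList<>();        // what searches currently see
    private List<String> durable = new ArrayList<>();        // what survives a crash without replay

    void add(String doc) {
        tlog.add(doc);
        indexed.add(doc);
    }

    // Soft commit: refresh the searchable view; nothing new becomes durable.
    void softCommit() {
        visible = new ArrayList<>(indexed);
    }

    // Hard commit: persist everything indexed so far; optionally refresh visibility too.
    void hardCommit(boolean openSearcher) {
        durable = new ArrayList<>(indexed);
        if (openSearcher) {
            visible = new ArrayList<>(indexed);
        }
    }

    List<String> visible() { return visible; }

    List<String> durable() { return durable; }

    // After a crash: durable docs survive and the rest are replayed from the tlog.
    // Assumes each doc string is unique.
    List<String> recoverAfterCrash() {
        List<String> recovered = new ArrayList<>(durable);
        for (String doc : tlog) {
            if (!recovered.contains(doc)) {
                recovered.add(doc);
            }
        }
        return recovered;
    }
}
```

In this model, a document that was only soft-committed is searchable but not durable. The document is not lost in a crash, though, because the tlog still holds it.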
  • 36. Hands-On Activity 3. Objectives: 1. Create a Java project and add the SolrJ dependency 2. Index a single document using SolrJ 3. Index in batch mode 4. Understand commits
  • 37. Data Import Handler • The DataImportHandler provides a configuration-driven way to import data from external sources into Solr. • External sources can be: • databases • FTP, SCP, etc. • XML, JSON, etc. • It provides options for full or delta imports.
  • 38. Data Import Handler (cont...)
      • A SolrRequestHandler must be defined in solrconfig.xml:
          <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
            <lst name="defaults">
              <str name="config">data-config.xml</str>
            </lst>
          </requestHandler>
      • The data source can be added inline, or it can be put directly into data-config.xml.
      • data-config.xml tells Solr: 1. how to fetch data (queries, URLs, etc.) 2. what to read (result set columns, XML fields, etc.) 3. how to process it (modify/add/remove fields)
      • https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
  • 39. Data Import Handler - Script Transformers • You can apply different types of transformations to data read from an external source before it is indexed in Solr • Script transformers can be used to index dynamic fields with the DataImportHandler
      <dataConfig>
        <script><![CDATA[
          function WareAttributes(row) {
            row.put('attr_' + row.get('id'), row.get('raw_value'));
            row.remove('id');
            row.remove('raw_value');
            return row;
          }
        ]]></script>
        ...
            <entity name="attrs"
                    query="SELECT attribute_id AS id, raw_value FROM ware_wareattribute WHERE ware_id = ${ware.id}"
                    transformer="script:WareAttributes"/>
          </entity>
        </document>
      </dataConfig>
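To see what the WareAttributes script does to a single row, here is the same logic in plain Java, with a row modeled as a Map. It is only an illustration: it is not a DIH transformer class, and it assumes the schema declares a matching dynamic field such as attr_* so the generated field names are accepted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java rendering of the WareAttributes script transformer, showing its
// effect on one row. Logic only; not a DIH Transformer implementation.
class WareAttributesTransform {
    static Map<String, Object> transform(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        Object id = out.remove("id");           // e.g. "color"
        Object value = out.remove("raw_value"); // e.g. "red"
        out.put("attr_" + id, value);           // becomes dynamic field attr_color
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", "color");
        row.put("raw_value", "red");
        System.out.println(transform(row)); // {attr_color=red}
    }
}
```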
  • 40. Hands-On Activity 4 Objective: 1. Create MySQL tables for storing StackExchange data • Posts • Users • Comments 2. Load the StackExchange dumps into MySQL 3. Define a data import handler • Add the dependency and request handler in solrconfig.xml • Define a data-config.xml file that maps MySQL columns to Solr fields 4. Index documents in Solr using the data import handler
  • 41. Querying • Solr supports multiple query syntaxes through query parser plugins • A query parser is a component responsible for parsing the textual query and converting it into the corresponding Lucene Query objects. • Solr provides many built-in parsers • lucene - the default "lucene" parser • dismax - allows querying across multiple fields with different weights • edismax - builds on dismax but with more features • func • boost, and many more (https://wiki.apache.org/solr/QueryParser) • There are multiple ways to select which query parser to use for a certain request: 1. defType - the default type parameter selects which query parser to use by default for the main query. Example: &q=foo bar&defType=lucene 2. LocalParams - inside the main q or fq parameter you can select the query parser using the LocalParams syntax. Example: &q={!dismax}foo bar
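In a raw HTTP request the example values above must be URL-encoded, since they contain spaces, braces and '!'. A small sketch (Java 10+ for the Charset overload of URLEncoder.encode) that builds both styles; the class and method names are illustrative, and a client library such as SolrJ normally does this encoding for you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds query strings for the two parser-selection styles: a defType
// parameter, or LocalParams ({!parser}) at the start of q.
class QueryParserParams {
    static String localParams(String parser, String query) {
        return "{!" + parser + "}" + query;
    }

    // URL-encodes each key and value; spaces become '+', braces and '!' are %-escaped.
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> viaDefType = new LinkedHashMap<>();
        viaDefType.put("q", "foo bar");
        viaDefType.put("defType", "lucene");
        System.out.println(encode(viaDefType)); // q=foo+bar&defType=lucene

        Map<String, String> viaLocalParams = new LinkedHashMap<>();
        viaLocalParams.put("q", localParams("dismax", "foo bar"));
        System.out.println(encode(viaLocalParams)); // q=%7B%21dismax%7Dfoo+bar
    }
}
```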
  • 43. Querying (cont...) • Simple text search • http://localhost:8983/solr/collection1/stacksearch?q=data • Change number of rows retrieved • http://localhost:8983/solr/collection1/stacksearch?q=data&rows=20 • Pagination • http://localhost:8983/solr/collection1/stacksearch?q=data&rows=20&start=50
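start and rows are an offset and a page size, so mapping a 1-based page number onto them is simple arithmetic. A minimal sketch with hypothetical helper names; numFound is the total hit count Solr reports in the response. Solr has to collect start + rows top documents for each request, so very deep pages get progressively more expensive.

```java
// Converts a 1-based page number into Solr's start/rows parameters.
// start is a 0-based offset into the full result list; rows is the page size.
class Paging {
    static int startFor(int page, int rows) {
        if (page < 1 || rows < 1) throw new IllegalArgumentException("page and rows must be >= 1");
        return (page - 1) * rows;
    }

    // numFound is reported in the query response; round up to whole pages.
    static long pageCount(long numFound, int rows) {
        return (numFound + rows - 1) / rows;
    }

    public static void main(String[] args) {
        System.out.println("page 3 of 20 rows -> start=" + startFor(3, 20)); // start=40
        System.out.println("pages for 41 hits: " + pageCount(41, 20));     // 3
    }
}
```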
  • 44. Querying (cont...) • Searching on a field • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data • http://localhost:8983/solr/collection1/stacksearch?q=st_posttype:data • Specifying the list of fields to be retrieved • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&fl=id,st_post,st_tags • Delete all documents • http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true
  • 45. Querying (cont...) • Searching multiple fields • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data AND st_posttype:QUESTION • NOT query • http://localhost:8983/solr/collection1/stacksearch?q=NOT st_post:data • Boolean query • http://localhost:8983/solr/collection1/stacksearch?q=st_post:(data+sensor) • http://localhost:8983/solr/collection1/stacksearch?q=st_post:(data OR sensor) • Sort query • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&fl=id,st_post,st_score&sort=st_score desc
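When a field query such as st_post:data is built from user input, characters like +, -, :, parentheses and whitespace carry query-syntax meaning and should be escaped. The sketch is modeled on SolrJ's ClientUtils.escapeQueryChars; in a project that already depends on SolrJ, calling that method is the safer choice. The class name is hypothetical.

```java
// Escapes characters that have a meaning in the standard (lucene) query syntax,
// so raw user input can be used as a term value. Modeled on SolrJ's
// ClientUtils.escapeQueryChars; prefer that method when SolrJ is available.
class QueryEscaper {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "C++ (beginner)";
        System.out.println("st_post:" + escape(userInput)); // st_post:C\+\+\ \(beginner\)
    }
}
```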
  • 47. Querying - Faceting • Enable faceting on 2 fields • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&facet=true&facet.field=st_posttype&facet.field=st_tags • Changing limit and mincount • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&facet=true&facet.field=st_posttype&facet.field=st_tags&facet.limit=1000&facet.mincount=1 • Changing facet method • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&facet=true&facet.field=st_posttype&facet.field=st_tags&facet.limit=1000&facet.mincount=1&facet.method=enum
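With wt=json, facet counts come back by default as a flat array alternating term and count (the json.nl parameter controls this rendering). A sketch that turns such an array into an ordered map; it assumes the JSON has already been parsed into a java.util.List by whatever JSON library you use, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// With wt=json, facet_fields is rendered by default as a flat array that
// alternates term and count, e.g. "st_tags":["java",10,"solr",5].
// This pairs them up, keeping Solr's order (by count when facet.limit > 0).
class FacetCounts {
    static Map<String, Long> fromFlatList(List<?> flat) {
        if (flat.size() % 2 != 0) throw new IllegalArgumentException("expected term/count pairs");
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            counts.put(String.valueOf(flat.get(i)), ((Number) flat.get(i + 1)).longValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(fromFlatList(List.of("java", 10, "solr", 5))); // {java=10, solr=5}
    }
}
```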
  • 50. Range, Boosting, Fuzzy, Proximity Query • Range • http://localhost:8983/solr/collection1/select?q=st_score:[1 TO 3]&fl=id,st_score • Boosting on a field • http://localhost:8983/solr/select/?defType=dismax&q=data&bq=st_posttype:QUESTION^5.0&qf=st_post • Fuzzy search (use edismax: dismax does not support fuzzy syntax) • http://localhost:8983/solr/collection1/select?defType=edismax&q=electromagnet~0.9&qf=st_post • Proximity search • http://localhost:8983/solr/collection1/stacksearch?q="calculating coordinates"~2
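Fuzzy search matches terms within a small edit distance of the query term. A plain Levenshtein sketch to show what "one edit away" means; Lucene's fuzzy query can also count a swap of two adjacent characters as a single edit, which this version does not.

```java
// Classic Levenshtein distance: insert, delete and substitute each cost 1.
// Uses two rolling rows instead of the full matrix.
class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from ""
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // current row becomes previous
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("electromagnet", "electromagnit")); // 1
    }
}
```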
  • 51. Function Queries • Function queries enable you to generate a relevancy score using the actual value of one or more numeric fields. • Functions can also be returned as computed pseudo-fields in fl, as in these examples: 1. http://localhost:8983/solr/collection1/select?q=*:*&fl=sum(st_score,st_favoritecount),st_score,st_favoritecount 2. http://localhost:8983/solr/collection1/select?q=*:*&fl=max(st_score,st_favoritecount),st_score,st_favoritecount 3. http://localhost:8983/solr/collection1/select?q=*:*&fl=ms(NOW,st_creationdate),st_creationdate 4. http://localhost:8983/solr/collection1/select?q=st_title:*&fl=norm(st_title),st_title • https://cwiki.apache.org/confluence/display/solr/Function+Queries
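A common use of function queries is a recency boost, combining Solr's recip function with ms(), for example bf=recip(ms(NOW,st_creationdate),3.16e-11,1,1) with the dismax or edismax parser. That bf value is an illustration reusing the st_creationdate field from the examples above, not a tuned setting. The Java below only evaluates the same arithmetic to show how the boost decays with age.

```java
// recip(x,m,a,b) computes a / (m*x + b). With x = ms(NOW,date), m = 3.16e-11
// (roughly 1 / milliseconds per year) and a = b = 1, the boost is 1.0 for a
// brand-new post and about 0.5 for a one-year-old post.
class RecencyBoost {
    static final double MS_PER_YEAR = 3.16e10; // approximate

    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        for (int years = 0; years <= 3; years++) {
            double ageMs = years * MS_PER_YEAR;
            System.out.printf("age %d years -> boost %.3f%n", years, recip(ageMs, 3.16e-11, 1, 1));
        }
    }
}
```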
  • 52. Group and Term Query • Terms (list the indexed terms of a field that start with a prefix) • http://localhost:8983/solr/collection1/terms?terms.fl=st_post&terms.prefix=data • Group (group results by the value of a field) • http://localhost:8983/solr/collection1/select?q=st_post:*&group=true&group.field=st_site
  • 53. More Like This • The MoreLikeThis search component enables users to query for documents similar to a document in their result list. • It uses terms from the original document to find similar documents in the index. • Ways to use MLT: 1. As a request handler 2. As a search component 3. As a MoreLikeThisHandler request handler with externally supplied text • http://localhost:8983/solr/collection1/select?q=id:robotics_1&mlt.count=5&mlt=true&mlt.fl=st_post
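A simplified sketch of how MLT can pick the "interesting" terms it turns into a query: count how often each term occurs in the source document, drop terms below a minimum term frequency or document frequency (compare the mlt.mintf and mlt.mindf parameters), score the rest by tf * idf and keep the top k. The idf formula, the class name and the toy corpus statistics are assumptions for illustration; Solr's actual implementation has more options.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified take on MoreLikeThis term selection: filter by minTf / minDf,
// score by tf * idf, return the k highest-scoring terms.
class InterestingTerms {
    static List<String> topTerms(List<String> docTokens, Map<String, Integer> docFreq,
                                 int numDocs, int minTf, int minDf, int k) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : docTokens) tf.merge(t, 1, Integer::sum); // term frequency in the source doc

        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTf || df < minDf) continue;
            double idf = 1 + Math.log((double) numDocs / (df + 1)); // classic Lucene-style idf
            scored.add(Map.entry(e.getKey(), e.getValue() * idf));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return scored.stream().limit(k).map(Map.Entry::getKey).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> doc = List.of("robot", "robot", "arm", "the", "the", "the");
        Map<String, Integer> df = Map.of("robot", 10, "arm", 50, "the", 1000);
        System.out.println(topTerms(doc, df, 1000, 2, 5, 2)); // [robot, the]
    }
}
```

The common word "the" survives here only because the toy input skips stopword removal, which is one reason MLT is usually pointed at analyzed fields such as st_post.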
  • 54. Clustering • Solr uses the Carrot2 library to cluster search results and documents • Clustering can be used to: • summarize a whole bunch of results/documents • group together semantically related results/documents • To use clustering: • Add the ClusteringComponent in solrconfig.xml • Reference the clustering component in a request handler • Supports 3 algorithms: • Lingo • STC • BisectingKMeans • http://localhost:8983/solr/collection1/stacksearch?q=st_post:data&clustering=true&clustering.results=true&carrot.title=st_post&rows=20
  • 55. AutoComplete / Suggester • Autocomplete can be achieved in multiple ways in Solr: 1. Faceting with the facet.prefix parameter 2. TermsComponent 3. Suggester • Based on SpellCheckComponent • N-gram based
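Options 1 and 2 both rely on the term dictionary being sorted: every term that starts with a given prefix sits in one contiguous run. A sketch of that idea over an in-memory TreeMap of term frequencies (hypothetical data and class name), returning the most frequent completions first. It illustrates the principle only, not Solr's Suggester internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Prefix autocomplete over a sorted term dictionary: seek to the prefix and
// scan forward until terms no longer start with it.
class PrefixSuggest {
    static List<String> suggest(NavigableMap<String, Integer> termFreq, String prefix, int limit) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreq.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the prefix range
            matches.add(e);
        }
        // most frequent first; the stable sort keeps alphabetical order on ties
        matches.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) out.add(matches.get(i).getKey());
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> terms = new TreeMap<>();
        terms.put("data", 50);
        terms.put("database", 30);
        terms.put("datum", 5);
        terms.put("dog", 7);
        System.out.println(suggest(terms, "dat", 2)); // [data, database]
    }
}
```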
  • 56. Hands-On Activity 5 Objective: 1. Define a search handler named stacksearch and declare: a. defaults b. appends c. last-components 2. Try out different queries from the queries note and understand the response format and results 3. Define a suggester component for ‘autocomplete’ using the ‘post’ field as its source
  • 57. SolrCloud • SolrCloud is NOT Solr deployed on the cloud • SolrCloud provides the ability to set up a cluster of Solr servers that combines fault tolerance and high availability and provides distributed indexing and search capabilities. • It is a subset of optional features in Solr that enables and simplifies horizontal scaling of a search index using sharding and replication. • SolrCloud provides 1. performance 2. scalability 3. high availability 4. simplicity 5. elasticity
  • 58. SolrCloud - High Level Setup
  • 59. SolrCloud - High Level Architecture
  • 60. SolrCloud - Terminology • ZooKeeper: Distributed coordination service that provides centralized configuration, cluster state management, and leader election • Node: JVM process bound to a specific port on a machine; hosts the Solr web application • Collection: Search index distributed across multiple nodes; each collection has a name, shard count, and replication factor • Replication Factor: Number of copies of a document in a collection • Shard: Logical slice of a collection; each shard has a name, hash range, leader, and replication factor. Documents are assigned to one and only one shard per collection using a hash-based document routing strategy • Replica: Solr index that hosts a copy of a shard in a collection; behind the scenes, each replica is implemented as a Solr core • Leader: Replica in a shard that assumes special duties needed to support distributed indexing in Solr; each shard has one and only one leader at any time and leaders are elected using ZooKeeper
  • 61. SolrCloud - Collections • A collection is a distributed index defined by: 1. a named configuration stored in ZooKeeper 2. the number of shards: documents are distributed across N partitions of the index 3. a document routing strategy: how documents get assigned to shards 4. a replication factor: how many copies of each document exist in the collection
  • 62. SolrCloud - Sharding • A collection has a fixed number of shards • existing shards can be split • When to shard? • Large number of docs • Large document sizes • Parallelization during indexing and queries • Data partitioning (custom hashing)
  • 63. SolrCloud - Replication • Why replicate? • High availability • Load balancing • How does it work in SolrCloud? • Near-real-time, NOT master-slave • The leader forwards updates to replicas in parallel and waits for their responses • Error handling during indexing is tricky
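  • 64. SolrCloud - Document Routing • Each shard covers a hash range • Default (compositeId router): hash the document ID into a 32-bit integer and map it to a shard's range • leads to (roughly) balanced shards • Custom hashing: IDs of the form user!doc send all documents with the same prefix to the same shard • Tri-level: app!user!doc • Implicit router: no hash ranges are set for shards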
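A sketch of hash-based routing, following the description of Solr's default compositeId router: IDs are hashed with 32-bit MurmurHash3, and for a user!doc ID the upper 16 bits come from the prefix hash and the lower 16 bits from the document hash, so documents sharing a prefix land in the same slice of the hash space. Two assumptions to flag: the shard ranges here are equal slices of the signed 32-bit space (a simplification made for this example), and the output has not been checked against Solr's own router. Tri-level IDs split the bits differently and are not modeled.

```java
import java.nio.charset.StandardCharsets;

// Sketch of hash-based document routing modeled on the compositeId description.
// Not verified against Solr's own output.
class RoutingSketch {
    // MurmurHash3, x86 32-bit variant, over the UTF-8 bytes of s.
    static int murmur3(String s, int seed) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h1 = seed;
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) {
            int off = i * 4; // little-endian 4-byte block
            int k1 = (data[off] & 0xff) | ((data[off + 1] & 0xff) << 8)
                    | ((data[off + 2] & 0xff) << 16) | (data[off + 3] << 24);
            k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2;
            h1 ^= k1; h1 = Integer.rotateLeft(h1, 13); h1 = h1 * 5 + 0xe6546b64;
        }
        int tail = nblocks * 4, k1 = 0;
        switch (data.length & 3) { // remaining 1-3 bytes; cases fall through
            case 3: k1 ^= (data[tail + 2] & 0xff) << 16;
            case 2: k1 ^= (data[tail + 1] & 0xff) << 8;
            case 1: k1 ^= data[tail] & 0xff;
                k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2; h1 ^= k1;
        }
        h1 ^= data.length; // final avalanche (fmix32)
        h1 ^= h1 >>> 16; h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13; h1 *= 0xc2b2ae35;
        return h1 ^ (h1 >>> 16);
    }

    // "prefix!docId": top 16 bits from the prefix hash, low 16 bits from the
    // doc id hash. Only two-part IDs are modeled.
    static int routeHash(String id) {
        int bang = id.indexOf('!');
        if (bang < 0) return murmur3(id, 0);
        int prefixHash = murmur3(id.substring(0, bang), 0);
        int docHash = murmur3(id.substring(bang + 1), 0);
        return (prefixHash & 0xffff0000) | (docHash & 0x0000ffff);
    }

    // Simplification: split the signed 32-bit hash space into numShards equal,
    // contiguous ranges and return the index of the range holding hash.
    static int shardFor(int hash, int numShards) {
        if (numShards < 1) throw new IllegalArgumentException("numShards must be >= 1");
        long width = (1L << 32) / numShards;
        long offset = (long) hash - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        return (int) Math.min(offset / width, numShards - 1);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"user1!doc1", "user1!doc2", "doc3"}) {
            System.out.printf("%-12s hash=%08x shard=%d%n", id, routeHash(id), shardFor(routeHash(id), 4));
        }
    }
}
```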
  • 67. SolrCloud - Shard Splitting • A shard can be split into two sub-shards • Live splitting: no downtime needed • Requests start being forwarded to the sub-shards automatically • Expensive operation: use it only when needed, during low traffic
  • 68. Collections API • https://cwiki.apache.org/confluence/display/solr/Collections+API • APIs to create and perform operations on collections: 1. CREATE: create a collection 2. RELOAD: reload a collection 3. SPLITSHARD: split a shard into two new shards 4. CREATESHARD: create a new shard 5. DELETESHARD: delete an inactive shard 6. CREATEALIAS: create or modify an alias for a collection 7. DELETEALIAS: delete an alias for a collection 8. DELETE: delete a collection 9. DELETEREPLICA: delete a replica of a shard 10. ADDREPLICA: add a replica of a shard 11. CLUSTERPROP: add/edit/delete a cluster-wide property 12. MIGRATE: migrate documents to another collection 13. ADDROLE: add a specific role to a node in the cluster 14. REMOVEROLE: remove an assigned role 15. OVERSEERSTATUS: get status and statistics of the overseer 16. CLUSTERSTATUS: get cluster status 17. REQUESTSTATUS: get the status of a previous asynchronous request
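CREATE is the call used in the hands-on activity that follows. A sketch that builds the request URL from the name, numShards, replicationFactor and collection.configName parameters; the collection and config names are placeholders, and the config set is assumed to be uploaded to ZooKeeper already. In Solr 4.x, if numShards times replicationFactor exceeds the number of live nodes, maxShardsPerNode has to be raised as well.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a Collections API CREATE request URL. Names are placeholders.
class CollectionsApi {
    static String createUrl(String solrBase, String name, int numShards,
                            int replicationFactor, String configName) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + enc(name)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&collection.configName=" + enc(configName);
    }

    // Each shard gets replicationFactor replicas (cores), leader included.
    static int totalCores(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8983/solr", "mycollection", 2, 2, "myconf"));
        System.out.println("cores created: " + totalCores(2, 2));
    }
}
```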
  • 69. Hands-On Activity 6 Objective: 1. Set up a 2-instance ZooKeeper quorum 2. Launch a 4-node Solr cluster 3. Upload a configSet to ZooKeeper 4. Create a 2-shard, 2-replica collection using the Collections API 5. Index documents with SolrJ using CloudSolrServer
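ZooKeeper stays available only while a strict majority of the ensemble is up, which is why production ensembles usually have an odd size. A short sketch of that arithmetic (hypothetical class name): a 2-instance quorum works for this exercise but cannot survive the loss of either node, while 3 instances tolerate one failure.

```java
// ZooKeeper needs a strict majority of the ensemble to stay available.
class ZkQuorum {
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: majority " + majority(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```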
  • 70. Solr Performance Factors • Schema Design • # of Indexed Fields • omitNorms • Term-vectors • DocValues • Configuration • mergeFactor • caches • Indexing • Bulk updates • Commit strategy • Optimize • Querying
  • 71. Thanks! • Contact • saumitra.srivastav7@gmail.com • @_saumitra_ • Solr references • https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide • https://www.youtube.com/user/LuceneSolrRevolution/videos • Mailing List • User - solr-user-subscribe@lucene.apache.org • Dev - dev-subscribe@lucene.apache.org • Attributions • Shalin Mangar - @shalinmangar • Erik Hatcher - @ErikHatcher • Timothy Potter - @thelabdude • Yonik Seeley - @lucene_solr