Benchmarking Solr Performance at Scale
About Me 
• Lucene/Solr committer. Work for Lucidworks; focus on hardening 
SolrCloud, devops, big data architecture / deployments 
• Operated smallish cluster in AWS for Dachis Group (1.5 years ago, 18 
shards ~900M docs) 
• Solr Scale Toolkit: Fabric/boto framework for deploying and managing 
clusters in EC2 
• Co-author of Solr In Action with Trey Grainger
Agenda 
1. Quick review of the SolrCloud architecture 
2. Indexing & Query performance tests 
3. Solr Scale Toolkit (quick overview) 
4. Q & A
Solr in the wild … 
https://twitter.com/bretthoerner/status/476830302430437376
SolrCloud distilled 
A subset of optional features in Solr that enable and 
simplify horizontally scaling a search index using 
sharding and replication. 
Goals 
performance, scalability, high-availability, 
simplicity, elasticity, and 
community-driven!
Collection == distributed index 
A collection is a distributed index defined by: 
• named configuration stored in ZooKeeper 
• number of shards: documents are distributed across N partitions of the index 
• document routing strategy: how documents get assigned to shards 
• replication factor: how many copies of each document are kept in the collection 
Collections API: 
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logstash4solr&replicationFactor=2&numShards=2&collection.configName=logs"
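A minimal sketch of building the same CREATE request programmatically, using only the parameters shown on the slide. The helper name create_collection_url is hypothetical, and the sketch only constructs the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE URL from the parameters on the slide."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("replicationFactor", replication_factor),
        ("numShards", num_shards),
        ("collection.configName", config_name),
    ]
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


url = create_collection_url("http://localhost:8983/solr", "logstash4solr", 2, 2, "logs")
# numShards x replicationFactor gives the number of cores the cluster must host.
total_cores = 2 * 2
print(url)
print("cores to place:", total_cores)
```

With numShards=2 and replicationFactor=2 the cluster has to host four cores for this collection.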
SolrCloud High-level Architecture
ZooKeeper 
• Is a very good thing ... clusters are a zoo! 
• Centralized configuration management 
• Cluster state management 
• Leader election (shard leader and overseer) 
• Overseer distributed work queue 
• Live Nodes 
• Ephemeral znodes used to signal a server is gone 
• Needs at least 3 nodes for quorum in production
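The quorum requirement is simple majority arithmetic. A small sketch (the function names are illustrative, not part of ZooKeeper) showing why 3 is the smallest ensemble that survives a node failure, and why an even-sized ensemble buys nothing extra:

```python
def quorum_size(ensemble_size):
    """A ZooKeeper quorum is a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(f"ensemble={n} quorum={quorum_size(n)} tolerates={tolerated_failures(n)}")
```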
ZooKeeper: State Management 
• Keep track of live nodes in the /live_nodes znode 
• ephemeral nodes 
• ZooKeeper client timeout 
• Collection metadata and replica state in /clusterstate.json 
• Every Solr node has watchers for /live_nodes and /clusterstate.json 
• Leader election 
• ZooKeeper sequence number on ephemeral znodes
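The election idea above (sequence numbers on ephemeral znodes) can be illustrated with an in-memory simulation. This is a sketch of the general ZooKeeper election recipe, not Solr's actual election code and not a ZooKeeper client; the class and node names are made up for illustration.

```python
import itertools


class ElectionSim:
    """In-memory stand-in for the children of an election znode."""

    def __init__(self):
        self._seq = itertools.count()
        self.candidates = {}  # sequence number -> node name

    def join(self, node_name):
        """Mimic creating an ephemeral sequential znode; returns its sequence number."""
        seq = next(self._seq)
        self.candidates[seq] = node_name
        return seq

    def session_expired(self, seq):
        """An ephemeral znode disappears when its owner's session ends."""
        self.candidates.pop(seq, None)

    def leader(self):
        """The candidate holding the lowest sequence number is the leader."""
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]


sim = ElectionSim()
a = sim.join("solr-node-a")
b = sim.join("solr-node-b")
c = sim.join("solr-node-c")
first_leader = sim.leader()
sim.session_expired(a)  # e.g. a long GC pause outlives the client timeout
second_leader = sim.leader()
print(first_leader, "->", second_leader)
```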
Scalability Highlights 
• No split-brain problems (b/c of ZooKeeper) 
• All nodes in cluster perform indexing and execute queries; no master node 
• Distributed indexing: No SPoF, high throughput via direct updates to 
leaders, automated failover to new leader 
• Distributed queries: Add replicas to scale-out qps; parallelize complex query 
computations; fault-tolerance 
• Indexing / queries continue so long as there is 1 healthy replica per shard
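To make "direct updates to leaders" concrete, here is a rough sketch of hash-range document routing: the hash space is split into one contiguous range per shard, and a client that knows the ranges can send each document straight to the owning shard's leader. Note the assumption: zlib.crc32 is used here only as a stand-in hash so the example runs with the standard library; Solr's default router uses a different hash function, so the shard numbers below will not match a real cluster.

```python
import zlib


def shard_ranges(num_shards):
    """Split the unsigned 32-bit hash space into num_shards contiguous ranges."""
    space = 2 ** 32
    bounds = [i * space // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_shards)]


def route(doc_id, ranges):
    """Return the index of the shard whose hash range contains the doc id's hash."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF  # stand-in hash, see lead-in
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return shard
    raise AssertionError("hash ranges must cover the whole space")


ranges = shard_ranges(4)
assignments = {doc_id: route(doc_id, ranges) for doc_id in ("doc-1", "doc-2", "doc-3")}
print(assignments)
```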
Cluster sizing 
How many servers do I need to index X docs? 
... shards ... ? 
... replicas ... ? 
I need N queries per second over M docs, how many 
servers do I need? 
It depends!
Testing Methodology 
• Transparent repeatable results 
• Ideally hoping for something owned by the community 
• Synthetic docs ~ 1K each on disk, mix of field types 
• Data set created using code borrowed from PigMix 
• English text fields generated using a Zipfian distribution 
• Java 1.7u67, Amazon Linux, r3.2xlarge nodes 
• enhanced networking enabled, placement group, same AZ 
• Stock Solr (cloud) 4.10 
• Using custom GC tuning parameters and auto-commit settings 
• Use Elastic MapReduce to generate indexing load 
• As many nodes as I need to drive Solr!
Indexing Performance 
Cluster Size   # of Shards   # of Replicas   Reducers   Time (secs)   Docs / sec 
10             10            1               48         1762          73,780 
10             10            2               34         3727          34,881 
10             20            1               48         1282          101,404 
10             20            2               34         3207          40,536 
10             30            1               72         1070          121,495 
10             30            2               60         3159          41,152 
15             15            1               60         1106          117,541 
15             15            2               42         2465          52,738 
15             30            1               60         827           157,195 
15             30            2               42         2129          61,062
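A quick arithmetic check on the table, as a sketch. Multiplying time by docs/sec suggests every run indexed roughly 130M documents (the deck does not state the corpus size, so that figure is inferred), and dividing the 1-replica rate by the 2-replica rate gives a rough cost of replication. The reducer counts differ between paired runs, so the ratios are indicative rather than a controlled comparison.

```python
# Columns: cluster size, shards, replicas, reducers, time (secs), docs/sec.
ROWS = [
    (10, 10, 1, 48, 1762, 73780),
    (10, 10, 2, 34, 3727, 34881),
    (10, 20, 1, 48, 1282, 101404),
    (10, 20, 2, 34, 3207, 40536),
    (10, 30, 1, 72, 1070, 121495),
    (10, 30, 2, 60, 3159, 41152),
    (15, 15, 1, 60, 1106, 117541),
    (15, 15, 2, 42, 2465, 52738),
    (15, 30, 1, 60, 827, 157195),
    (15, 30, 2, 42, 2129, 61062),
]

# Documents indexed per run, implied by time x throughput.
implied_docs = [secs * rate for (_, _, _, _, secs, rate) in ROWS]

# Throughput ratio between the 1-replica and 2-replica run of each configuration.
rate_by_config = {(c, s, r): rate for (c, s, r, _, _, rate) in ROWS}
slowdown = {
    (c, s): rate_by_config[(c, s, 1)] / rate_by_config[(c, s, 2)]
    for (c, s, r) in rate_by_config
    if r == 1
}

for (c, s), factor in sorted(slowdown.items()):
    print(f"{c} nodes, {s} shards: {factor:.2f}x slower with 2 replicas")
```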
Visualize Server Performance
Direct Updates to Leaders
Replication
Indexing Performance Lessons 
• Solr has no built-in throttling support – will accept work until it falls over; need to build this into 
your indexing application logic 
• Oversharding helps parallelize indexing work and gives you an easy way to add more 
hardware to your cluster 
• GC tuning is critical (more below) 
• Auto-hard commit to keep transaction logs manageable 
• Auto soft-commit to see docs as they are indexed 
• Replication is expensive! (more work needed here)
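Since throttling has to live in the indexing application, one simple approach is to cap the number of batches in flight so the producer blocks when Solr falls behind. The sketch below is one possible shape, not the approach used in the benchmark: BoundedIndexer and fake_send are hypothetical names, and fake_send stands in for whatever call actually posts a batch to Solr.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedIndexer:
    """Caps the number of update batches in flight at any moment."""

    def __init__(self, send_batch, max_in_flight=4):
        self._send_batch = send_batch
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._pool = ThreadPoolExecutor(max_workers=max_in_flight)

    def submit(self, batch):
        self._slots.acquire()  # the producer blocks here when all slots are busy
        future = self._pool.submit(self._send_batch, batch)
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def close(self):
        self._pool.shutdown(wait=True)


# Stand-in sender that records how many batches ran concurrently.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_send(batch):
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend to wait for the server's response
    with stats_lock:
        stats["active"] -= 1
    return len(batch)


indexer = BoundedIndexer(fake_send, max_in_flight=2)
futures = [indexer.submit([{"id": str(i)}]) for i in range(10)]
indexer.close()
sent = sum(f.result() for f in futures)
print("docs sent:", sent, "peak concurrency:", stats["peak"])
```

A real sender would also want retries with backoff on errors; that is left out to keep the sketch short.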
GC Tuning 
• Stop-the-world GC pauses can lead to ZooKeeper session expiration (which is bad) 
• More JVMs with smaller heap sizes are better! (12-16GB max per JVM ~ less if you can) 
• MMapDirectory relies on sufficient memory available to the OS cache (off-heap) 
• GC activity during Solr indexing is stable and generally doesn’t cause any stop-the-world 
collections … queries are a different story 
• Enable verbose GC logging (even in prod) so you can troubleshoot issues: 
-verbose:gc -Xloggc:gc.log -XX:+PrintHeapAtGC -XX:+PrintGCDetails  
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps  
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime  
-XX:+PrintGCApplicationConcurrentTime
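With -XX:+PrintGCApplicationStoppedTime enabled, the log contains "Total time for which application threads were stopped" lines, which makes it easy to look for pauses long enough to threaten the ZooKeeper session. A small sketch follows; the sample lines are illustrative and not taken from the benchmark, and the exact line prefix varies with JVM version and flags, so the pattern only matches the message itself.

```python
import re

STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_secs):
    """Return (line, pause_secs) pairs for stop-the-world pauses above the threshold."""
    hits = []
    for line in lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) > threshold_secs:
            hits.append((line.strip(), float(match.group(1))))
    return hits


# Illustrative lines only.
sample = [
    "2014-09-01T12:00:00.100+0000: 10.100: Total time for which application "
    "threads were stopped: 0.0421000 seconds",
    "2014-09-01T12:00:05.300+0000: 15.300: Application time: 5.1000000 seconds",
    "2014-09-01T12:00:21.900+0000: 31.900: Total time for which application "
    "threads were stopped: 16.5000000 seconds",
]

ZK_CLIENT_TIMEOUT_SECS = 15.0  # assumed timeout; adjust to your cluster's setting
risky = long_pauses(sample, ZK_CLIENT_TIMEOUT_SECS)
for line, secs in risky:
    print(f"{secs:.1f}s pause: {line}")
```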
GC Flags I use with Solr 
-Xss256k  
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC  
-XX:MaxTenuringThreshold=8 -XX:NewRatio=3  
-XX:CMSInitiatingOccupancyFraction=40  
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4  
-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90  
-XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=12m  
-XX:CMSFullGCsBeforeCompaction=1  
-XX:+UseCMSInitiatingOccupancyOnly  
-XX:CMSTriggerPermRatio=80  
-XX:CMSMaxAbortablePrecleanTime=6000  
-XX:+CMSParallelRemarkEnabled  
-XX:+ParallelRefProcEnabled  
-XX:+UseLargePages -XX:+AggressiveOpts
Sizing GC Spaces 
http://kumarsoablog.blogspot.com/2013/02/jvm-parameter-survivorratio_7.html
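As a worked example of what the ratio flags imply, the sketch below sizes the generations for an assumed 12 GB heap (the heap size is an assumption; the deck only gives 12-16GB as an upper bound). NewRatio=3 means old:young is 3:1, SurvivorRatio=4 means eden is four times the size of each of the two survivor spaces, and CMSInitiatingOccupancyFraction=40 starts a CMS cycle once the old generation is 40% full. The JVM aligns and rounds these sizes, so real values will differ slightly.

```python
def gc_spaces_mb(heap_mb, new_ratio, survivor_ratio, cms_initiating_pct):
    """Approximate generation sizes implied by the ratio flags (ignores JVM alignment)."""
    young = heap_mb // (new_ratio + 1)       # old:young = new_ratio:1
    survivor = young // (survivor_ratio + 2)  # eden:survivor = survivor_ratio:1, two survivors
    eden = young - 2 * survivor
    old = heap_mb - young
    return {
        "young": young,
        "eden": eden,
        "survivor_each": survivor,
        "old": old,
        "cms_starts_at": old * cms_initiating_pct // 100,
    }


spaces = gc_spaces_mb(heap_mb=12 * 1024, new_ratio=3, survivor_ratio=4, cms_initiating_pct=40)
for name, size in spaces.items():
    print(f"{name}: {size} MB")
```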
Query Performance 
• Still a work in progress! 
• Sustained QPS and 99th-percentile execution time (Coda Hale Metrics is good for this) 
• Stable: ~5,000 QPS / 99th at 300ms while indexing ~10,000 docs / sec 
• Using the TermsComponent to build queries based on the terms in each field. 
• Harder to accurately simulate user queries over synthetic data 
• Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some 
cached, some not), etc ... 
• Does the randomness in your test queries model (expected) user behavior? 
• Start with one server (1 shard) to determine baseline query performance. 
• Look for inefficiencies in your schema and other config settings
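For readers reproducing the measurement without a metrics library, here is a minimal sketch of sustained QPS and a nearest-rank 99th percentile computed from raw samples. This is a plain exact calculation over all samples, which is not how Coda Hale Metrics computes its percentiles (it samples); the latency values below are made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def sustained_qps(query_count, elapsed_secs):
    """Average queries per second over the whole test window."""
    return query_count / elapsed_secs


# Made-up latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
qps = sustained_qps(query_count=300_000, elapsed_secs=60)
print(f"p50={p50}ms p99={p99}ms qps={qps:.0f}")
```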
Query Performance, cont. 
• Higher risk of full GC pauses (facets, filters, sorting) 
• Use optimized data structures (DocValues) for facet / sort fields, Trie-based numeric fields for 
range queries, facet.method=enum for low cardinality fields 
• Check sizing of caches, esp. filterCache in solrconfig.xml 
• Add more replicas; load-balance; Solr can set HTTP headers to work with caching proxies like 
Squid 
• -Dhttp.maxConnections=## (default = 5, increase to accommodate more threads sending 
queries) 
• Avoid increasing the ZooKeeper client timeout; ~15000 ms (15 seconds) is about right 
• Don’t just keep throwing more memory at Java! (e.g., -Xmx128G)
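A rough way to check filterCache sizing is to bound its worst-case memory. The sketch assumes an entry that matches many documents is held as a bitset with one bit per document in the core; entries for small result sets are stored more compactly, so this is an upper bound, not a prediction. The numbers are assumptions for illustration: 50M docs per core (about what 900M docs over 18 shards works out to) and a cache size of 512 entries.

```python
def filter_cache_upper_bound_bytes(max_doc, cache_size):
    """Worst case: every cached filter is a bitset with one bit per document."""
    bytes_per_entry = max_doc // 8
    return bytes_per_entry * cache_size


docs_per_core = 900_000_000 // 18  # assumes documents are spread evenly across shards
bound = filter_cache_upper_bound_bytes(max_doc=docs_per_core, cache_size=512)
print(f"{docs_per_core:,} docs/core -> up to {bound / 1e9:.1f} GB of heap for filterCache")
```

On a 12-16GB heap that worst case is a sizeable share, which is one reason filter-heavy query loads carry more full-GC risk than indexing.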
Call me maybe - Jepsen 
https://github.com/aphyr/jepsen 
• Solr tests being developed by Lucene/Solr committer Shalin 
Mangar (@shalinmanger) 
• Prototype in place: 
• No ack’d writes were lost! 
• No un-ack’d writes succeeded 
See: https://github.com/LucidWorks/jepsen/tree/solr-jepsen
Solr Scale Toolkit 
• Open source: https://github.com/LucidWorks/solr-scale-tk 
• Fabric (Python) toolset for deploying and managing SolrCloud clusters in the cloud 
• Code to support benchmark tests (Pig script for data generation / indexing, JMeter samplers) 
• EC2 for now, more cloud providers coming soon via Apache libcloud 
• Contributors welcome! 
• More info: http://searchhub.org/2014/06/03/introducing-the-solr-scale-toolkit/
Provisioning cluster nodes 
fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge 
• Custom built AMI (one for PV instances and one for HVM instances) – 
Amazon Linux 
• Block device mapping 
• dedicated disk per Solr node 
• Launch and then poll status until they are live 
• verify SSH connectivity 
• Tag each instance with a cluster ID and username
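The "launch and then poll status until they are live" step boils down to a poll-with-timeout loop. The sketch below is a generic version, not the toolkit's code: wait_until is a hypothetical helper, and the check function here is a stub that replays canned instance states instead of calling EC2.

```python
import time


def wait_until(check, timeout_secs=300.0, interval_secs=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout elapses."""
    deadline = clock() + timeout_secs
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_secs)


# Stub status source: two "pending" polls, then "running".
canned_states = iter(["pending", "pending", "running"])
polls = []


def instance_is_running():
    state = next(canned_states)
    polls.append(state)
    return state == "running"


# sleep is replaced with a no-op so the example finishes instantly.
became_live = wait_until(instance_is_running, timeout_secs=60, sleep=lambda _secs: None)
print("live:", became_live, "after", len(polls), "polls")
```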
Deploy ZooKeeper ensemble 
fab new_zk_ensemble:zk1,n=3 
• Two options: 
• provision 1 to N nodes when you launch Solr cluster 
• use existing named ensemble 
• Fabric command simply creates the myid files and zoo.cfg file for the 
ensemble 
• and some cron scripts for managing snapshots 
• Basic health checking of ZooKeeper status: 
echo srvr | nc localhost 2181
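The same health check can be done from Python by sending the srvr command over a socket and reading the Mode line. A minimal sketch: zk_srvr and parse_srvr are illustrative helper names, and the sample reply is hand-written for the example rather than captured from a real ensemble.

```python
import socket


def zk_srvr(host="localhost", port=2181, timeout_secs=5.0):
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout_secs) as conn:
        conn.sendall(b"srvr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text):
    """Turn 'Key: value' lines from the reply into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Hand-written sample reply, for illustration only.
sample_reply = (
    "Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT\n"
    "Latency min/avg/max: 0/0/12\n"
    "Received: 104\n"
    "Sent: 103\n"
    "Connections: 4\n"
    "Outstanding: 0\n"
    "Zxid: 0x100000042\n"
    "Mode: follower\n"
    "Node count: 27\n"
)
status = parse_srvr(sample_reply)
is_serving = status.get("Mode") in ("leader", "follower", "standalone")
print(status["Mode"], is_serving)
```

Against a live server, parse_srvr(zk_srvr("localhost", 2181)) would return the same kind of dict.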
Deploy SolrCloud cluster 
fab new_solrcloud:test1,zk=zk1,nodesPerHost=2 
• Uses bin/solr in Solr 4.10 to control Solr nodes 
• Set system props: jetty.port, host, zkHost, JVM opts 
• One or more Solr nodes per machine 
• JVM mem opts dependent on instance type and # of Solr nodes 
per instance 
• Optionally configure log4j.properties to append messages to 
RabbitMQ for SiLK integration
Automate day-to-day cluster management tasks 
• Deploy a configuration directory to ZooKeeper 
• Create a new collection 
• Attach a local JConsole/VisualVM to a remote JVM 
• Rolling restart (with Overseer awareness) 
• Build Solr locally and patch remote 
• Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes 
from within the network 
• Put/get files 
• Grep over all log files (across the cluster)
Wrap-up and Q & A 
• LucidWorks: http://www.lucidworks.com -- We’re hiring! 
• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk 
• SiLK: http://www.lucidworks.com/lucidworks-silk/ 
• Solr In Action: http://www.manning.com/grainger/ 
• Connect: @thelabdude / tim.potter@lucidworks.com
How is Real-Time Analytics Different from Traditional OLAP?
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 

Benchmarking Solr Performance at Scale

  • 2. About Me • Lucene/Solr committer. Work for Lucidworks; focus on hardening SolrCloud, devops, big data architecture / deployments • Operated smallish cluster in AWS for Dachis Group (1.5 years ago, 18 shards ~900M docs) • Solr Scale Toolkit: Fabric/boto framework for deploying and managing clusters in EC2 • Co-author of Solr In Action with Trey Grainger
  • 3. Agenda 1. Quick review of the SolrCloud architecture 2. Indexing & Query performance tests 3. Solr Scale Toolkit (quick overview) 4. Q & A
  • 4. Solr in the wild … https://twitter.com/bretthoerner/status/476830302430437376
  • 5. SolrCloud distilled Subset of optional features in Solr that enable and simplify horizontally scaling a search index using sharding and replication. Goals: performance, scalability, high-availability, simplicity, elasticity, and community-driven!
  • 6. Collection == distributed index A collection is a distributed index defined by: • named configuration stored in ZooKeeper • number of shards: documents are distributed across N partitions of the index • document routing strategy: how documents get assigned to shards • replication factor: how many copies of each document in the collection Collections API: curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logstash4solr&replicationFactor=2&numShards=2&collection.configName=logs"
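
As a minimal sketch, the CREATE request on this slide can also be assembled in code, which avoids the URL being broken by line wrapping. The helper name `create_collection_url` is hypothetical; the parameter names and values are the ones shown on the slide.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE URL (hypothetical helper, not part of Solr)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("replicationFactor", replication_factor),
        ("numShards", num_shards),
        ("collection.configName", config_name),
    ]
    # urlencode joins the pairs with '&' and escapes any unsafe characters.
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


url = create_collection_url("http://localhost:8983/solr", "logstash4solr", 2, 2, "logs")
print(url)
```

The resulting string can be passed to curl or any HTTP client.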
  • 8. ZooKeeper • Is a very good thing ... clusters are a zoo! • Centralized configuration management • Cluster state management • Leader election (shard leader and overseer) • Overseer distributed work queue • Live Nodes • Ephemeral znodes used to signal a server is gone • Needs at least 3 nodes for quorum in production
  • 9. ZooKeeper: State Management • Keep track of live nodes /live_nodes znode • ephemeral nodes • ZooKeeper client timeout • Collection metadata and replica state in /clusterstate.json • Every Solr node has watchers for /live_nodes and /clusterstate.json • Leader election • ZooKeeper sequence number on ephemeral znodes
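
The leader-election bullet above can be illustrated without a running ensemble. This is a sketch of the general ZooKeeper recipe (lowest sequence number among the ephemeral znodes wins), not Solr's actual election code; the znode names are made up for illustration.

```python
def sequence_number(znode_name):
    """Return the 10-digit counter ZooKeeper appends to a sequential znode name."""
    return int(znode_name[-10:])


def elect_leader(candidates):
    """Pick the candidate znode with the lowest sequence number, or None if empty."""
    if not candidates:
        return None
    return min(candidates, key=sequence_number)


# Hypothetical election znodes, one per replica of a shard.
live = ["replica_b-n_0000000002", "replica_a-n_0000000000", "replica_c-n_0000000001"]
leader = elect_leader(live)

# When the leader's session expires its ephemeral znode disappears,
# and the next-lowest sequence number takes over.
live.remove(leader)
next_leader = elect_leader(live)
print(leader, next_leader)
```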
  • 10. Scalability Highlights • No split-brain problems (b/c of ZooKeeper) • All nodes in cluster perform indexing and execute queries; no master node • Distributed indexing: No SPoF, high throughput via direct updates to leaders, automated failover to new leader • Distributed queries: Add replicas to scale-out qps; parallelize complex query computations; fault-tolerance • Indexing / queries continue so long as there is 1 healthy replica per shard
  • 11. Cluster sizing How many servers do I need to index X docs? ... shards ... ? ... replicas ... ? I need N queries per second over M docs, how many servers do I need? It depends!
  • 12. Testing Methodology • Transparent repeatable results • Ideally hoping for something owned by the community • Synthetic docs ~ 1K each on disk, mix of field types • Data set created using code borrowed from PigMix • English text fields generated using a Zipfian distribution • Java 1.7u67, Amazon Linux, r3.2xlarge nodes • enhanced networking enabled, placement group, same AZ • Stock Solr (cloud) 4.10 • Using custom GC tuning parameters and auto-commit settings • Use Elastic MapReduce to generate indexing load • As many nodes as I need to drive Solr!
  • 13. Indexing Performance

    Cluster Size | # of Shards | # of Replicas | Reducers | Time (secs) | Docs / sec
    10           | 10          | 1             | 48       | 1762        | 73,780
    10           | 10          | 2             | 34       | 3727        | 34,881
    10           | 20          | 1             | 48       | 1282        | 101,404
    10           | 20          | 2             | 34       | 3207        | 40,536
    10           | 30          | 1             | 72       | 1070        | 121,495
    10           | 30          | 2             | 60       | 3159        | 41,152
    15           | 15          | 1             | 60       | 1106        | 117,541
    15           | 15          | 2             | 42       | 2465        | 52,738
    15           | 30          | 1             | 60       | 827         | 157,195
    15           | 30          | 2             | 42       | 2129        | 61,062
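
A short sketch of the arithmetic behind the indexing results on this slide. Multiplying time by throughput suggests every run indexed roughly 130 million documents, and the ratio of the two throughputs gives a rough cost for the second replica. The reducer counts differ between the paired runs, so the ratios are indicative only; the function names are hypothetical.

```python
# (cluster size, shards, replicas, reducers, time in secs, docs/sec), copied from the slide
RESULTS = [
    (10, 10, 1, 48, 1762, 73780),
    (10, 10, 2, 34, 3727, 34881),
    (10, 20, 1, 48, 1282, 101404),
    (10, 20, 2, 34, 3207, 40536),
    (10, 30, 1, 72, 1070, 121495),
    (10, 30, 2, 60, 3159, 41152),
    (15, 15, 1, 60, 1106, 117541),
    (15, 15, 2, 42, 2465, 52738),
    (15, 30, 1, 60, 827, 157195),
    (15, 30, 2, 42, 2129, 61062),
]


def implied_total_docs(row):
    """Documents indexed in a run, estimated as elapsed seconds times docs/sec."""
    return row[4] * row[5]


def replication_slowdown(results):
    """Map (cluster size, shards) to throughput with 1 replica divided by throughput with 2."""
    by_key = {}
    for size, shards, replicas, _reducers, _secs, rate in results:
        by_key.setdefault((size, shards), {})[replicas] = rate
    return {key: rates[1] / rates[2] for key, rates in by_key.items()}


totals = [implied_total_docs(row) for row in RESULTS]
slowdowns = replication_slowdown(RESULTS)
for key, ratio in sorted(slowdowns.items()):
    print(key, round(ratio, 2))
```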
  • 15. Direct Updates to Leaders
  • 17. Indexing Performance Lessons • Solr has no built-in throttling support – will accept work until it falls over; need to build this into your indexing application logic • Oversharding helps parallelize indexing work and gives you an easy way to add more hardware to your cluster • GC tuning is critical (more below) • Auto-hard commit to keep transaction logs manageable • Auto soft-commit to see docs as they are indexed • Replication is expensive! (more work needed here)
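
Since throttling has to live in the indexing application, one possible shape is a cap on the number of update batches in flight at once. This is a sketch only: `send_batch` stands in for whatever function posts a batch to Solr, and `max_in_flight` is a hypothetical tuning knob, not a Solr setting.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThrottledIndexer:
    """Caps the number of update batches in flight (client-side back-pressure)."""

    def __init__(self, send_batch, max_in_flight=4):
        self._send_batch = send_batch  # hypothetical callable that posts one batch to Solr
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._pool = ThreadPoolExecutor(max_workers=max_in_flight)

    def submit(self, batch):
        # Blocks the producer while max_in_flight batches are still outstanding.
        self._slots.acquire()
        future = self._pool.submit(self._send_batch, batch)
        # Free the slot once the batch finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def close(self):
        """Wait for all outstanding batches to finish."""
        self._pool.shutdown(wait=True)


# Stand-in sender that just records each batch instead of calling Solr.
sent = []
indexer = ThrottledIndexer(sent.append, max_in_flight=2)
for i in range(5):
    indexer.submit([{"id": str(i)}])
indexer.close()
print(len(sent))
```

Because the producer blocks in `submit`, a slow cluster slows the feeder down instead of queueing unbounded work.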
  • 18. GC Tuning • Stop-the-world GC pauses can lead to ZooKeeper session expiration (which is bad) • More JVMs with smaller heap sizes are better! (12-16GB max per JVM ~ less if you can) • MMapDirectory relies on sufficient memory available to the OS cache (off-heap) • GC activity during Solr indexing is stable and generally doesn’t cause any stop-the-world collections … queries are a different story • Enable verbose GC logging (even in prod) so you can troubleshoot issues: -verbose:gc -Xloggc:gc.log -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
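
With -XX:+PrintGCApplicationStoppedTime enabled, the GC log records how long application threads were stopped, which makes it possible to look for pauses long enough to threaten a ZooKeeper session. The sketch below assumes the usual "Total time for which application threads were stopped" line; the sample lines are illustrative values, not measured output, and the 15-second threshold mirrors the client timeout mentioned on slide 22.

```python
import re

# Line emitted by -XX:+PrintGCApplicationStoppedTime (any prefix or suffix is ignored).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(log_lines, threshold_secs):
    """Return stop-the-world pause durations (seconds) at or above threshold_secs."""
    pauses = []
    for line in log_lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_secs:
            pauses.append(float(match.group(1)))
    return pauses


# Illustrative sample lines (made-up values).
sample = [
    "Total time for which application threads were stopped: 0.0004120 seconds",
    "Application time: 1.2345670 seconds",
    "Total time for which application threads were stopped: 17.3000000 seconds",
]
risky = long_pauses(sample, threshold_secs=15.0)
print(risky)
```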
  • 19. GC Flags I use with Solr -Xss256k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxTenuringThreshold=8 -XX:NewRatio=3 -XX:CMSInitiatingOccupancyFraction=40 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=12m -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSTriggerPermRatio=80 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
  • 20. Sizing GC Spaces http://kumarsoablog.blogspot.com/2013/02/jvm-parameter-survivorratio_7.html
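
As a rough sketch of how the -XX:NewRatio=3 and -XX:SurvivorRatio=4 flags from slide 19 divide the heap: NewRatio is the old-to-young ratio, and SurvivorRatio is the ratio of eden to one survivor space. The 12 GB heap is taken from the range on slide 18; the JVM aligns and adjusts real sizes, so treat the numbers as approximations.

```python
def generation_sizes_mb(heap_mb, new_ratio=3, survivor_ratio=4):
    """Approximate HotSpot generation sizes in MB for a fixed heap."""
    young = heap_mb // (new_ratio + 1)        # old : young = new_ratio : 1
    survivor = young // (survivor_ratio + 2)  # eden : survivor = survivor_ratio : 1, two survivors
    eden = young - 2 * survivor
    old = heap_mb - young
    return {"old": old, "young": young, "eden": eden, "survivor_each": survivor}


sizes = generation_sizes_mb(12 * 1024)
print(sizes)
```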
  • 21. Query Performance • Still a work in progress! • Sustained QPS & Execution time of 99th Percentile (coda hale metrics is good for this) • Stable: ~5,000 QPS / 99th at 300ms while indexing ~10,000 docs / sec • Using the TermsComponent to build queries based on the terms in each field. • Harder to accurately simulate user queries over synthetic data • Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc ... • Does the randomness in your test queries model (expected) user behavior? • Start with one server (1 shard) to determine baseline query performance. • Look for inefficiencies in your schema and other config settings
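
The two headline query metrics, sustained QPS and 99th-percentile execution time, can be computed from raw timings as sketched below. This uses the nearest-rank percentile definition as a stand-in; a metrics library may use a different estimator, and the sample latencies are invented for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sustained_qps(num_queries, elapsed_secs):
    """Average queries per second over the whole test window."""
    return num_queries / elapsed_secs


# Invented latencies in milliseconds.
latencies_ms = [120, 80, 300, 95, 110]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
qps = sustained_qps(num_queries=len(latencies_ms), elapsed_secs=0.5)
print(p50, p99, qps)
```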
  • 22. Query Performance, cont. • Higher risk of full GC pauses (facets, filters, sorting) • Use optimized data structures (DocValues) for facet / sort fields, Trie-based numeric fields for range queries, facet.method=enum for low cardinality fields • Check sizing of caches, esp. filterCache in solrconfig.xml • Add more replicas; load-balance; Solr can set HTTP headers to work with caching proxies like Squid • -Dhttp.maxConnections=## (default = 5, increase to accommodate more threads sending queries) • Avoid increasing ZooKeeper client timeout ~ 15000 (15 seconds) is about right • Don’t just keep throwing more memory at Java! -Xmx128G
  • 23. Call me maybe - Jepsen https://github.com/aphyr/jepsen • Solr tests being developed by Lucene/Solr committer Shalin Mangar (@shalinmangar) • Prototype in place: • No ack’d writes were lost! • No un-ack’d writes succeeded See: https://github.com/LucidWorks/jepsen/tree/solr-jepsen
  • 24. Solr Scale Toolkit • Open source: https://github.com/LucidWorks/solr-scale-tk • Fabric (Python) toolset for deploying and managing SolrCloud clusters in the cloud • Code to support benchmark tests (Pig script for data generation / indexing, JMeter samplers) • EC2 for now, more cloud providers coming soon via Apache libcloud • Contributors welcome! • More info: http://searchhub.org/2014/06/03/introducing-the-solr-scale-toolkit/
  • 25. Provisioning cluster nodes fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge • Custom built AMI (one for PV instances and one for HVM instances) – Amazon Linux • Block device mapping • dedicated disk per Solr node • Launch and then poll status until they are live • verify SSH connectivity • Tag each instance with a cluster ID and username
  • 26. Deploy ZooKeeper ensemble fab new_zk_ensemble:zk1,n=3 • Two options: • provision 1 to N nodes when you launch Solr cluster • use existing named ensemble • Fabric command simply creates the myid files and zoo.cfg file for the ensemble • and some cron scripts for managing snapshots • Basic health checking of ZooKeeper status: echo srvr | nc localhost 2181
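
The myid and zoo.cfg files mentioned on this slide follow a simple pattern, sketched below. This is not the toolkit's actual code: the hostnames and data directory are hypothetical, and the timing values are the ones from ZooKeeper's sample configuration.

```python
def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Render a zoo.cfg for an ensemble; server ids are assigned 1..N in host order."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",  # hypothetical path
        f"clientPort={client_port}",
    ]
    for server_id, host in enumerate(hosts, start=1):
        # 2888 is the quorum port, 3888 the leader-election port.
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_for(hosts, host):
    """Contents of the myid file on a given host; must match its server.N entry."""
    return str(hosts.index(host) + 1)


# Hypothetical hostnames for a 3-node ensemble.
ensemble = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
cfg = render_zoo_cfg(ensemble)
print(cfg)
print(myid_for(ensemble, "zk2.example.com"))
```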
  • 27. Deploy SolrCloud cluster fab new_solrcloud:test1,zk=zk1,nodesPerHost=2 • Uses bin/solr in Solr 4.10 to control Solr nodes • Set system props: jetty.port, host, zkHost, JVM opts • One or more Solr nodes per machine • JVM mem opts dependent on instance type and # of Solr nodes per instance • Optionally configure log4j.properties to append messages to RabbitMQ for SiLK integration
  • 28. Automate day-to-day cluster management tasks • Deploy a configuration directory to ZooKeeper • Create a new collection • Attach a local JConsole/VisualVM to a remote JVM • Rolling restart (with Overseer awareness) • Build Solr locally and patch remote • Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network • Put/get files • Grep over all log files (across the cluster)
  • 29. Wrap-up and Q & A • LucidWorks: http://www.lucidworks.com -- We’re hiring! • Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk • SiLK: http://www.lucidworks.com/lucidworks-silk/ • Solr In Action: http://www.manning.com/grainger/ • Connect: @thelabdude / tim.potter@lucidworks.com

Editor's Notes

  1. Brett is at Spredfast (ATX), 12-hr sharding scheme (180 shards)
  2. ZooKeeper: Distributed coordination service that provides centralized configuration, cluster state management, and leader election
     Node: JVM process bound to a specific port on a machine; hosts the Solr web application
     Collection: Search index distributed across multiple nodes; each collection has a name, shard count, and replication factor
     Replication Factor: Number of copies of a document in a collection
     Shard: Logical slice of a collection; each shard has a name, hash range, leader, and replication factor. Documents are assigned to one and only one shard per collection using a hash-based document routing strategy.
     Replica: Solr index that hosts a copy of a shard in a collection; behind the scenes, each replica is implemented as a Solr core
     Leader: Replica in a shard that assumes special duties needed to support distributed indexing in Solr; each shard has one and only one leader at any time and leaders are elected using ZooKeeper
  3. You’re not going to tune your way out of every query problem!
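
The hash-based document routing described in note 2 (each shard owns a hash range, and each document lands in exactly one shard) can be sketched as follows. CRC32 over an unsigned 32-bit space is used here purely as a stand-in; Solr's own router uses MurmurHash3, so the shard a given id maps to will differ from a real cluster.

```python
import zlib


def shard_ranges(num_shards):
    """Split the 32-bit hash space into num_shards contiguous, near-equal ranges."""
    space = 2 ** 32
    bounds = [i * space // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_shards)]


def shard_for(doc_id, ranges):
    """Return the index of the one shard whose hash range contains the document id's hash."""
    doc_hash = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF  # stand-in hash
    for index, (low, high) in enumerate(ranges):
        if low <= doc_hash <= high:
            return index
    raise AssertionError("ranges must cover the whole hash space")


ranges = shard_ranges(2)
assignments = {doc_id: shard_for(doc_id, ranges) for doc_id in ("doc-1", "doc-2", "doc-3")}
print(ranges)
print(assignments)
```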