Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Confidential and Proprietary © Copyright 2013
This Ain’t Your Parents’ Search Engine
Grant Ingersoll
CTO, LucidWorks
Twitter: @gsingers
Search is good for…
• Traditional: Fast, fuzzy text matching
across a large document collection
• De-normalized data
- “light” relational
• Top N problems
- Key-value (n=1)
- Recommendations
- “Good enough” classification,
clustering
• Faceting, aggregations, analytical
slicing and dicing of data
• Spatial, record/event linkage, alerting
http://cheezburger.com/5243950080
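As a rough illustration of the "Top N" and faceting bullets above, here is a minimal sketch that builds a Solr select URL asking for the first N matches plus facet counts on one field. The host, collection name (`products`) and field name (`category`) are assumptions made up for the example; `q`, `rows`, `facet`, `facet.field` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_field, top_n=10):
    """Return a Solr select URL for the top N matches plus facet counts."""
    params = [
        ("q", text),                   # free-text query
        ("rows", top_n),               # the "N" in "Top N"
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # count matches per value of this field
        ("wt", "json"),                # ask for a JSON response
    ]
    return base_url + "/select?" + urlencode(params)


# Hypothetical collection and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr/products", "laptop bag", "category", top_n=5
)
print(url)
```

Nothing is sent over the network here; the URL could be passed to any HTTP client.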
Foundational Changes in Lucene/Solr 4
•Reduced Memory usage
•Pluggable Codecs/similarity
•FS(A|T): finite state automata and transducers
•Doc Values (column oriented)
•Spatial upgrade
•New facets and functions
•Cursors (deep paging)
•Distributed capabilities
•Joins/Grouping
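To make the "Cursors (deep paging)" bullet concrete, here is a minimal sketch of the paging loop, assuming Solr's `cursorMark` / `nextCursorMark` request and response fields (available from Solr 4.7, and requiring a sort that includes the unique key). The `fetch_page` callable and the in-memory `make_fake_solr` stand-in are hypothetical helpers so the example runs without a server; real cursor marks are opaque strings, not offsets.

```python
def iter_all_docs(fetch_page, query, sort="id asc", rows=100):
    """Yield every matching document by following cursor marks."""
    cursor = "*"  # "*" asks Solr to start from the beginning
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor


def make_fake_solr(doc_ids):
    """Hypothetical stand-in for a Solr endpoint; uses offsets as cursor marks."""
    ids = sorted(doc_ids)

    def fetch_page(params):
        cursor = params["cursorMark"]
        start = 0 if cursor == "*" else int(cursor)
        page = ids[start:start + params["rows"]]
        next_cursor = str(start + len(page)) if page else cursor
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": next_cursor,
        }

    return fetch_page


docs = list(iter_all_docs(make_fake_solr(["a", "b", "c", "d", "e"]), "*:*", rows=2))
print([d["id"] for d in docs])
```

Unlike `start`/`rows` paging, the loop never asks the server to skip over earlier results, which is the point of cursors for deep result sets.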
Search + Hadoop
•What’s Old is New Again
•“Traditional” Use Cases:
- Build/Store indexes
- https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
•Enrichment and Signal processing
- PageRank, Statistically Interesting Phrases, etc.
LucidWorks + Hadoop
•Ingestion Help
- Flexible Map-Reduce content ingestion supporting:
»Directory of files
»CSV, Writable, etc.
»LogStash
»Build Your Own
•Pig Load/Store and UDFs
•Hive 2-way support
•http://www.lucidworks.com/search-for-hadoop/
- Open source this summer
[Architecture diagram. Labels: LucidWorks SiLK; LucidWorks Search; JDBC Connector; Web/File System Crawl; Data Warehouse; Hadoop Connectors; Clickstream; Networking. Layers: Data Sources, Connectors, Servers]
Search Analytics: Data Ingestion & Visualization
[Diagram. Labels: Solr/Solr Cloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (HTTP); Search Logs; Visualization; Configurable Dashboards; Hadoop Connector; GrokIngestMapper; LogStash]
LucidWorks Open Source
• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr): https://github.com/LucidWorks/banana
• Effortless AWS deployment and monitoring:
http://www.github.com/lucidworks/solr-scale-tk
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
Fly the friendly skies
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
Make $$$
• Leverage time series
data and visualization
using LucidWorks SiLK
• Monitor Social
• Traditional Research
https://github.com/lucidworks/lws-financial-demo
Space-Time Continuum
• Leverage Solr’s spatial capabilities to index
non-spatial data, such as time ranges
- Useful for Open Hours,
Shifts, etc.
• Query using rectangle
intersections
- q = shift:"Intersects(0 19 23 365)"
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
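A minimal sketch of how a query like the one above could be built, under one reading of the technique: each time range is indexed as a 2D point (x = range start, y = range end), and a rectangle written as `minX minY maxX maxY` then selects every range that overlaps the query window. The point encoding, the world bounds of 0 to 365, and the helper names are assumptions for illustration, not taken from the slide.

```python
def overlap_query(field, start, end, world_min=0, world_max=365):
    """Build a rectangle query for ranges overlapping [start, end].

    Assumes each range was indexed as the point (x=range_start, y=range_end)
    and that the rectangle is written as "minX minY maxX maxY":
      x in [world_min, end]   -> the range starts no later than `end`
      y in [start, world_max] -> the range ends no earlier than `start`
    """
    return f'{field}:"Intersects({world_min} {start} {end} {world_max})"'


def overlaps(range_start, range_end, start, end):
    """Plain-Python version of the same overlap test, for checking the logic."""
    return range_start <= end and range_end >= start


print(overlap_query("shift", 19, 23))  # reproduces the query on the slide
print(overlaps(10, 20, 19, 23))        # a 10..20 range reaches into 19..23
print(overlaps(1, 18, 19, 23))         # a 1..18 range ends before 19
```

Under this reading, the slide's query asks for shifts that overlap the window 19 to 23 in a 0 to 365 space.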
Signal Processing for Search and Discovery
• Signals power modern relevance
– Clicks, conversions, sharing, history, signatures
• LucidWorks 5 makes it easy to capture and
leverage signals
– Recommendations, analytics, discovery
• Simplifies your data workflow
• Simplifies your operational footprint
Solr Powered Signal Processing
• Use Case: eCommerce
• Data:
– Product catalog (~1.2M items)
– Click data (~3.9M clicks)
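To give a feel for what "leveraging" click data can mean, here is a minimal sketch that counts clicks per (query, product) pair and ranks products for a query by click count. This is a generic aggregation, not the method used in the talk or in any LucidWorks product; the log format and sample records are made up.

```python
from collections import Counter


def aggregate_clicks(click_log):
    """Count clicks per (normalized query, product id) pair."""
    counts = Counter()
    for query, product_id in click_log:
        counts[(query.strip().lower(), product_id)] += 1
    return counts


def top_products(counts, query, n=3):
    """Return up to n (product id, clicks) pairs for a query, most clicked first."""
    normalized = query.strip().lower()
    matches = [(pid, c) for (q, pid), c in counts.items() if q == normalized]
    # Sort by descending click count, then product id for a stable order.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:n]


# Hypothetical click records: (query text, clicked product id).
clicks = [
    ("ipad", "p1"), ("iPad", "p1"), ("ipad ", "p2"),
    ("ipad", "p1"), ("kindle", "p3"), ("ipad", "p2"),
]
counts = aggregate_clicks(clicks)
print(top_products(counts, "iPad"))
```

Counts like these could be fed back into the index as a boost field or used as the basis for recommendations.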
Meta
• http://www.lucidworks.com
– grant@lucidworks.com
– @gsingers
• Sales
– Steve Drane (based here in Chicago)
– steve.drane@lucidworks.com
• Lucene/Solr Revolution
– Washington DC, Nov 11-14
– http://www.lucenerevolution.org
Editor's Notes
I chose LogStash for data transformation and import for two reasons:
It provides a powerful framework for extracting, grokking and transforming log data into a structured format that Solr can consume and that SiLK can use for dashboards.
LucidWorks’ Hadoop Connectors have a GrokIngestMapper that allows me to reuse the same LogStash Filters to work with larger volumes of files on HDFS (more details on this in a future article).
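As a rough picture of what "grokking" a log line into a structured format involves, here is a minimal sketch using a plain Python regular expression as a stand-in. It is not LogStash's grok syntax and not the GrokIngestMapper; the log line layout and the field names are assumptions chosen for the example.

```python
import re

# Hypothetical log layout: "<date> <time> <LEVEL> path=<path> hits=<n> QTime=<ms>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"path=(?P<path>\S+) "
    r"hits=(?P<hits>\d+) "
    r"QTime=(?P<qtime>\d+)"
)


def parse_log_line(line):
    """Turn one log line into a dict of typed fields, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["hits"] = int(doc["hits"])    # numeric fields become ints so they can
    doc["qtime"] = int(doc["qtime"])  # be faceted and aggregated downstream
    return doc


doc = parse_log_line("2013-06-10 18:30:05 INFO path=/select hits=42 QTime=7")
print(doc)
```

Each resulting dict has the shape of a document that could be sent to Solr and charted on a dashboard.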
Highlights: Joins, stats, pivot faceting http://localhost:3334/#/dashboard/solr/Trading
Time series, joins
TARDIS: http://2.bp.blogspot.com/-ysN8JskY4WM/UEZNhBywQKI/AAAAAAAABdg/gXE0A9OO6Mk/s1600/13881_doctor_who.jpg
Work under way to formalize
but not as a search engine for content
more like a search engine for behavior