Transforming the house hunting experience

       Alex Burmester, Alexander Kanarsky, Trulia
About Us

• Trulia: founded in 2005
• Technology company in downtown San Francisco transforming real estate search
• 14M unique visitors per month
• Large and active user community
• Strong industry relationships: agents, brokers, over 500k registered professionals
• Helping consumers make informed decisions
• About the speakers


Trulia’s Marketplace

Consumers
• $1Tr Real Estate Economy
• ~5M Home sales/yr
• ~16M Leases/yr
• >70M Homeowners
Most Compelling Product: comprehensive data, unique insights, great UX

MARKETPLACE

Professionals
• $20Bn Real Estate Mktg
• $7Bn Online Today
• >1M Realtors
• >1M Landlords/Managers
• >1M Local Contractors
Essential Marketing Solution: huge exposure, qualified customers, powerful tools

We create value by connecting consumers and professionals
Problem overview

• Real estate search specifics
• Traditional search: agent-based, through the MLS (Multiple Listing Service)
• No single MLS nationwide
• The largest financial transaction for most people, with the potential for huge life trade-offs
• Lots of data, but siloed, with data quality issues and disclosure rules that differ by area
The House Hunting Process

• Buying, investing, renting
• How online search changes this
• Trulia was the first site to mash up lots of data nationwide: maps, heatmaps, tax info, rentals, street view
• Focus on speed
• Relevancy of data
• Amount of information
• Targeted local real estate search experience
Challenges

• Integrating large, diverse datasets
• Data quality and freshness
• Delivering the right data to the right people
• Scaling for growth
• The meaning of "location, location, location"
Legacy Approach

• MLS-like database search
• MySQL 4.x
• Limitations on data processing
• Limitations on scalability
• Replication problems
• Update speed problems
Why Solr?
• Fast, flexible attribute search and faceting
• Handles non-uniform data; dynamic schema
• Easy, uniform API for indexing and querying
• Excellent full-text search
• Indexing and search scale well (Hadoop indexing, distributed queries, replication)
• Stored content: Solr can also serve as a data store
• Add-ons: geospatial search, field collapsing, automatic data import via the DataImportHandler (DIH)
• Strong developer and user community
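To make "attribute search, faceting" concrete, here is a minimal sketch of the parameters such a search might send to Solr's `/select` handler. `q`, `fq`, `rows`, `facet` and `facet.field` are standard Solr request parameters; the field names (`city_s`, `price_i`, `beds_i`, `property_type_s`) and the function name are hypothetical, not Trulia's schema.

```python
from urllib.parse import urlencode


def build_listing_params(city, min_price, max_price, min_beds, facet_fields, rows=20):
    """Return Solr /select parameters for a listing search with facets.

    Field names (city_s, price_i, beds_i, ...) are hypothetical placeholders.
    """
    params = [
        ("q", "*:*"),
        # One fq per attribute constraint: Solr caches each filter query
        # separately, so a common filter such as the city is reused.
        ("fq", f'city_s:"{city}"'),
        ("fq", f"price_i:[{min_price} TO {max_price}]"),
        ("fq", f"beds_i:[{min_beds} TO *]"),  # open-ended range
        ("rows", str(rows)),
        # Ask for per-value counts on each facet field.
        ("facet", "true"),
    ]
    params += [("facet.field", field) for field in facet_fields]
    return params


if __name__ == "__main__":
    params = build_listing_params("San Francisco", 500000, 900000, 2,
                                  ["beds_i", "property_type_s"])
    # A list of pairs (not a dict) keeps the repeated fq/facet.field keys.
    print("/select?" + urlencode(params))
```

Keeping each constraint in its own `fq` rather than AND-ing them into `q` separates filtering from relevance scoring and lets frequently repeated filters hit the cache.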
Hadoop Indexing

• Based on the SOLR-1301 patch
• Reduce-side indexing
• Map-side partitioning (sharding)
• Input data in HDFS
• Output indexes in HDFS
• Local FS for temporary data processing
• Indexes 120M+ documents in less than 1 hour
• Paired with the Index Manager
Hadoop Indexing

[Diagram: input data in HDFS (Part 1 to Part N) is read by mappers and partitioners; reducers running embedded Solr build the indexes using the local FS and write the output data to HDFS as Shard 1 to Shard K]
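A minimal in-memory sketch of the map-side partitioning / reduce-side indexing split. It assumes documents are sharded by a hypothetical `state` field using a stable CRC32 hash; the slides do not say which partition key or hash Trulia used. The reducer only collects document ids where the real job would feed documents to an embedded Solr instance.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical K; the real shard count is not given


def shard_for(partition_key, num_shards=NUM_SHARDS):
    """Stable shard id for a partition key.

    zlib.crc32 is used instead of hash(), which is salted per process for str:
    the indexer and the query side must always agree on the shard.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def map_phase(docs):
    """Mapper: tag every document with its shard id (map-side partitioning)."""
    for doc in docs:
        yield shard_for(doc["state"]), doc


def shuffle(pairs):
    """Group mapper output by shard id, as Hadoop's shuffle does per reducer."""
    groups = defaultdict(list)
    for shard_id, doc in pairs:
        groups[shard_id].append(doc)
    return groups


def reduce_phase(shard_id, docs):
    """Reducer: one call per shard. The slides' reducers write the docs into an
    embedded Solr index on local disk, then copy it to HDFS; this stand-in
    only returns the sorted document ids."""
    return shard_id, sorted(doc["id"] for doc in docs)


def build_shards(docs):
    groups = shuffle(map_phase(docs))
    return dict(reduce_phase(shard_id, group) for shard_id, group in groups.items())
```

Because every reducer owns exactly one shard, shards are built in parallel and never need merging afterwards.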
Index management

• Index Manager: a custom controller
• Manages index deployment from HDFS
• Controls load balancing
• Manages forced replication on slaves
• Interacts with the caching layer
• Handles direct indexing
• Visualizes the state of the indexes
• Alternatives: Katta, SolrCloud
Index Management

[Diagram: the Index Manager pulls data (Shard 1, Shard 2, Shard 3) from HDFS, installs it as the master index, and controls replication from the master to Slave 1 through Slave N, each holding Shards 1 to 3; search requests pass through a caching/load-balancing layer]
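Solr's legacy master/slave ReplicationHandler accepts `command=fetchindex` on a slave to force an immediate pull from its master, optionally with a one-off `masterUrl`. Here is a sketch of how an index manager might build those requests after installing a new master index. Host names, core names and the function name are hypothetical, and the sketch only builds the URLs rather than sending them. Later Solr releases renamed the master/slave terminology, so check the parameter name for your version.

```python
from urllib.parse import urlencode


def fetchindex_urls(slaves, core, master_url):
    """Build one forced-replication request per slave for a single shard core.

    slaves: "host:port" strings; core: the shard's core name; master_url: the
    master core's replication URL. All names here are hypothetical.
    """
    # command=fetchindex makes the slave pull the index from the master now,
    # rather than waiting for its next scheduled poll.
    query = urlencode({"command": "fetchindex", "masterUrl": master_url})
    return [f"http://{slave}/solr/{core}/replication?{query}" for slave in slaves]


if __name__ == "__main__":
    for url in fetchindex_urls(["slave1:8983", "slave2:8983"], "shard1",
                               "http://master:8983/solr/shard1/replication"):
        print(url)  # an index manager would send these, e.g. with urllib.request
```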
Direct Indexer

• Handles various updates between the batches
• Patches, picture updates, etc.
• Handles only certain document types
• Directly updates the master index shards
• Interacts with the Index Manager (scope of changes)
• Runs on a regular basis
Direct Indexer
[Diagram: the Index Manager pulls shards from HDFS, installs the master index, and controls replication to Slave 1 through Slave N behind the caching/load-balancing layer; a Direct Indexer takes updates from data sources and writes them directly to the master index shards]
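A sketch of how a direct indexer could route between-batch updates. It assumes the direct indexer reuses the batch indexer's partitioning (the same hypothetical `state` key and CRC32 hash), and it treats the set of touched shards as the "scope of changes" reported to the Index Manager. `DIRECT_TYPES` and the update record layout are illustrative assumptions.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4                      # must match the batch indexer
DIRECT_TYPES = {"photo", "patch"}   # hypothetical types handled between batches


def shard_for(key, num_shards=NUM_SHARDS):
    # Same stable hash as the batch indexer, so an update lands on the
    # master shard that already holds the document.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def plan_direct_updates(updates):
    """Split pending updates into per-shard batches.

    Returns (batches, skipped). batches maps shard id -> updates; its keys are
    the scope of changes, i.e. the only master shards that need replicating
    after the commit. skipped holds types left for the next full batch run.
    """
    batches = defaultdict(list)
    skipped = []
    for upd in updates:
        if upd["type"] not in DIRECT_TYPES:
            skipped.append(upd)
            continue
        batches[shard_for(upd["state"])].append(upd)
    return dict(batches), skipped
```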
Distributed Search

The Query Aggregator uses the same partitioning scheme as the Hadoop indexer, so only the relevant shards are queried. Extra query parameters define the scope.

[Diagram: each incoming request (Query 1, Query 2) goes to a Query Aggregator, which sends it on to Shard 1 through Shard N]
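A sketch of the aggregator's shard selection. It assumes the extra scope parameter carries partition keys (hypothetical `state` values) and the same CRC32 partitioning as the indexer. The output is a value for Solr's standard `shards` parameter for distributed search; the host layout in `SHARD_HOSTS` is made up.

```python
import zlib

NUM_SHARDS = 4
# Hypothetical shard-to-core mapping, in the host:port/path form the
# Solr `shards` parameter expects (no http:// prefix).
SHARD_HOSTS = {i: f"search{i % 2 + 1}:8983/solr/shard{i + 1}" for i in range(NUM_SHARDS)}


def shard_for(key, num_shards=NUM_SHARDS):
    # Must be identical to the indexer's partitioning function.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shards_param(scope_keys):
    """Comma-separated `shards` value covering only shards that can hold
    documents for the requested scope."""
    if not scope_keys:
        ids = range(NUM_SHARDS)  # no scope given: fan out to every shard
    else:
        ids = sorted({shard_for(k) for k in scope_keys})
    return ",".join(SHARD_HOSTS[i] for i in ids)
```

A request scoped to one state then touches a single shard instead of all N, which is where the per-shard throughput gain comes from.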
Distributed Search

• Many servers, many shards
• Requests are sent only to the relevant shards
• For search, I/O wait matters
• So SSDs rule :-)
• 30-50 qps per shard
• 90% of queries < 100 ms, 99% within 1 s
• Separate indexes for some documents
• 50% of queries < 10 ms
• Caching/load balancing matters

[Chart: aggregated query time distribution; share of queries (0 to 100%) in the <10 ms, <100 ms, <1 s and >1 s buckets, by hour from 6 am to 12 pm]
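The chart stacks query times into latency buckets. Here is a small helper that computes those shares from a list of query times in milliseconds, plus the cumulative "share under a threshold" behind figures such as "90% of queries < 100 ms". The bucket boundaries follow the chart legend; the function names are illustrative.

```python
def latency_distribution(times_ms):
    """Share of queries in each non-overlapping latency bucket (times in ms)."""
    if not times_ms:
        raise ValueError("need at least one query time")
    buckets = [("<10 ms", 10), ("10-100 ms", 100), ("100 ms-1 s", 1000)]
    counts = {label: 0 for label, _ in buckets}
    counts[">=1 s"] = 0
    for t in times_ms:
        for label, upper in buckets:
            if t < upper:
                counts[label] += 1
                break
        else:  # no bucket matched, so the query took 1 s or longer
            counts[">=1 s"] += 1
    return {label: n / len(times_ms) for label, n in counts.items()}


def share_under(times_ms, threshold_ms):
    """Cumulative share of queries faster than threshold_ms."""
    if not times_ms:
        raise ValueError("need at least one query time")
    return sum(1 for t in times_ms if t < threshold_ms) / len(times_ms)
```

Summing the buckets below a boundary gives the same number as `share_under` for that boundary, which is how the stacked chart and the bullet figures relate.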
Geospatial search

• Local search is super important
• Mobile search is naturally spatially oriented
• Communities, neighborhoods, enclaves
• Point-of-interest (POI) search
• Comparable properties
• Personalized search: factoring in census data
• How do we enable it with Solr?
• No standard solution yet
Geospatial Solr/Lucene

• SOLR-733 patch
• Local Lucene / Local Solr (Patrick O'Leary)
• Spatial Lucene in Lucene 3.1
• JTeam's Spatial Solr Plugin (SSP) (Chris Male)
• SOLR-2155 patch (David Smiley)
• Lucene-Spatial-Playground (David Smiley, Chris Male, Ryan McKinley)
• Spatial functions in Solr 3.1
Geospatial search


• What kind of spatial search do we need?
• Radius search
• Polygon search
• Sort by distance (needed for radius search)
• Implementation chosen: SSP-based, fast geometry

   INFO: [active] webapp= path=/select/ params={fl=streetAddress_s,lat,lng&wt=kml&q={!spatial+polygons%3D37.77960208641734,-122.43867874145508;37.783265262376574,-122.43945121765137;37.78414711095678,-122.4330997467041;37.788827515747435,-122.43404388427734;37.79038758480464,-122.4221134185791;37.781908551707666,-122.420654296875;37.774988939930005,-122.42932319641113;37.774988939930005,-122.4360179901123;37.77817746896081,-122.43687629699707;37.777634750327046,-122.43867874145508;37.77960208641734,-122.43867874145508;}*:*&rows=1000} hits=75 status=0 QTime=6
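The logged query passes the polygon as `lat,lng` pairs separated by `;`. Assuming that format (read off the log, not from SSP documentation), here is a sketch that parses it and runs a planar even-odd (ray casting) point-in-polygon test, treating lat/lng as flat x/y coordinates in the spirit of the "fast geometry" choice. This is a generic technique, not SSP's code, and the function names are hypothetical.

```python
def parse_polygons_param(value):
    """Parse 'lat,lng;lat,lng;...;' (the format seen in the log) into tuples."""
    points = []
    for pair in value.split(";"):
        if pair:  # the logged value ends with ';', leaving an empty last item
            lat, lng = pair.split(",")
            points.append((float(lat), float(lng)))
    return points


def point_in_polygon(lat, lng, polygon):
    """Even-odd rule: cast a ray toward increasing longitude and count how
    many polygon edges it crosses; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        # (A repeated closing vertex gives a zero-length edge, skipped here.)
        if (lat1 > lat) != (lat2 > lat):
            # Longitude where this edge meets the point's latitude.
            cross_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < cross_lng:
                inside = not inside
    return inside
```

Planar math like this is cheap and accurate enough for neighborhood-sized polygons, but it distorts over large areas and breaks across the antimeridian, which matches the "fast, but has limitations" caveat in the slides.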
Geospatial Search

• Circle search: given (lat, lng, radius), return the docs whose indexed lat/lng falls inside the circle
• Sorting: by distance from the center
• Polygon search: given a polygon (a set of lat/lng pairs), return the docs whose indexed lat/lng falls inside it
• Bounding boxes: lat/lng range queries
• Postponed distance calculation (done when the query response is written)
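A minimal sketch of circle search along the lines described above: a lat/lng bounding box acts as a cheap range prefilter, an exact distance check removes the box corners, and results are sorted by distance from the center. The great-circle (haversine) distance is a swapped-in choice; the slides mention planar geometry and do not give a distance formula. The `lat`/`lng` field names match the `fl` in the logged query; everything else, including the function names, is hypothetical. In Solr the prefilter would be an `fq` like the one `bbox_filter_query` returns, and the distance could be computed only for the page of documents being written to the response.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def bounding_box(lat, lng, radius_km):
    """Lat/lng box enclosing the circle (does not handle the antimeridian)."""
    angular = radius_km / EARTH_RADIUS_KM  # radius as an angle, in radians
    dlat = math.degrees(angular)
    ratio = math.sin(angular) / math.cos(math.radians(lat))
    # Near a pole the circle can span every longitude.
    dlng = 180.0 if ratio >= 1 else math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng


def bbox_filter_query(lat, lng, radius_km):
    """The box as Lucene range queries on the lat/lng fields."""
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat, lng, radius_km)
    return (f"lat:[{min_lat:.6f} TO {max_lat:.6f}] AND "
            f"lng:[{min_lng:.6f} TO {max_lng:.6f}]")


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def circle_search(docs, lat, lng, radius_km, rows=10):
    """Return up to `rows` docs within radius_km, nearest first."""
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat, lng, radius_km)
    # 1. Cheap prefilter: what the lat/lng range queries do inside the index.
    candidates = [d for d in docs
                  if min_lat <= d["lat"] <= max_lat and min_lng <= d["lng"] <= max_lng]
    # 2. Exact check drops candidates in the box corners outside the circle.
    hits = []
    for d in candidates:
        dist = haversine_km(lat, lng, d["lat"], d["lng"])
        if dist <= radius_km:
            hits.append((dist, d))
    # 3. Sort by distance from the center and keep one page of results.
    hits.sort(key=lambda pair: pair[0])
    return [dict(d, distance_km=round(dist, 3)) for dist, d in hits[:rows]]
```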
Geospatial Search


• Spatial query parser: {!spatial <params>}
• Can be wrapped as a filter query
• Facet by spatial queries
• Planar geometry: fast, but has limitations
• Good for our use cases, YMMV
• Polygon compression: Google's delta encoding
• TBD: shapes indexing
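Assuming "Google's delta encoding" refers to Google's Encoded Polyline Algorithm Format, here is a sketch of an encoder and decoder: coordinates are rounded to 1e-5 degrees, each point is stored as a delta from the previous one, and each delta is packed into printable ASCII 5 bits at a time. The slides do not say exactly how Trulia applied it; shrinking a long polygon parameter like the one in the logged query is one plausible use.

```python
def _encode_value(value):
    """Encode one signed integer delta as printable ASCII."""
    # Shift left one bit; invert negatives so the sign ends up in bit 0.
    value = ~(value << 1) if value < 0 else value << 1
    out = []
    while value >= 0x20:
        # Emit 5 bits at a time, least significant first; 0x20 flags "more follows".
        out.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    out.append(chr(value + 63))
    return "".join(out)


def encode_polyline(points):
    """Encode (lat, lng) pairs as deltas from the previous point at 1e-5 precision."""
    prev_lat = prev_lng = 0
    out = []
    for lat, lng in points:
        ilat, ilng = round(lat * 1e5), round(lng * 1e5)
        out.append(_encode_value(ilat - prev_lat))
        out.append(_encode_value(ilng - prev_lng))
        prev_lat, prev_lng = ilat, ilng
    return "".join(out)


def decode_polyline(encoded):
    """Inverse of encode_polyline; returns (lat, lng) tuples."""
    points, index, lat, lng = [], 0, 0, 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta, then longitude delta
            result, shift = 0, 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        points.append((lat / 1e5, lng / 1e5))
    return points


if __name__ == "__main__":
    # Example points from Google's format description.
    print(encode_polyline([(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]))
    # → _p~iF~ps|U_ulLnnqC_mqNvxq`@
```

Because neighboring polygon vertices are close together, most deltas fit in two to four characters instead of a full decimal coordinate.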
Wrap Up
• We are reshaping real estate search, with the right tools!
• The Hadoop + Solr approach works well for frequent batch indexing
• Solr search is fast and highly scalable through sharding and replication
• Geospatial search in Solr is not yet standardized, but development is making a lot of progress
• Open source: Trulia contributes
Links
• SOLR-1301: Hadoop Indexing Contrib
  • issues.apache.org/jira/browse/SOLR-1301
• Spatial Solr Plugin (SSP)
  • www.jteam.nl/products/spatialsolrplugin.html
• SSP Polygons/Polylines Extension
  • sourceforge.net/projects/ssplex/files/
Contact Information
• Trulia
  • www.trulia.com
  • www.linkedin.com/company/trulia
• Alex Burmester
  • aburmester@trulia.com
• Alexander Kanarsky
  • akanarsky@trulia.com
The End


• Q&A
• Trulia is hiring!
  – www.trulia.com/jobs
  – Work in downtown San Francisco with great people
  – Looking for engineers specializing in search, distributed data processing, visualization/data science, mobile platforms, front end, and more
• Thank you for coming!

 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Transforming the house hunting experience

  • 1. Transforming the house hunting experience
      Alex Burmester, Alexander Kanarsky, Trulia
  • 2. About Us
      – Trulia: founded in 2005
      – Technology company in downtown San Francisco, transforming real estate search
      – 14M unique visitors per month
      – Large and active user community
      – Strong industry relationships (agents, brokers): over 500k registered professionals
      – Helping consumers make informed decisions
      – About the speakers
  • 3. Trulia’s Marketplace
      Consumers:
        – $1Tr real estate economy
        – ~5M home sales/yr
        – ~16M leases/yr
        – >70M homeowners
        – Most compelling product: comprehensive data, unique insights, great UX
      Professionals:
        – $20Bn real estate marketing
        – $7Bn online today
        – >1M Realtors
        – >1M landlords/managers
        – >1M local contractors
        – Essential marketing solution: huge exposure, qualified customers, powerful tools
      Marketplace: we create value by connecting consumers and professionals.
  • 4. Problem overview
      – Real estate search specifics
      – Traditional search: agent-based, via the MLS
      – No single nationwide MLS
      – The largest financial transaction for most people, with potentially huge life trade-offs
      – Lots of data, but siloed, with data quality issues and disclosure rules that differ by area
  • 5. The House Hunting Process
      – Buying, investing, renting
      – How online search changes this
      – Trulia was the first site to mash up lots of data nationwide: maps, heatmaps, tax info, rentals, street view
      – Focus on speed
      – Relevancy of data
      – Amount of information
      – Targeted local real estate search experience
  • 6. Challenges
      – Integrating large, diverse datasets
      – Data quality and freshness
      – Delivering the right data to the right people
      – Scaling for growth
      – The meaning of "location, location, location"
  • 7. Legacy Approach
      – MLS-like database search
      – MySQL 4.x
      – Limitations on data processing
      – Limitations on scalability
      – Replication problems
      – Update speed problems
  • 8. Why Solr?
      – Fast, flexible attribute search and faceting
      – Handles non-uniform data; dynamic schema
      – Easy, uniform API for indexing and querying
      – Excellent full-text search
      – Indexing and search scale well (Hadoop indexing, distributed queries, replication)
      – Stored content: can be used as a data store
      – Add-ons: geospatial search, field collapsing, automatic data import via DIH
      – Strong developer/user community
  • 9. Hadoop Indexing
      – Based on the SOLR-1301 patch
      – Reduce-side indexing
      – Map-side partitioning (sharding)
      – Input data in HDFS
      – Output indexes in HDFS
      – Local FS for temporary data processing
      – Indexes 120+ million documents in less than 1 hour
      – Paired with the Index Manager
  • 10. Hadoop Indexing
      Diagram: input data in HDFS → mappers → partitioners (Part 1 … Part N) → reducers running embedded Solr, using the local FS → output shards (Shard 1 … Shard K) written back to HDFS.
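Map-side partitioning only pays off on the query side (slide 15) if indexing and querying compute the same shard for a document. A minimal sketch, assuming documents are partitioned by a hypothetical region key (a `state` field) and that CRC32 modulo the shard count is the partitioning function; the talk does not say which key or hash Trulia used, and in a Hadoop job this logic would typically live in a `Partitioner` implementation rather than standalone Python.

```python
import zlib
from collections import defaultdict


def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard number in [0, num_shards).

    zlib.crc32 is used instead of the built-in hash(), because str hashes are
    randomized per Python process and every indexer and query node must agree.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def partition(docs, key_field: str, num_shards: int) -> dict:
    """Group documents into per-shard buckets, as a map-side partitioner would."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[shard_for(doc[key_field], num_shards)].append(doc)
    return dict(buckets)


# Hypothetical documents keyed by state.
docs = [
    {"id": "1", "state": "CA"},
    {"id": "2", "state": "NY"},
    {"id": "3", "state": "CA"},
]
print(partition(docs, "state", 4))
```

Each bucket would then go to one reducer, which builds that shard's Solr index.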
  • 11. Index management
      – Index Manager: a custom controller
      – Manages index deployment from HDFS
      – Controls load balancing
      – Manages forced replication on slaves
      – Interacts with the caching layer
      – Handles direct indexing
      – Visualizes the state of the indexes
      – Alternatives: Katta, SolrCloud
  • 12. Index Management
      Diagram: the Index Manager pulls index shards (Shards 1-3) from HDFS, installs them on the master, and controls replication to slaves 1 … N; search requests reach the slaves through a caching/load-balancing layer.
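Slide 11 lists forced replication on slaves among the Index Manager's jobs. Solr's Java-based ReplicationHandler (Solr 1.4 and later) lets a controller trigger this by sending `command=fetchindex` to a slave's `/replication` handler, which makes the slave pull the master's current index immediately instead of waiting for its next poll. The sketch below only builds those URLs; the slave base URLs and per-shard core names (`shard1`, `shard2`) are hypothetical, and how Trulia's custom Index Manager actually issued the call is an assumption.

```python
from urllib.parse import urlencode


def fetchindex_urls(slave_base_urls, cores):
    """Build one forced-replication request per (slave, core) pair.

    Each URL targets a slave's ReplicationHandler with command=fetchindex,
    which asks that slave to pull the current index from its master now.
    """
    query = urlencode({"command": "fetchindex"})
    return [
        f"{base.rstrip('/')}/{core}/replication?{query}"
        for base in slave_base_urls
        for core in cores
    ]


# Hypothetical slave hosts and shard core names.
slaves = ["http://slave1:8983/solr", "http://slave2:8983/solr/"]
for url in fetchindex_urls(slaves, ["shard1", "shard2"]):
    print(url)
```

Sending each request (for example with `urllib.request.urlopen`) and handling failures is left out of the sketch.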
  • 13. Direct Indexer
      – Handles various updates between the batches (patches, picture updates, etc.)
      – Only handles certain types of documents
      – Directly updates master index shards
      – Interacts with the Index Manager (scope of changes)
      – Runs on a regular basis
  • 14. Direct Indexer
      Diagram: HDFS → Index Manager → master index shards (Shards 1-3) → slaves 1 … N behind a caching/load-balancing layer, plus the Direct Indexer, which reads from data sources and writes directly to the master's index shards.
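The Direct Indexer's duties on slide 13, writing only certain document types straight to the master shards and telling the Index Manager the scope of changes, can be sketched as a routing step that returns both the per-shard updates and the set of shards it touched. This assumes the same hypothetical CRC32-by-region partitioning as the batch indexer and made-up `type` values; it illustrates the idea, not Trulia's code.

```python
import zlib
from collections import defaultdict


def shard_for(partition_key: str, num_shards: int) -> int:
    # Must match the batch indexer's partitioning, so an update lands on the
    # shard that already holds the document.
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def plan_direct_updates(updates, key_field: str, num_shards: int, allowed_types):
    """Route incremental updates to master shards and report which shards changed.

    Updates whose "type" is not in allowed_types are skipped, since the Direct
    Indexer handles only certain document types. The returned set of shard
    numbers is the "scope of changes" that would be passed to the Index Manager.
    """
    per_shard = defaultdict(list)
    for update in updates:
        if update.get("type") not in allowed_types:
            continue
        per_shard[shard_for(update[key_field], num_shards)].append(update)
    return dict(per_shard), set(per_shard)


# Hypothetical updates arriving between batch runs.
updates = [
    {"id": "1", "type": "picture", "state": "CA"},
    {"id": "2", "type": "listing", "state": "NY"},
    {"id": "3", "type": "patch", "state": "CA"},
]
per_shard, dirty = plan_direct_updates(updates, "state", 4, {"picture", "patch"})
print(dirty)
```

Returning the dirty set lets the Index Manager replicate only the shards that changed.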
  • 15. Distributed Search
      The Query Aggregator uses the same partitioning schema as the Hadoop indexer, so only the relevant shards are queried. Extra query parameters define the scope.
      Diagram: two Query Aggregators, each receiving query requests and fanning them out to Shard 1 … Shard N.
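The routing rule on slide 15 (query only the shards the scope can map to) and the merge an aggregator then performs can be sketched as follows. It assumes the scope arrives as a list of partition keys, reuses the hypothetical CRC32 partitioning from the indexing side, and assumes per-shard scores are comparable so a k-way merge by score is valid; the HTTP fan-out to the shards is left out.

```python
import heapq
import itertools
import zlib


def shard_for(partition_key: str, num_shards: int) -> int:
    # Must be the same function the indexer used to place documents.
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def target_shards(scope_keys, num_shards: int) -> list:
    """Shards that can hold documents in the requested scope; no others are queried."""
    return sorted({shard_for(key, num_shards) for key in scope_keys})


def merge_results(per_shard_hits, rows: int) -> list:
    """Merge per-shard (score, doc_id) lists, each already sorted by descending
    score, into one global top-`rows` list."""
    merged = heapq.merge(*per_shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, rows))


print(target_shards(["CA"], 8))
print(merge_results([[(0.9, "a"), (0.5, "b")], [(0.8, "c"), (0.1, "d")]], rows=3))
```

Because each shard list is already sorted, the merge touches only as many hits as it returns.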
  • 16. Distributed Search  Many servers, many shards Aggregated Query time distribution  Request is sent only to 100% relevant shards 90%  For search I/O wait matters 80% So SSDs rule :-) 70%  60%  30-50 qps per shard >1s 50% <1s  90% of queries < 100 ms, 40% <100 <10  99% within 1sec 30%  Separate indexes for some 20% documents 10%  50% of queries < 10 ms 0% 6 7 8 9 10 11 12  Caching/load balancing Time period: 6 am to 12 pm matters
  • 17. Geospatial search
      – Local search is super important
      – Mobile search is naturally spatially oriented
      – Communities, neighborhoods, enclaves
      – POI search
      – Comparable properties
      – Personalized search: factoring in census data
      – How to enable it with Solr?
      – No standard solution yet
  • 18. Geospatial Solr/Lucene
      – SOLR-733 patch
      – Local Lucene / Local Solr (Patrick O'Leary)
      – Spatial Lucene in Lucene 3.1
      – JTeam's SSP (Chris Male)
      – SOLR-2155 patch (David Smiley)
      – Lucene-Spatial-Playground (David Smiley, Chris Male, Ryan McKinley)
      – Spatial functions in Solr 3.1
  • 19. Geospatial search  What kind of spatial search do we need?  Radius search  Polygon search  Sort by distance is needed for radius search  Implementation chosen: SSP-based, fast geometry  INFO: [active] webapp= path=/select/ params={fl=streetAddress_s,lat,lng&wt=kml&q={!spatial+polygons%3D37.77960208641734,- 122.43867874145508;37.783265262376574,-122.43945121765137;37.78414711095678,-122.4330997467041;37.788827515747435,- 122.43404388427734;37.79038758480464,-122.4221134185791;37.781908551707666,-122.420654296875;37.774988939930005,- 122.42932319641113;37.774988939930005,-122.4360179901123;37.77817746896081,-122.43687629699707;37.777634750327046,- 122.43867874145508;37.77960208641734,-122.43867874145508;}*:*&rows=1000} hits=75 status=0 QTime=6
  • 20. Geospatial Search  Circle search: given (lat, lng, radius) return list of docs with lat/lng indexed  Sorting: distance from the center  Polygon search: given a polygon (set of lat/lng pairs) return list of docs with lat/lng indexed  Bounding boxes: lat,lng range queries  Postponed distance calculation (query response write time)
  • 21. Geospatial Search  Spatial query parser: {!spatial <params>}  Can be wrapped as a filter query  Facet by spatial queries  Planar geometry: fast, but has limitations  Good for our use cases, YMMV  Polygon compression: Google's delta-encoding  TBD: shapes indexing
  • 22. Wrap Up
      – We are reshaping real estate search, with the right tools!
      – The Hadoop + Solr approach is great for frequent batch indexing
      – Solr search is fast, and highly scalable through sharding and replication
      – Geospatial search in Solr is still not standardized, but development is making lots of progress
      – Open source: Trulia contributes
  • 23. Links
      – SOLR-1301: Hadoop Indexing Contrib: issues.apache.org/jira/browse/SOLR-1301
      – Spatial Solr Plugin (SSP): www.jteam.nl/products/spatialsolrplugin.html
      – SSP Polygons/Polylines Extension: sourceforge.net/projects/ssplex/files/
  • 24. Contact Information
      – Trulia: www.trulia.com, www.linkedin.com/company/trulia
      – Alex Burmester: aburmester@trulia.com
      – Alexander Kanarsky: akanarsky@trulia.com
  • 25. The End
      – Q&A
      – Trulia is hiring!
          – www.trulia.com/jobs
          – Work in downtown San Francisco with great people
          – Looking for engineers specializing in search, distributed data processing, visualization/data science, mobile platforms, front end, and more
      – Thank you for coming!