See some common myths, discover the various open source enterprise search packages available and see some case studies on how open source software has helped organisations build effective search.
1. Open Source Search for the Enterprise
Charlie Hull
Managing Director, Flax
3rd November 2010
OVUM Briefing, Search Across the Enterprise
charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch
2. Who are Flax?
Search engine specialists with decades of experience
Developers, innovators and strategists
Based in Cambridge, UK
Technology agnostic – but open source exponents
Recently selected as UK Authorized Partner by Lucid Imagination
Customers include Mydeco, NLA, Durrants Ltd, Financial Times, MediaMiser, MySkreen, Accenture, University of Cambridge
Recently asked to present at British Computer Society and Lucene Revolution conferences
3. What is open source?
“Open-source software (OSS) is computer software that is available in source code form for which the source code and certain other rights normally reserved for copyright holders are provided under a software license that permits users to study, change, and improve the software. […] Some open source software is available within the public domain” (Wikipedia)
9. Myths about open source
It's the work of amateur developers
If I use open source, I have to open up my software/servers/network to all and sundry
Open source software isn't reliable or scalable
It's free
It's unsupported
12. Open source search software
Xapian
- The successor to Muscat
- Bayesian probabilistic ranking
- C/C++ with language bindings
- Highly accurate & scalable
Apache Lucene and Solr
- Flexible licensing
- Vector space model
- Java and other languages
- Well known and supported
And more....
Apache Lucene and Solr are trademarks of The Apache Software Foundation
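For readers unfamiliar with the "vector space model" named above, here is a minimal sketch of the idea: documents and queries become weighted term vectors, and results are ranked by the cosine of the angle between them. This is a toy illustration only, not Lucene's actual scoring formula; the function names, the smoothed IDF weighting and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real engines use language-aware analysis.
    return text.lower().split()

def build_idf(docs):
    # Smoothed inverse document frequency: rarer terms get higher weight.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def vectorize(text, idf):
    # TF-IDF vector as a sparse dict; terms unknown to the index are dropped.
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    # Return indexes of matching documents, best match first.
    idf = build_idf(docs)
    query_vec = vectorize(query, idf)
    scored = [(cosine(query_vec, vectorize(doc, idf)), i) for i, doc in enumerate(docs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]

# Hypothetical sample documents.
DOCS = [
    "open source search engine",
    "commercial search appliance",
    "newspaper clippings archive",
]
```

Calling rank("search", DOCS) places the shorter matching document first, because the query term makes up a larger share of its vector.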
14. Some examples
http://www.nla-clipshare.com
Newspaper Licensing Agency – NLA Clipshare
20 million newspaper stories
6500 users
Content from every major newspaper (and most regionals)
Used by journalists, clippings agencies, media monitors
Replacing internal systems at major newspapers
One of very few ways to search content from all the papers within hours of publication
19. Some examples
Financial Times – press cuttings
Web Service for easy integration
XML source data
Faceted search
Area filters (whole article, body, headline, byline or any combination)
Synonyms, spelling suggestions
Built from scratch in a fortnight
Designed as a prototype, scaled to production use without significant change
http://presscuttings.ft.com
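To make "area filters" and "faceted search" concrete, the sketch below restricts a term search to chosen fields of an article and then counts the matches per category. It is an in-memory illustration under assumed names; the sample articles, the "section" facet field and both functions are hypothetical and do not describe the Financial Times system.

```python
import re
from collections import Counter

# Hypothetical sample articles.
ARTICLES = [
    {"headline": "Bank profits rise", "byline": "Jane Smith",
     "body": "Profits at the bank rose sharply.", "section": "Companies"},
    {"headline": "Markets steady", "byline": "John Bank",
     "body": "Shares were flat.", "section": "Markets"},
    {"headline": "Oil price falls", "byline": "Jane Smith",
     "body": "The central bank watched oil closely.", "section": "Markets"},
]

def words(text):
    # Lower-case word tokens with punctuation stripped.
    return re.findall(r"\w+", text.lower())

def search(term, areas=("headline", "byline", "body")):
    # Match the term only inside the requested areas (fields).
    term = term.lower()
    return [a for a in ARTICLES if any(term in words(a[area]) for area in areas)]

def facet_counts(results, field="section"):
    # Count how many results fall under each value of the facet field.
    return dict(Counter(a[field] for a in results))
```

Searching for "bank" in the headline area alone returns one article, while searching all areas returns three, which facet_counts then groups by section.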
22. Some examples
Durrants Ltd. Media monitoring platform
Thousands of client search profiles
Hundreds of thousands of articles per day
Complex publication hierarchy
Established pipeline
Solution
Flexible query language allows for OCR errors, punctuation, fuzzy matching and weighting
Supports features of previous engine
Scalable master-slave architecture
Accuracy improved in some cases from 95% rejected to 95% accepted
Hardware budget 15% of previous system
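As a rough illustration of how a search profile might tolerate OCR errors and punctuation while applying term weights, the sketch below scores a text against weighted terms using edit distance. It is an assumption-laden toy, not the query language built for Durrants; the function names, the profile format and the one-edit tolerance are all invented for the example.

```python
import re

def edit_distance(a, b):
    # Classic Levenshtein distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_score(profile, text, max_edits=1):
    # profile maps each term to a weight (hypothetical format).
    # Punctuation is discarded; a term counts as found if any word
    # in the text is within max_edits edits of it.
    text_words = re.findall(r"[a-z0-9]+", text.lower())
    score = 0.0
    for term, weight in profile.items():
        if any(edit_distance(term, w) <= max_edits for w in text_words):
            score += weight
    return score
```

With the profile {"merger": 2.0, "acquisition": 1.0}, the misread text "Merqer talks, acquisition rumoured." still matches both terms when one edit is allowed, but only "acquisition" when exact matching is required.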
24. Some examples
(Unnamed multinational radio suppliers)
Intranet search
12 million documents
Multiple formats – Office, PDF, HTML...
User and group-based security (LDAP)
Faceted search
Users can 'tag' interesting documents – for example to identify a 'reference' version
Open source chosen because of significant cost advantage – commercial solutions uneconomic at this scale
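One common way to apply user and group-based security to search results is to store the permitted groups with each document and filter hits against the searcher's group memberships. The sketch below shows that idea only; the dictionary standing in for an LDAP directory, the document records and the function names are all hypothetical, and a real deployment would query the directory service instead.

```python
# Hypothetical stand-in for an LDAP directory lookup: user -> groups.
DIRECTORY = {
    "alice": {"engineering", "staff"},
    "bob": {"staff"},
}

# Hypothetical indexed documents, each carrying the groups allowed to see it.
DOCUMENTS = [
    {"id": 1, "title": "Radio spec", "allowed_groups": {"engineering"}},
    {"id": 2, "title": "Holiday policy", "allowed_groups": {"staff"}},
    {"id": 3, "title": "Board minutes", "allowed_groups": {"directors"}},
]

def groups_for(user):
    # Unknown users belong to no groups and therefore see nothing.
    return DIRECTORY.get(user, set())

def visible_results(user, results):
    # Keep only documents that share at least one group with the user.
    groups = groups_for(user)
    return [doc for doc in results if doc["allowed_groups"] & groups]
```

Filtering at query time like this keeps a single shared index while ensuring each user only sees documents their groups permit.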
27. A look at Lucene & Solr
Among the top 15 open source projects
Installations at over 4,000 companies
Downloads have grown nearly 10x over the past three years
Over 7,000 downloads a day
Lucid Imagination
USA based
Employs 9 out of 15 top Lucene committers
Offers training, consulting and up to 24x7 support
Developing value-add software
Flax are UK partners & resellers
30. Some Lucene & Solr numbers
LinkedIn – 30 million users
Internet Archive – a billion indexed pages
Salesforce.com – 8 terabytes of searchable data
Twitter – a billion queries a day
35. Why open source search?
Flexible, extendable
Powerful & scalable
Lower cost, especially when planning for growth
Commercial support available as necessary
Freedom to innovate
43. Looking to the future
More and more content including social media
Multiple delivery platforms
Search-powered applications
Cloud computing
More use of entity extraction & sentiment analysis
Search no longer a bolt-on, but a platform for innovation
Open source no longer an outsider, but the obvious choice