Interactive Q&A
10th December 2020
Question 1
General Considerations
Language Models
https://en.wikipedia.org/wiki/BERT_(language_model)
https://en.wikipedia.org/wiki/GPT-3
https://rajpurkar.github.io/SQuAD-explorer/
• Pre-trained on large corpora (expensive)
• Ad hoc fine-tuning to solve natural language tasks (inexpensive)
• Ability to encode terms and sentences as high-dimensional vectors
e.g.
https://github.com/google-research/bert#pre-trained-models
https://github.com/hanxiao/bert-as-service/
BERT vectors for the sentences ['access the bank', 'walking by the street', 'tigers are big cats']:
[[ 0.13186474  0.32404128 -0.82704437 ... -0.3711958  -0.39250174 -0.31721866]
 [ 0.24873531 -0.12334424 -0.38933852 ... -0.44756213 -0.5591355  -0.11345179]
 [ 0.28627345 -0.18580122 -0.30906814 ... -0.2959366  -0.39310536  0.07640187]]
General Considerations
Language Models in Search
• Indexing Time : encode sentences (or full field contents) and store the vectors
• Searching Time: encode the query
• Score the query-document vectors pair, calculating vector distance/similarity:
Euclidean distance
Cosine Similarity
…
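The scoring step above can be sketched in a few lines of Python. This is only an illustration: the three document vectors, the query vector and the function names (`cosine_similarity`, `euclidean_distance`, `rank_by_cosine`) are made up here, and in practice the vectors would come from a language model. Note that it scores every stored vector, which is exactly the brute-force behaviour discussed under Limitations.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_cosine(query, docs):
    """Score every stored vector against the query, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical "indexed" vectors and query vector (not real embeddings).
docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.7, 0.7, 0.0], "d3": [0.0, 0.0, 1.0]}
query = [1.0, 0.1, 0.0]

ranking = rank_by_cosine(query, docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 4), round(euclidean_distance(query, docs[doc_id]), 4))
```

Cosine similarity is sorted in descending order, whereas Euclidean distance would be sorted in ascending order (smaller means closer).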
Limitations
• Rank the entire corpus of documents? Or apply an (Approximate) Nearest Neighbour approach?
• Performance of embedding extraction?
• Unintuitive results -> should be combined with traditional Information Retrieval
• Explainability
Apache Lucene
Ideally you want to avoid scoring all documents of your corpus for your query.
The algorithms for vector retrieval can be roughly classified into four categories:
1. Tree-based algorithms, such as KD-tree;
2. Hashing methods, such as LSH (Locality-Sensitive Hashing);
3. Product quantization based algorithms, such as IVFFlat;
4. Graph-based algorithms, such as HNSW, SSG, NSG.
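To make the hashing category more concrete, here is a minimal sketch of random-hyperplane LSH, one common LSH family for cosine similarity. It is an illustration only, not the algorithm used by Lucene or by any plugin mentioned in these slides; the function names, the seed and the toy vectors are assumptions.

```python
import random

def make_hyperplanes(num_bits, dims, seed=42):
    """Draw num_bits random hyperplanes (normal vectors) in dims dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def build_index(vectors, hyperplanes):
    """Group document ids by signature so similar vectors tend to share a bucket."""
    buckets = {}
    for doc_id, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(doc_id)
    return buckets

def candidates(query, buckets, hyperplanes):
    """Only documents in the query's bucket are returned for exact scoring."""
    return buckets.get(signature(query, hyperplanes), [])

# Hypothetical toy vectors: "b" points in the opposite direction to "a".
vectors = {"a": [1.0, 0.2, 0.0], "b": [-1.0, -0.2, 0.0]}
planes = make_hyperplanes(num_bits=8, dims=3)
index = build_index(vectors, planes)
found = candidates([2.0, 0.4, 0.0], index, planes)  # same direction as "a"
```

Only the candidates from the matching bucket need exact scoring, which is how hashing avoids traversing the whole corpus, at the cost of possibly missing some true neighbours.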
Specific File Format (Nov 2020)
• https://issues.apache.org/jira/browse/LUCENE-9004 - Hierarchical Navigable Small World Graphs - DONE
• https://issues.apache.org/jira/browse/LUCENE-9322 - Unified Vector Format - DONE
• https://issues.apache.org/jira/browse/LUCENE-9136 - IVFFlat - In Progress
Apache Lucene
Follow-ups
- reducing heap usage during graph construction
- adding a Query implementation
- exposing index hyper-parameters
- benchmarks
- testing on public datasets
- implementing a diversity heuristic for neighbour selection during graph construction
- making the graph hierarchical
- exploring more efficient search across multiple per-segment graphs…
Keep an eye on Lucene JIRA!
https://issues.apache.org/jira/browse/LUCENE-9004
Apache Solr
Status of Deep Learning Vector Based Search
• Lucene latest codecs and file format not used yet
https://issues.apache.org/jira/browse/SOLR-14397 -> develop an official solution out of the box
https://issues.apache.org/jira/browse/SOLR-12890 -> summary
Ready to use Approaches
• Vector Scoring using Streaming Expressions (Point Fields)
• Available Solr Vector Search Plugin - https://github.com/saaay71/solr-vector-scoring (Payloads)
https://medium.com/@dmitry.kan/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559
• Available Solr Vector Search Plugin with LSH Hashing (Payloads)
Limitations
• Generally slow solutions
• Re-use data structures, not using ad hoc codecs/file format
• Generally support only one vector per field
Apache Solr - Streaming Expressions
Index Time
<dynamicField name="*_fs" type="pfloats" indexed="true" stored="true"/>
Sample Docs:

curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/food_collection/update?commit=true" --data-binary '
[
{"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
{"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
…
]'
https://www.elastic.co/blog/lucene-points-6.0
org.apache.solr.schema.PointField
Multi Valued Field
<fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
Apache Solr - Streaming Expressions
Streaming Expression:

sort(
  select(
    search(food_collection,
           q="*:*",
           fl="id,vector_fs",
           sort="id asc",
           rows=3),
    cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as sim,
    id),
  by="sim desc")
 
Response:
{
  "result-set": {
    "docs": [
        { "sim": 0.99996111, "id": "1" },
        { "sim": 0.98590279, "id": "10" },
        { "sim": 0.55566643, "id": "2" },
        { "EOF": true, "RESPONSE_TIME": 10 }
    ]
  }
}
https://lucene.apache.org/solr/guide/8_7/vector-math.html
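The `sim` values in the response can be checked by hand. The sketch below recomputes the cosine similarity in plain Python for the two documents whose vectors are listed in the sample docs (the vector of document 10 is elided there, so it is left out). The helper name `cosine_similarity` is ours; it is not a Solr API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Query vector from the streaming expression.
query = [5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]

# Vectors copied from the sample documents.
docs = {
    "1": [5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0],  # donut
    "2": [1.0, 5.0, 0.0, 0.0, 0.0, 4.0, 4.0, 3.0],  # apple juice
}

sims = {doc_id: cosine_similarity(vec, query) for doc_id, vec in docs.items()}
for doc_id, sim in sorted(sims.items(), key=lambda pair: pair[1], reverse=True):
    print(doc_id, sim)
```

The two results should agree with the reported `sim` values for ids 1 and 2 up to rounding.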
Drawbacks:
1) It doesn't apply to normal search -> you need to use Streaming Expressions
2) Requires traversing all vectors and scoring them
3) No support for multiple vectors per field - SOLR-11077
Query Time
Apache Solr - Solr Vector Search Plugin
<fieldType name="VectorField" class="solr.TextField" indexed="true" termOffsets="true"
stored="true" termPayloads="true" termPositions="true" termVectors="true"
storeOffsetsWithPositions="true">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
</analyzer>
</fieldType>
<field name="vector" type="VectorField" indexed="true" termOffsets="true" stored="true"
termPositions="true" termVectors="true" multiValued="true"/>
Index Time
curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/{your-collection-name}/update?commit=true" --data-binary '
[
{"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "},
{"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "},
…
]'
Apache Solr - Solr Vector Search Plugin
Query Time
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0" cosine=false}
N.B. Adding the parameter cosine=false calculates the dot product
"response":{"numFound":6,"start":0,"maxScore":40.1675,"docs":[
{
"name":["example 3"],
"vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "],
"score":40.1675},
{
"name":["example 0"],
"vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "],
"score":30.180502},
…
]}
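As a sanity check on the scores above, the sketch below parses the `index|value` payload strings and recomputes the dot product against the query vector. It only mimics the arithmetic; the helper names `parse_payload_vector` and `dot_product` are hypothetical and say nothing about how the plugin is implemented internally.

```python
def parse_payload_vector(text):
    """Parse 'index|value' tokens, e.g. '0|1.55 1|3.53', into a list of floats."""
    pairs = [token.split("|") for token in text.split()]
    ordered = sorted((int(index), float(value)) for index, value in pairs)
    return [value for _, value in ordered]

def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Query vector from the request (cosine=false, so the score is a dot product).
query = [0.1, 4.75, 0.3, 1.2, 0.7, 4.0]

# Stored vectors copied from the response.
example_3 = parse_payload_vector("0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 ")
example_0 = parse_payload_vector("0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 ")

score_3 = dot_product(query, example_3)
score_0 = dot_product(query, example_0)
print(score_3, score_0)
```

Small differences in the last digits compared with the response are expected, since Python uses double precision here.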
Drawbacks:
1) Payloads are used for storing vectors -> slow
2) Requires traversing all vectors and scoring them
3) Support for multiple vectors per field must be investigated

N.B. https://github.com/DmitryKey/solr-vector-scoring is a fork with a port to Apache Solr 8.6
Apache Solr - LSH Hashing Plugin
<fieldType name="VectorField" class="solr.BinaryField" stored="true" indexed="false" multiValued="false"/>
<field name="_vector_" type="VectorField" />
<field name="_lsh_hash_" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="vector" type="string" indexed="true" stored="true"/>
Index Time
<updateRequestProcessorChain name="LSH">
<processor class="com.github.saaay71.solr.updateprocessor.LSHUpdateProcessorFactory" >
<int name="seed">5</int>
<int name="buckets">50</int>
<int name="stages">50</int>
<int name="dimensions">6</int>
<str name="field">vector</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
curl -X POST -H "Content-Type: application/json" "http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true" --data-binary '
[{"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
{"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}]'
Apache Solr - LSH Hashing Plugin
Query Time
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
"response":{"numFound":1,"start":0,"maxScore":36.65736,"docs":[
{
"id": "1",
"vector":"1.55,3.53,2.3,0.7,3.44,2.33",
"_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
"_lsh_hash_":["0_8",
"1_35",
"2_7",
…
"49_43"],
"score":36.65736}
]}
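The `_vector_` value in the response is the binary field holding the encoded vector. As a rough illustration, the sketch below decodes that Base64 string in Python. The layout it assumes (one leading byte followed by big-endian 32-bit floats) is inferred from this single sample value, not taken from the plugin's source, so treat it as an observation rather than a specification.

```python
import base64
import struct

# Value of the _vector_ field copied from the response.
encoded = "/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA=="

raw = base64.b64decode(encoded)

# Assumption: the first byte is a header and the rest are 4-byte floats.
header, payload = raw[0], raw[1:]
count = len(payload) // 4
values = struct.unpack(">%df" % count, payload)  # big-endian float32

# Round to two decimals to undo float32 representation noise.
decoded = [round(v, 2) for v in values]
print(hex(header), decoded)
```

Under that assumption the decoded values line up with the `vector` field of the same document.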
Drawbacks:
1) Performance must be investigated: binary fields are used to store the encoded vectors
2) Latest commit: October 2018

Elasticsearch
Status of Deep Learning Vector Based Search
• Lucene latest codecs and file format not used yet
https://github.com/elastic/elasticsearch/issues/42326 - Work in progress for covering Approximate Nearest Neighbour Techniques

Ready to use Approaches
• X-Pack enterprise features - https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
• https://github.com/alexklibisz/elastiknn
• https://github.com/opendistro-for-elasticsearch/k-NN
Limitations
• Performance must be investigated ( https://elastiknn.com/performance/ )
• Re-use data structures, not using ad hoc codecs/file format
• Supports only one vector per field
Elasticsearch - X-Pack
Index Time
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
"my_dense_vector": [0.5, 10, 6],
"status" : "published"
}
PUT my-index-000001/_doc/2
{
"my_dense_vector": [-0.5, 10, 10],
"status" : "published"
}
• N.B. Lucene latest codecs and file format not used yet, vectors are stored as binary doc values.
Elasticsearch - X-Pack
Query Time
N.B. various distance functions are supported
Drawbacks:
1) Requires traversing all vectors returned by the initial query and scoring them
2) No support for multiple vectors per field



GET my-index-000001/_search
{
"query": {
"script_score": {
"query" : {
"bool" : {
"filter" : {
"term" : {
"status" : "published"
}
}
}
},
"script": {
"source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
"params": {
"query_vector": [4, 3.4, -0.2]
}
}
}
}
}
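The query above first filters by `status` and then scores only the surviving documents. The Python sketch below mimics that flow outside Elasticsearch to show what is being computed. The third document (a draft) and the function names are hypothetical additions for illustration; the `+ 1.0` mirrors the script source and keeps scores non-negative, since cosine similarity ranges from -1 to 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = [
    {"id": "1", "my_dense_vector": [0.5, 10, 6], "status": "published"},
    {"id": "2", "my_dense_vector": [-0.5, 10, 10], "status": "published"},
    # Hypothetical extra document, added only to show the filter at work.
    {"id": "3", "my_dense_vector": [4, 3.4, -0.2], "status": "draft"},
]

def script_score_search(docs, query_vector, status):
    """Filter first, then score every remaining document (no ANN shortcut)."""
    matches = [d for d in docs if d["status"] == status]
    scored = [
        (d["id"], cosine_similarity(query_vector, d["my_dense_vector"]) + 1.0)
        for d in matches
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = script_score_search(docs, [4, 3.4, -0.2], "published")
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

The cost grows linearly with the number of documents that pass the filter, which is the first drawback listed above.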
Next Steps
● Keep an eye on our Blog: https://sease.io/blog, as more is coming!
Question 2
Learning to Rank Libraries
RankLib https://github.com/codelibs/ranklib
XGBoost (University of Washington) https://github.com/dmlc/xgboost
TensorFlow (Google) https://github.com/tensorflow/ranking
LightGBM (Microsoft) https://github.com/Microsoft/LightGBM
CatBoost (Yandex) https://github.com/catboost
SVMRank http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html
LightFM https://github.com/lyst/lightfm
QuickRank (ISTI-CNR) https://github.com/hpclab/quickrank
JForests https://github.com/yasserg/jforests
Ranklib
Overview
https://sourceforge.net/p/lemur/wiki/RankLib/
RankLib is a library of learning to rank algorithms. Currently eight popular algorithms have been
implemented:
• MART (Multiple Additive Regression Trees, a.k.a. Gradient boosted regression tree) [6]
• RankNet [1]
• RankBoost [2]
• AdaRank [3]
• Coordinate Ascent [4]
• LambdaMART [5]
• ListNet [7]
• Random Forests [8]
Ranklib
Our Experience
https://sourceforge.net/p/lemur/wiki/RankLib/
• Multiple learning to rank algorithms supported, including LambdaMART
• Relatively easy to use
• Command Line Interface application -> not meant to be integrated with other apps
• Java code, minimal Test Coverage
• Svn (there’s a Github port, not official: https://github.com/codelibs/ranklib )
• Small Community
XGBoost
Overview
https://github.com/dmlc/xgboost
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient,
flexible and portable.
• It implements machine learning algorithms under the Gradient Boosting framework.
• XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way.
• The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
XGBoost
Our Experience
• Multiple learning to rank algorithms supported, including LambdaMART
• Relatively easy to use
• Library easy to integrate
• Python code, huge project, Tests seem fair
• Github (https://github.com/dmlc/xgboost )
• Extremely popular
• Huge Community
Learning to Rank Libraries
Limitations:
‣ Developed for small data sets
‣ Limited support for Sparse Features
‣ Require extensive Feature Engineering
‣ Do not support the recent advances in Unbiased Learning-to-rank
The TensorFlow Ranking library addresses these gaps
TensorFlow Ranking
Overview
‣ Open source library for solving large-scale ranking problems in a deep learning framework
‣ Developed by Google’s AI department
‣ Fast and easy to use
‣ Flexible and highly configurable
‣ Supports Pointwise, Pairwise, and Listwise losses
‣ Supports popular ranking metrics like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG)
GitHub: https://github.com/tensorflow/ranking
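For readers unfamiliar with the two metrics named above, here is a minimal plain-Python sketch of MRR and NDCG. It uses the exponential gain form of DCG (2^rel - 1), which is one common formulation; it is not TF-Ranking code, and the library's own implementation may differ in details such as gain function or cut-off handling.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """Average reciprocal rank over several ranked lists."""
    return sum(reciprocal_rank(r) for r in queries) / len(queries)

# Hypothetical relevance labels, listed in the order the ranker returned them.
print(ndcg([3, 2, 1, 0]))   # already in ideal order
print(ndcg([0, 1, 3, 2]))   # relevant documents ranked too low
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))
```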
TensorFlow Ranking
Additional components:
‣ Fully integrated with the rest of the TensorFlow ecosystem
‣ Can handle textual features using Text Embeddings
‣ Multi-item (also known as Groupwise) scoring functions
‣ LambdaLoss implementation
‣ Unbiased Learning-to-Rank
TF-Ranking Article: https://arxiv.org/abs/1812.00073
XGBoost vs TensorFlow
Main Differences
XGBoost                    TensorFlow
Tree-based Ranker          Neural Ranker
Handle Missing Values      Handle Missing Values
Run Efficiently on CPU     Run Efficiently on CPU
Large Scale Training       Large Scale Training

More Related Content

What's hot

Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Sease
 
How to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - HaystackHow to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - HaystackSease
 
Explainability for Learning to Rank
Explainability for Learning to RankExplainability for Learning to Rank
Explainability for Learning to RankSease
 
Lucene And Solr Document Classification
Lucene And Solr Document ClassificationLucene And Solr Document Classification
Lucene And Solr Document ClassificationAlessandro Benedetti
 
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source ApproachSearch Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source ApproachAlessandro Benedetti
 
Advanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache LuceneAdvanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache LuceneAlessandro Benedetti
 
Search Quality Evaluation to Help Reproducibility : an Open Source Approach
Search Quality Evaluation to Help Reproducibility : an Open Source ApproachSearch Quality Evaluation to Help Reproducibility : an Open Source Approach
Search Quality Evaluation to Help Reproducibility : an Open Source ApproachAlessandro Benedetti
 
Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)Andrea Gazzarini
 
How to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectHow to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectSease
 
Enterprise Search – How Relevant Is Relevance?
Enterprise Search – How Relevant Is Relevance?Enterprise Search – How Relevant Is Relevance?
Enterprise Search – How Relevant Is Relevance?Sease
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
Feature Extraction for Large-Scale Text Collections
Feature Extraction for Large-Scale Text CollectionsFeature Extraction for Large-Scale Text Collections
Feature Extraction for Large-Scale Text CollectionsSease
 
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comPersonalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comLucidworks
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrTrey Grainger
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Lucidworks
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrLucidworks
 

What's hot (17)

Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
Rated Ranking Evaluator Enterprise: the next generation of free Search Qualit...
 
How to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - HaystackHow to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - Haystack
 
Explainability for Learning to Rank
Explainability for Learning to RankExplainability for Learning to Rank
Explainability for Learning to Rank
 
Lucene And Solr Document Classification
Lucene And Solr Document ClassificationLucene And Solr Document Classification
Lucene And Solr Document Classification
 
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source ApproachSearch Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
 
Advanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache LuceneAdvanced Document Similarity With Apache Lucene
Advanced Document Similarity With Apache Lucene
 
Search Quality Evaluation to Help Reproducibility : an Open Source Approach
Search Quality Evaluation to Help Reproducibility : an Open Source ApproachSearch Quality Evaluation to Help Reproducibility : an Open Source Approach
Search Quality Evaluation to Help Reproducibility : an Open Source Approach
 
Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)
 
How to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectHow to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank Project
 
Enterprise Search – How Relevant Is Relevance?
Enterprise Search – How Relevant Is Relevance?Enterprise Search – How Relevant Is Relevance?
Enterprise Search – How Relevant Is Relevance?
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Feature Extraction for Large-Scale Text Collections
Feature Extraction for Large-Scale Text CollectionsFeature Extraction for Large-Scale Text Collections
Feature Extraction for Large-Scale Text Collections
 
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comPersonalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 

Similar to Interactive Questions and Answers - London Information Retrieval Meetup

Graphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation explorationGraphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation explorationZbyszko Papierski
 
Xebia Knowledge Exchange (mars 2010) - Lucene : From theory to real world
Xebia Knowledge Exchange (mars 2010) - Lucene : From theory to real worldXebia Knowledge Exchange (mars 2010) - Lucene : From theory to real world
Xebia Knowledge Exchange (mars 2010) - Lucene : From theory to real worldMichaël Figuière
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Staying Sane with Drupal NEPHP
Staying Sane with Drupal NEPHPStaying Sane with Drupal NEPHP
Staying Sane with Drupal NEPHPOscar Merida
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
pull requests I sent to scala/scala (ny-scala 2019)
pull requests I sent to scala/scala (ny-scala 2019)pull requests I sent to scala/scala (ny-scala 2019)
pull requests I sent to scala/scala (ny-scala 2019)Eugene Yokota
 
Make your gui shine with ajax solr
Make your gui shine with ajax solrMake your gui shine with ajax solr
Make your gui shine with ajax solrlucenerevolution
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Gruter
 
Backbonejs for beginners
Backbonejs for beginnersBackbonejs for beginners
Backbonejs for beginnersDivakar Gu
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disquszeeg
 
Scala Frustrations
Scala FrustrationsScala Frustrations
Scala Frustrationstakezoe
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineDataWorks Summit
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Appsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaolaAppsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaoladrewz lin
 
Rails and alternative ORMs
Rails and alternative ORMsRails and alternative ORMs
Rails and alternative ORMsJonathan Dahl
 

Similar to Interactive Questions and Answers - London Information Retrieval Meetup (20)

Graphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation explorationGraphs, Graphs everywhere - Lucene powered relation exploration
Graphs, Graphs everywhere - Lucene powered relation exploration
 
Xebia Knowledge Exchange (mars 2010) - Lucene : From theory to real world
Xebia Knowledge Exchange (mars 2010) - Lucene : From theory to real worldXebia Knowledge Exchange (mars 2010) - Lucene : From theory to real world
Xebia Knowledge Exchange (mars 2010) - Lucene : From theory to real world
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Staying Sane with Drupal NEPHP
Staying Sane with Drupal NEPHPStaying Sane with Drupal NEPHP
Staying Sane with Drupal NEPHP
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
pull requests I sent to scala/scala (ny-scala 2019)
pull requests I sent to scala/scala (ny-scala 2019)pull requests I sent to scala/scala (ny-scala 2019)
pull requests I sent to scala/scala (ny-scala 2019)
 
Make your gui shine with ajax solr
Make your gui shine with ajax solrMake your gui shine with ajax solr
Make your gui shine with ajax solr
 
NLP Project Full Circle
NLP Project Full CircleNLP Project Full Circle
NLP Project Full Circle
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Backbonejs for beginners
Backbonejs for beginnersBackbonejs for beginners
Backbonejs for beginners
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
Scala Frustrations
Scala FrustrationsScala Frustrations
Scala Frustrations
 
Apache Spark v3.0.0
Apache Spark v3.0.0Apache Spark v3.0.0
Apache Spark v3.0.0
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Appsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaolaAppsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaola
 
Rails and alternative ORMs
Rails and alternative ORMsRails and alternative ORMs
Rails and alternative ORMs
 

More from Sease

Multi Valued Vectors Lucene
Multi Valued Vectors LuceneMulti Valued Vectors Lucene
Multi Valued Vectors LuceneSease
 
When SDMX meets AI-Leveraging Open Source LLMs To Make Official Statistics Mo...
When SDMX meets AI-Leveraging Open Source LLMs To Make Official Statistics Mo...When SDMX meets AI-Leveraging Open Source LLMs To Make Official Statistics Mo...
When SDMX meets AI-Leveraging Open Source LLMs To Make Official Statistics Mo...Sease
 
How To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaHow To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaSease
 
Introducing Multi Valued Vectors Fields in Apache Lucene
Introducing Multi Valued Vectors Fields in Apache LuceneIntroducing Multi Valued Vectors Fields in Apache Lucene
Introducing Multi Valued Vectors Fields in Apache LuceneSease
 
Stat-weight Improving the Estimator of Interleaved Methods Outcomes with Stat...
Stat-weight Improving the Estimator of Interleaved Methods Outcomes with Stat...Stat-weight Improving the Estimator of Interleaved Methods Outcomes with Stat...
Stat-weight Improving the Estimator of Interleaved Methods Outcomes with Stat...Sease
 
How does ChatGPT work: an Information Retrieval perspective
How does ChatGPT work: an Information Retrieval perspectiveHow does ChatGPT work: an Information Retrieval perspective
Interactive Questions and Answers - London Information Retrieval Meetup

  • 3. General Considerations: Language Models
    https://en.wikipedia.org/wiki/BERT_(language_model)
    https://en.wikipedia.org/wiki/GPT-3
    https://rajpurkar.github.io/SQuAD-explorer/
    • Pre-trained on large corpora (expensive)
    • Ad hoc fine-tuning to solve natural language tasks (inexpensive)
    • Ability to encode terms and sentences as high-dimensional vectors, e.g.
      https://github.com/google-research/bert#pre-trained-models
      https://github.com/hanxiao/bert-as-service/
    BERT vectors for the sentences ['access the bank', 'walking by the street', 'tigers are big cats']:
    [[ 0.13186474 0.32404128 -0.82704437 ... -0.3711958 -0.39250174 -0.31721866]
     [ 0.24873531 -0.12334424 -0.38933852 ... -0.44756213 -0.5591355 -0.11345179]
     [ 0.28627345 -0.18580122 -0.30906814 ... -0.2959366 -0.39310536 0.07640187]]
  • 4. General Considerations: Language Models in Search
    • Indexing time: encode sentences (or full field contents) and store the vectors
    • Search time: encode the query
    • Score each query-document vector pair by vector distance/similarity: Euclidean distance, cosine similarity, …
    Limitations
    • Rank the entire corpus of documents? Apply an (Approximate) Nearest Neighbour approach?
    • Performance of embedding extraction?
    • Unintuitive results -> should be combined with traditional Information Retrieval
    • Explainability
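
The indexing-time / search-time flow above can be sketched as brute-force scoring. This is only a minimal illustration under stated assumptions: the three-dimensional vectors stand in for real BERT embeddings, and the names `index`, `rank` and the document ids are hypothetical, not taken from the slides.

```python
import math

def euclidean_distance(a, b):
    # Smaller is better: straight-line distance between the two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Larger is better: cosine of the angle between the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "Indexing time": toy document vectors (hypothetical stand-ins for embeddings).
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [10.0, 1.0, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

def rank(query, by="cosine"):
    # "Search time": score every stored vector against the query vector.
    if by == "cosine":
        key = lambda doc_id: -cosine_similarity(query, index[doc_id])
    else:
        key = lambda doc_id: euclidean_distance(query, index[doc_id])
    return sorted(index, key=key)

query = [2.0, 0.0, 0.0]
by_cosine = rank(query, by="cosine")
by_euclidean = rank(query, by="euclidean")
print(by_cosine, by_euclidean)
```

The two measures can order the same documents differently because cosine similarity ignores vector length while Euclidean distance does not. Every document is scored for every query, which is the first limitation listed above.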
  • 5. Apache Lucene
    Ideally you want to avoid scoring every document in your corpus for each query.
    Algorithms for vector retrieval can be roughly classified into four categories:
    1. Tree-based algorithms, such as KD-tree
    2. Hashing methods, such as LSH (Locality-Sensitive Hashing)
    3. Product-quantization-based algorithms, such as IVFFlat
    4. Graph-based algorithms, such as HNSW, SSG, NSG
    Specific file format (Nov 2020)
    • https://issues.apache.org/jira/browse/LUCENE-9004 Hierarchical Navigable Small World Graphs - DONE
    • https://issues.apache.org/jira/browse/LUCENE-9322 Unified Vector Format - DONE
    • https://issues.apache.org/jira/browse/LUCENE-9136 IVFFlat - In Progress
  • 6. Apache Lucene: Follow-ups
    - reducing heap usage during graph construction
    - adding a Query implementation
    - exposing index hyper-parameters
    - benchmarks
    - testing on public datasets
    - implementing a diversity heuristic for neighbour selection during graph construction
    - making the graph hierarchical
    - exploring more efficient search across multiple per-segment graphs…
    Keep an eye on Lucene JIRA! https://issues.apache.org/jira/browse/LUCENE-9004
  • 7. Apache Solr: Status of Deep Learning Vector-Based Search
    • Lucene's latest codecs and file format are not used yet
      https://issues.apache.org/jira/browse/SOLR-14397 -> develop an official out-of-the-box solution
      https://issues.apache.org/jira/browse/SOLR-12890 -> summary
    Ready-to-use approaches
    • Vector scoring using Streaming Expressions (Point Fields)
    • Available Solr Vector Search Plugin (Payloads) - https://github.com/saaay71/solr-vector-scoring
      https://medium.com/@dmitry.kan/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559
    • Available Solr Vector Search Plugin with LSH Hashing (Payloads)
    Limitations
    • Generally slow solutions
    • Re-use existing data structures instead of ad hoc codecs/file formats
    • Generally support only one vector per field
  • 8. Apache Solr - Streaming Expressions: Index Time
    Multi-valued point field (org.apache.solr.schema.PointField, see https://www.elastic.co/blog/lucene-points-6.0):
    <fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
    <dynamicField name="*_fs" type="pfloats" indexed="true" stored="true"/>
    Sample docs:
    curl -X POST -H "Content-Type: application/json" http://localhost:8983/solr/food_collection/update?commit=true --data-binary '
    [
      {"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
      {"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
      …
    ]'
  • 9. Apache Solr - Streaming Expressions: Query Time
    Streaming Expression:
    sort(
      select(
        search(food_collection, q="*:*", fl="id,vector_fs", sort="id asc", rows=3),
        cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as sim,
        id),
      by="sim desc")
    Response:
    {
      "result-set": {
        "docs": [
          { "sim": 0.99996111, "id": "1" },
          { "sim": 0.98590279, "id": "10" },
          { "sim": 0.55566643, "id": "2" },
          { "EOF": true, "RESPONSE_TIME": 10 }
        ]
      }
    }
    https://lucene.apache.org/solr/guide/8_7/vector-math.html
    Drawbacks:
    1) It does not apply to normal search -> you need to use Streaming Expressions
    2) Requires traversing all vectors and scoring them
    3) No support for multiple vectors per field - SOLR-11077
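
The `sim` values in the response above can be checked by hand. The following is a small sketch, in plain Python rather than Solr, that recomputes cosine similarity for the two sample documents shown on the index-time slide (the document with id "10" is not listed in the slides, so it is left out); the helper name `cosine_similarity` is only illustrative.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Sample docs from the index-time slide.
docs = {
    "1": [5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0],  # donut
    "2": [1.0, 5.0, 0.0, 0.0, 0.0, 4.0, 4.0, 3.0],  # apple juice
}
# Query vector from the streaming expression.
query = [5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]

# Equivalent of select(... cosineSimilarity ... as sim) followed by sort(by="sim desc").
scored = sorted(((cosine_similarity(vec, query), doc_id) for doc_id, vec in docs.items()),
                reverse=True)
for sim, doc_id in scored:
    # The slide's response reports 0.99996111 for id 1 and 0.55566643 for id 2.
    print(doc_id, round(sim, 8))
```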
  • 10. Apache Solr - Solr Vector Search Plugin: Index Time
    <fieldType name="VectorField" class="solr.TextField" indexed="true" termOffsets="true" stored="true" termPayloads="true" termPositions="true" termVectors="true" storeOffsetsWithPositions="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldType>
    <field name="vector" type="VectorField" indexed="true" termOffsets="true" stored="true" termPositions="true" termVectors="true" multiValued="true"/>
    curl -X POST -H "Content-Type: application/json" http://localhost:8983/solr/{your-collection-name}/update?commit=true --data-binary '
    [
      {"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "},
      {"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "},
      …
    ]'
  • 11. Apache Solr - Solr Vector Search Plugin: Query Time
    http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0" cosine=false}
    N.B. Adding the parameter cosine=false calculates the dot product.
    "response":{"numFound":6,"start":0,"maxScore":40.1675,"docs":[
      { "name":["example 3"], "vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "], "score":40.1675},
      { "name":["example 0"], "vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "], "score":30.180502},
      …
    ]}
    Drawbacks:
    1) Payloads used for storing vectors -> slow
    2) Requires traversing all vectors and scoring them
    3) Support for multiple vectors per field must be investigated
    N.B. https://github.com/DmitryKey/solr-vector-scoring is a fork with a port to Apache Solr 8.6
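
To make the payload encoding and the `cosine=false` scoring concrete, here is a short sketch that parses the `position|value` tokens and recomputes the dot products from the response above. It is plain Python, not the plugin's code; `parse_payload_vector` and `dot_product` are hypothetical helper names.

```python
def parse_payload_vector(text):
    # Turn 'position|value' tokens such as '0|1.55 1|3.53' into a list of floats,
    # ordered by position.
    pairs = (token.split("|") for token in text.split())
    ordered = sorted((int(position), float(value)) for position, value in pairs)
    return [value for _, value in ordered]

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# Query vector and stored payload strings taken from the slides.
query = [0.1, 4.75, 0.3, 1.2, 0.7, 4.0]
example_3 = parse_payload_vector("0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 ")
example_0 = parse_payload_vector("0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 ")

score_3 = dot_product(query, example_3)  # slide reports 40.1675
score_0 = dot_product(query, example_0)  # slide reports 30.180502 (single-precision float)
print(score_3, score_0)
```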
  • 12. Apache Solr - LSH Hashing Plugin: Index Time
    <fieldType name="VectorField" class="solr.BinaryField" stored="true" indexed="false" multiValued="false"/>
    <field name="_vector_" type="VectorField" />
    <field name="_lsh_hash_" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="vector" type="string" indexed="true" stored="true"/>
    <updateRequestProcessorChain name="LSH">
      <processor class="com.github.saaay71.solr.updateprocessor.LSHUpdateProcessorFactory" >
        <int name="seed">5</int>
        <int name="buckets">50</int>
        <int name="stages">50</int>
        <int name="dimensions">6</int>
        <str name="field">vector</str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
    curl -X POST -H "Content-Type: application/json" http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true --data-binary '
    [{"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
     {"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}]'
  • 13. Apache Solr - LSH Hashing Plugin: Query Time
    http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
    "response":{"numFound":1,"start":0,"maxScore":36.65736,"docs":[
      { "id": "1",
        "vector":"1.55,3.53,2.3,0.7,3.44,2.33",
        "_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
        "_lsh_hash_":["0_8", "1_35", "2_7", … "49_43"],
        "score":36.65736}
    ]}
    Drawbacks:
    1) Performance must be investigated: binary fields with encoded vectors are used
    2) Latest commit: October 2018
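
The `_lsh_hash_` values above look like `stage_bucket` tokens. The slides do not describe the plugin's hashing scheme, so the following is only a generic random-projection LSH sketch that produces tokens of the same shape; the function `lsh_tokens`, the `width` parameter and the projection scheme are assumptions for illustration, while `seed`, `buckets` and `stages` simply mirror the names in the configuration.

```python
import random

def lsh_tokens(vector, stages=50, buckets=50, seed=5, width=1.0):
    # Assumed scheme (not the plugin's): one random Gaussian direction per stage;
    # the projection onto it is quantised into one of `buckets` buckets.
    rng = random.Random(seed)  # fixed seed -> same directions at index and query time
    tokens = []
    for stage in range(stages):
        direction = [rng.gauss(0.0, 1.0) for _ in vector]
        projection = sum(d * x for d, x in zip(direction, vector))
        bucket = int(projection // width) % buckets
        tokens.append(f"{stage}_{bucket}")
    return tokens

doc_tokens = lsh_tokens([1.55, 3.53, 2.3, 0.7, 3.44, 2.33])
query_tokens = lsh_tokens([1.55, 3.53, 2.3, 0.7, 3.44, 2.33])

# Candidate selection: documents sharing hash tokens with the query are retrieved
# through the ordinary inverted index, then only those candidates are re-scored.
shared = len(set(doc_tokens) & set(query_tokens))
print(doc_tokens[:3], shared)
```

Because the tokens are plain strings, they can be indexed in a multi-valued string field and matched like ordinary terms, which avoids scoring every vector in the collection.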

  • 14. Elasticsearch: Status of Deep Learning Vector-Based Search
    • Lucene's latest codecs and file format are not used yet
      https://github.com/elastic/elasticsearch/issues/42326 - work in progress to cover Approximate Nearest Neighbour techniques
    Ready-to-use approaches
    • X-Pack enterprise features - https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
    • https://github.com/alexklibisz/elastiknn
    • https://github.com/opendistro-for-elasticsearch/k-NN
    Limitations
    • Performance must be investigated ( https://elastiknn.com/performance/ )
    • Re-use existing data structures instead of ad hoc codecs/file formats
    • Supports only one vector per field
  • 15. Elasticsearch - X-Pack: Index Time
    PUT my-index-000001
    { "mappings": { "properties": {
        "my_dense_vector": { "type": "dense_vector", "dims": 3 },
        "status" : { "type" : "keyword" }
    } } }
    PUT my-index-000001/_doc/1
    { "my_dense_vector": [0.5, 10, 6], "status" : "published" }
    PUT my-index-000001/_doc/2
    { "my_dense_vector": [-0.5, 10, 10], "status" : "published" }
    • N.B. Lucene's latest codecs and file format are not used yet; vectors are stored as binary doc values.
  • 16. Elasticsearch - X-Pack: Query Time
    GET my-index-000001/_search
    { "query": { "script_score": {
        "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } },
        "script": {
          "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
          "params": { "query_vector": [4, 3.4, -0.2] }
        }
    } } }
    N.B. Various distance functions are supported.
    Drawbacks:
    1) Requires traversing all vectors returned by the initial query and scoring them
    2) No support for multiple vectors per field
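
As a rough sketch of how an application might assemble the request body above, the following builds the same `script_score` structure as a Python dictionary. No Elasticsearch client is used and nothing is sent over the network; the helper name `script_score_query` and its parameters are hypothetical.

```python
import json

def script_score_query(query_vector, vector_field, filter_field, filter_value):
    # The inner bool/filter query selects the candidate documents;
    # the script then scores only those candidates.
    return {
        "query": {
            "script_score": {
                "query": {"bool": {"filter": {"term": {filter_field: filter_value}}}},
                "script": {
                    # Cosine similarity lies in [-1, 1]; adding 1.0 keeps the score
                    # non-negative, since Elasticsearch does not accept negative scores.
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }

body = script_score_query([4, 3.4, -0.2], "my_dense_vector", "status", "published")
print(json.dumps(body, indent=2))
```

Narrowing the inner filter is the main lever for performance here, because every document it returns has its vector scored.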
  • 17. Next Steps ● Keep an eye on our Blog: https://sease.io/blog, as more is coming!
  • 19. Learning to Rank Libraries
    RankLib https://github.com/codelibs/ranklib
    XGBoost (University of Washington) https://github.com/dmlc/xgboost
    TensorFlow (Google) https://github.com/tensorflow/ranking
    LightGBM (Microsoft) https://github.com/Microsoft/LightGBM
    CatBoost (Yandex) https://github.com/catboost
    SVMRank http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html
    LightFM https://github.com/lyst/lightfm
    QuickRank (ISTI-CNR) https://github.com/hpclab/quickrank
    JForests https://github.com/yasserg/jforests
  • 20. RankLib: Overview
    https://sourceforge.net/p/lemur/wiki/RankLib/
    RankLib is a library of learning to rank algorithms. Currently eight popular algorithms have been implemented:
    • MART (Multiple Additive Regression Trees, a.k.a. Gradient boosted regression tree) [6]
    • RankNet [1]
    • RankBoost [2]
    • AdaRank [3]
    • Coordinate Ascent [4]
    • LambdaMART [5]
    • ListNet [7]
    • Random Forests [8]
  • 21. RankLib: Our Experience
    https://sourceforge.net/p/lemur/wiki/RankLib/
    • Multiple learning to rank algorithms supported, including LambdaMART
    • Relatively easy to use
    • Command Line Interface application -> not meant to be integrated with other apps
    • Java code, minimal test coverage
    • SVN (there is an unofficial GitHub port: https://github.com/codelibs/ranklib )
    • Small community
  • 22. XGBoost: Overview
    https://github.com/dmlc/xgboost
    XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
    • It implements machine learning algorithms under the Gradient Boosting framework.
    • XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way.
    • The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
  • 23. XGBoost: Our Experience
    • Multiple learning to rank algorithms supported, including LambdaMART
    • Relatively easy to use
    • Library easy to integrate
    • Python code, huge project, tests seem fair
    • GitHub ( https://github.com/dmlc/xgboost )
    • Extremely popular
    • Huge community
  • 24. Learning to Rank Libraries: Limitations
    ‣ Developed for small data sets
    ‣ Limited support for sparse features
    ‣ Require extensive feature engineering
    ‣ Do not support the recent advances in Unbiased Learning-to-Rank
    The TensorFlow Ranking library addresses these gaps.
  • 25. TensorFlow Ranking: Overview
    ‣ Open source library for solving large-scale ranking problems in a deep learning framework
    ‣ Developed by Google's AI department
    ‣ Fast and easy to use
    ‣ Flexible and highly configurable
    ‣ Supports Pointwise, Pairwise, and Listwise losses
    ‣ Supports popular ranking metrics like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG)
    GitHub: https://github.com/tensorflow/ranking
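
For readers unfamiliar with the two metrics named above, here is a minimal plain-Python sketch of MRR and NDCG. It does not use TensorFlow Ranking and is not that library's implementation; it assumes one common NDCG formulation (gain 2^rel - 1, log2 position discount), and all function names are illustrative.

```python
import math

def reciprocal_rank(relevances):
    # 1 / position of the first relevant result, or 0 if none is relevant.
    for position, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(ranked_lists):
    # Average reciprocal rank over several queries.
    return sum(reciprocal_rank(r) for r in ranked_lists) / len(ranked_lists)

def dcg(relevances, k=None):
    # Discounted cumulative gain over the top k results (all results if k is None).
    top = relevances[:k] if k is not None else relevances
    return sum((2 ** rel - 1) / math.log2(position + 1)
               for position, rel in enumerate(top, start=1))

def ndcg(relevances, k=None):
    # DCG normalised by the DCG of the ideal (descending relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Each list holds graded relevance labels in the order the ranker returned them.
mrr = mean_reciprocal_rank([[0, 0, 1], [1, 0, 0]])
perfect = ndcg([3, 2, 1, 0])
reversed_order = ndcg([0, 1, 2, 3])
print(mrr, perfect, reversed_order)
```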
  • 26. TensorFlow Ranking: Additional Components
    ‣ Fully integrated with the rest of the TensorFlow ecosystem
    ‣ Can handle textual features using text embeddings
    ‣ Multi-item (also known as Groupwise) scoring functions
    ‣ LambdaLoss implementation
    ‣ Unbiased Learning-to-Rank
    TF-Ranking article: https://arxiv.org/abs/1812.00073
  • 27. XGBoost vs TensorFlow: Main Differences
    XGBoost                  | TensorFlow
    Tree-based Ranker        | Neural Ranker
    Handle Missing Values    | Handle Missing Values
    Run Efficiently on CPU   | Run Efficiently on CPU
    Large Scale Training     | Large Scale Training