Title:
Semantic Equivalence of e-Commerce Queries
Authors:
Aritra Mandal, Daniel Tunkelang, Zhe Wu
Presented at KDD 2023 Workshop on E-Commerce and Natural Language Processing (ECNLP 2023).
2. Search query != search intent.
● Information retrieval researchers worry about queries that map to multiple intents.
jaguar or ?
● Practitioners worry more about multiple queries that map to the same intent.
lightning to 3.5mm
iphone to aux
4. Opportunity to increase recall while preserving precision.
Similar but not equivalent intent.
5. High-level strategy to leverage query equivalence.
Map queries to vectors.
Store in nearest-neighbor database.
(i.e., optimize for user or business outcome)
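The vector-and-lookup part of this strategy can be sketched with a brute-force nearest-neighbor scan. This is a minimal illustration, not the authors' system: the 2-dimensional vectors, the `index` dictionary, and the `nearest_queries` helper are invented for the example, and a real deployment would use a dedicated nearest-neighbor database instead of a linear scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_queries(query_vector, index, k=2, min_cosine=0.0):
    """Return up to k (query, cosine) pairs from index, most similar first."""
    scored = [(q, cosine(query_vector, v)) for q, v in index.items()]
    scored = [(q, s) for q, s in scored if s >= min_cosine]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors; real query vectors would have many more dimensions.
index = {
    "iphone to aux": [0.9, 0.1],
    "lightning to 3.5mm": [0.88, 0.12],
    "black tshirts for men": [0.1, 0.95],
}

# Look up stored queries that are close enough to an incoming query vector.
neighbors = nearest_queries([0.89, 0.11], index, k=2, min_cosine=0.98)
```

With these toy values, only the two adapter queries pass the similarity cutoff; the t-shirt query is filtered out.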
6. Two strategies for recognizing equivalent queries.
● Surface Similarity
○ Variation in inflection, word order, compounding, noise words.
black tshirts for men = mens black t-shirt
● Behavioral Similarity
○ Queries lead to engagement with equivalent or similar results.
lightning to 3.5mm = iphone to aux
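Surface similarity can be approximated by reducing each query to a canonical key. The sketch below is only one possible normalization, under stated assumptions: the `NOISE_WORDS` list, the naive plural stripping in `singularize`, and the `surface_key` helper are hypothetical, and compounding is handled only by removing hyphens.

```python
import re

# Hypothetical noise-word list; a real system would tune this to its own traffic.
NOISE_WORDS = {"for", "the", "a", "an", "of", "with"}

def singularize(token):
    """Very naive plural stripping: drop one trailing 's' from longer tokens."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def surface_key(query):
    """Canonical key that ignores case, hyphens, noise words, plurals, and word order."""
    text = query.lower().replace("-", "")
    tokens = re.findall(r"[a-z0-9.]+", text)
    tokens = [singularize(t) for t in tokens if t not in NOISE_WORDS]
    return " ".join(sorted(tokens))

same_surface = surface_key("black tshirts for men") == surface_key("mens black t-shirt")
different_surface = surface_key("lightning to 3.5mm") != surface_key("iphone to aux")
```

The second comparison shows why surface rules alone are not enough: the two adapter queries share almost no tokens, so their equivalence has to come from behavioral signals.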
8. Query vectors are centroids of associated product vectors
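[Figure: for each of the two queries "black tshirts for men" and "mens black t-shirt", the vectors of its associated products (e.g., [0.13, 0.81, … ], [0.09, 0.75, … ], …) are averaged into a query centroid (e.g., [0.11, 0.79, … ] and [0.12, 0.78, … ]). The two centroids have cos > 0.98.]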
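The centroid construction can be sketched in a few lines. The product vectors below are made-up 2-dimensional values, not the ones from the slide, and the `centroid` and `cosine` helpers are names chosen for this example; only the 0.98 threshold comes from the slide.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors of the products users engaged with after each query.
products_q1 = [[0.20, 0.90], [0.10, 0.70]]
products_q2 = [[0.25, 0.90], [0.10, 0.75]]

q1 = centroid(products_q1)  # approximately [0.15, 0.80]
q2 = centroid(products_q2)  # approximately [0.175, 0.825]

# Treat the queries as equivalent when their centroids are nearly parallel.
equivalent = cosine(q1, q2) > 0.98
```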
9. Works well, but only for head and torso queries.
● Offline approach works for queries with enough engagement history.
● Would be expensive to compute aggregates of result vectors online.
● Still, head and torso queries tend to represent a large fraction of traffic.
10. Train online sentence transformer model for tail queries.
● Train using (query1, query2, similarity) triples from offline model.
● Oversample similar query pairs to increase sensitivity where it matters.
● Fine-tune a pre-trained micro-BERT sentence transformer model.
● Concatenate the output of a query classifier to the query keywords.
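The data preparation described in these bullets might look roughly like the sketch below. It covers only building (query1, query2, similarity) triples from offline query vectors, oversampling similar pairs, and attaching a classifier label to the query text; the fine-tuning step itself is omitted. The threshold, the oversampling factor, the toy vectors, and the input format in `model_input` are all assumptions, since the slides do not specify them.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_triples(query_vectors, similar_threshold=0.9, oversample=3):
    """Build (query1, query2, similarity) triples, repeating similar pairs."""
    triples = []
    pairs = itertools.combinations(sorted(query_vectors.items()), 2)
    for (q1, v1), (q2, v2) in pairs:
        sim = cosine(v1, v2)
        # Assumed oversampling rule: repeat pairs above the threshold.
        copies = oversample if sim >= similar_threshold else 1
        triples.extend([(q1, q2, sim)] * copies)
    return triples

def model_input(query, category):
    """Append a (hypothetical) query-classifier label to the query keywords."""
    return f"{query} [{category}]"

# Hypothetical offline query vectors (centroids of product vectors).
offline_vectors = {
    "iphone to aux": [0.9, 0.1],
    "lightning to 3.5mm": [0.88, 0.12],
    "movie money": [0.1, 0.95],
}

triples = build_triples(offline_vectors)
example_text = model_input("iphone to aux", "Cell Phone Accessories")
```

Here the one highly similar pair appears three times in `triples`, while each dissimilar pair appears once.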
12. Results
Model              Dataset Name      Pearson’s correlation
query-sim-ecom     eBay Internal     0.87
query-sim-ecom     ESCI query-query  0.85
all-MiniLM-L12-v2  ESCI query-query  0.68

Examples from ESCI of queries with low surface but high behavioral similarity:

Query 1                      Query 2      cosine
hdmi to galaxy s8            s9 hdmi      0.9993
movie money                  prop money   0.9995
cassette adapter for iphone  tape to aux  0.9993
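For readers unfamiliar with the metric, Pearson's correlation can be computed as below. The slides do not spell out which two score lists are compared; this sketch assumes it is the model's predicted similarity against a reference similarity for the same query pairs. The score lists are made-up toy values, not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up similarity scores for four query pairs (illustration only).
reference_similarity = [0.99, 0.95, 0.40, 0.10]
predicted_similarity = [0.97, 0.90, 0.55, 0.20]

r = pearson(reference_similarity, predicted_similarity)
```

A value near 1 means the model ranks and spaces query pairs much like the reference does; a value near 0 means no linear relationship.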
13. Summary
● Queries with equivalent intent should yield equivalent experiences.
● Query similarity can increase recall while preserving precision.
● Signals can come from either surface or behavioral similarity.
● Offline bag-of-documents model: queries as means of product vectors.
● Fine-tune online Micro-BERT sentence transformer model for tail queries.
● It just works!