SlideShare a Scribd company logo
1 of 23
Download to read offline
Confidential - donot distribute
Hotels.com’sjourneyto becoming
anAlgorithmicBusiness
Matthew Fryer
VP, Chief Data Science Officer
mfryer@hotels.com
Confidential - donot distribute
Part of Expedia, Inc. family
385,000 properties
89 countries
39 languages
>27m Hotels.com Rewards Members
Home of Captain Obvious
Billions of Recommendations, based on real-time Data per day
Hotels.com
Confidential - donot distribute
Confidential - donot distribute
Confidential - donot distribute
5
Data Science Engineering Front End Development
Confidential - donot distribute
“Artificial Intelligence Will Be
Travel’s Next Big Thing”
Barry Diller
Chairman & Senior Executive,
Expedia, Inc.
3M’s are disruptive
technology
Mobile
Messaging / NLP
Machine Learning
Confidential - donot distribute
Confidential - donot distribute
Our overall ecosystem
Confidential - donot distribute 9
Core Elementsof our Data ScienceCloud Platform
Databricks Unified Platform
Maestro – Our Internally Developed
Platform on AWS
(EMR, Spark, R-Studio, Intellij, SBT, Jupyter,
Zeppelin, Unit / QA, Metastore, Apache Airflow,
Keras, Tensorflow)
Proof of Concept on Google
Cloud, Beam, Spark &
Tensorflow
Confidential - donot distribute
DatabricksUnifiedPlatform
Chart is in1hourblocks, y axis = numberof 32coreinstances
10
• Key asset to the success of data science at Hotels.com
• Key in driving up data scientist productivity / efficiency / flexibility
• Helps make our data science lifecycle operate much easier and
faster driving speed to market
• Reliable / secure + facilitates ‘Highly Elastic’ workflows exploiting
cost effective spot instance on AWS.
Confidential - donot distribute
ALPs – AlgorithmLifecyclePipelineService
11
Confidential - donot distribute
Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
Imagesarean importantfactorwhilechoosinga hotel
12
0% 10% 20% 30% 40% 50% 60% 70% 80%
Loyalty Program
Reviews
Hotel Brand
Star Rating
Destination Info
Images
Hotel Info
Factors other than price/location
Very Imporant/Important Important Very Important
Confidential - donot distribute
ComputerVisionproblemswetry to tackle
13
Near Duplicate Detection
Scene Classification Image Ranking
Confidential - donot distribute 14
Tagged as Bathroom
Confidential - donot distribute 15
GPU’s quickly became key, took a large effort to optimize using
Keras + Tensorflow (Inception v3 + ResNet)
493
67
20
7
4
1
10
100
1000
12-CPU 1-GPU 1-GPU +
limited cache
16-GPU +
limited cache
16-GPU + full
cache
Days CIFAR2
Expedia Small
15
2.5
0
10
20
16-GPU + full cache Optimized
Days
Confidential - donot distribute
NearDuplicateDetection:Realworldexamples
16
Non-Duplicates – probability 100%
Non-Duplicates – probability 95.91%
Duplicates – probability 97.98%
Duplicates – probability 98.43%
Confidential - donot distribute
ROOM/BATHROOM
Usingthe model:Real worldexamples
17
EXTERIOR/HOTEL INTERIOR/SEATING_LO
BBY
ROOM/LIVING_ROOM
ROOM/GUESTROOM
FACILITIES/DINING
INTERIOR/SEATING_LOBBY
FACILITIES/POOL
Confidential - donot distribute
Accuracy& ConfusionMatrix
18
• After many manual / long
winded iterations and
regularization processes
tuning hyperparameters
• We achieved good
accuracy and low
confusion matrix
Confidential - donot distribute
Optimizingthe photo orderfor improvedcustomer
experiences
19
Original Model
Reference: Radisson Blu Edwardian Berkshire Hotel, London
Confidential - donot distribute
Findingthe right hotel in our marketplace is core to
our customers needs.
Confidential - donot distribute
Kensington
Bloomsbury
Heathrow
Canary
Wharf
Paddington
Westminster
London City
Airport
Chelsea
Battersea
Wimbledon
Wembley
City of
London
As an exampledifferentusersegmentsliketo stayin
differentlocations
Confidential - donot distribute 22
Utility
Utility
Utility
just browsing! BOOK!Intent
(click)
Confidential - donot distribute
Thank you
mfryer@hotels.com
https://uk.linkedin.com/in/matthewfryer
@mattfryer

More Related Content

What's hot

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Confidential Computing overview
Confidential Computing overviewConfidential Computing overview
Confidential Computing overviewMark Argent
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Visual_BI
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKKriangkrai Chaonithi
 
Azure Arc by K.Narisorn // Azure Multi-Cloud
Azure Arc by K.Narisorn // Azure Multi-CloudAzure Arc by K.Narisorn // Azure Multi-Cloud
Azure Arc by K.Narisorn // Azure Multi-CloudKumton Suttiraksiri
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Web Services
 
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts VMworld
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GooglePatrick Pierson
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data ScienceErik Bernhardsson
 
Dependency Injection in Apache Spark Applications
Dependency Injection in Apache Spark ApplicationsDependency Injection in Apache Spark Applications
Dependency Injection in Apache Spark ApplicationsDatabricks
 
Splunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersSplunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersHarry McLaren
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Kai Wähner
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka Edureka!
 

What's hot (20)

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Confidential Computing overview
Confidential Computing overviewConfidential Computing overview
Confidential Computing overview
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
The Benefits of Cloud Computing
The Benefits of Cloud ComputingThe Benefits of Cloud Computing
The Benefits of Cloud Computing
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Azure Arc by K.Narisorn // Azure Multi-Cloud
Azure Arc by K.Narisorn // Azure Multi-CloudAzure Arc by K.Narisorn // Azure Multi-Cloud
Azure Arc by K.Narisorn // Azure Multi-Cloud
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
 
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
VMworld 2013: Capacity Jail Break: vSphere 5 Space Reclamation Nuts and Bolts
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
 
Dependency Injection in Apache Spark Applications
Dependency Injection in Apache Spark ApplicationsDependency Injection in Apache Spark Applications
Dependency Injection in Apache Spark Applications
 
Splunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersSplunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy Forwarders
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
 

Similar to Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time with Matt Fryer

Protecting Data Everywhere - Barracuda
Protecting Data Everywhere - BarracudaProtecting Data Everywhere - Barracuda
Protecting Data Everywhere - BarracudaMarcoTechnologies
 
Machine Learning & Cyber Security: Detecting Malicious URLs in the Haystack
Machine Learning & Cyber Security: Detecting Malicious URLs in the HaystackMachine Learning & Cyber Security: Detecting Malicious URLs in the Haystack
Machine Learning & Cyber Security: Detecting Malicious URLs in the HaystackAlistair Gillespie
 
Securing the Software Defined Car™ Using Artificial Intelligence and OTA Updates
Securing the Software Defined Car™ Using Artificial Intelligence and OTA UpdatesSecuring the Software Defined Car™ Using Artificial Intelligence and OTA Updates
Securing the Software Defined Car™ Using Artificial Intelligence and OTA UpdatesMahbubul Alam
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!Adrian Hornsby
 
Cloud Computing in 3-D
Cloud Computing in 3-DCloud Computing in 3-D
Cloud Computing in 3-DEverest Group
 
New recipes for the ever growing content cloud
New recipes for the ever growing content cloudNew recipes for the ever growing content cloud
New recipes for the ever growing content cloudCédric Hüsler
 
Keynote fx try harder 2 be yourself
Keynote fx   try harder 2 be yourselfKeynote fx   try harder 2 be yourself
Keynote fx try harder 2 be yourselfDefconRussia
 
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Keith Kraus
 
Delivering application happiness for you!
Delivering application happiness for you!Delivering application happiness for you!
Delivering application happiness for you!Anna Zabolotnaya
 
Gartner: Top 10 Technology Trends 2015
Gartner: Top 10 Technology Trends 2015Gartner: Top 10 Technology Trends 2015
Gartner: Top 10 Technology Trends 2015Den Reymer
 
Triangulum - Ransomware Evolved - Why your backups arent good enough
Triangulum - Ransomware Evolved - Why your backups arent good enoughTriangulum - Ransomware Evolved - Why your backups arent good enough
Triangulum - Ransomware Evolved - Why your backups arent good enoughMartin Opsahl
 
#w-cell-struc-security Wardley Maps: Cell Bases structures for Security
#w-cell-struc-security Wardley Maps: Cell Bases structures for Security#w-cell-struc-security Wardley Maps: Cell Bases structures for Security
#w-cell-struc-security Wardley Maps: Cell Bases structures for SecurityOpen Security Summit
 
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...NetworkCollaborators
 
2012: The End of the World?
2012: The End of the World?2012: The End of the World?
2012: The End of the World?Saumil Shah
 
EMEA10: Trepidation in Moving to the Cloud
EMEA10: Trepidation in Moving to the CloudEMEA10: Trepidation in Moving to the Cloud
EMEA10: Trepidation in Moving to the CloudCompTIA UK
 
Turn OTA Lookers into Direct Bookers
Turn OTA Lookers into Direct BookersTurn OTA Lookers into Direct Bookers
Turn OTA Lookers into Direct BookersLeonardo
 
DVX: Data Visualization Experiences
DVX: Data Visualization ExperiencesDVX: Data Visualization Experiences
DVX: Data Visualization ExperiencesRandy Krum
 
Park Inn Business Development Brief
Park  Inn  Business  Development  BriefPark  Inn  Business  Development  Brief
Park Inn Business Development BriefUshouldsendit2
 
Park inn business development brief
Park inn business development briefPark inn business development brief
Park inn business development briefUshouldsendit2
 

Similar to Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time with Matt Fryer (20)

Protecting Data Everywhere - Barracuda
Protecting Data Everywhere - BarracudaProtecting Data Everywhere - Barracuda
Protecting Data Everywhere - Barracuda
 
Machine Learning & Cyber Security: Detecting Malicious URLs in the Haystack
Machine Learning & Cyber Security: Detecting Malicious URLs in the HaystackMachine Learning & Cyber Security: Detecting Malicious URLs in the Haystack
Machine Learning & Cyber Security: Detecting Malicious URLs in the Haystack
 
Skynet Week 8 H4D Stanford 2016
Skynet Week 8 H4D Stanford 2016Skynet Week 8 H4D Stanford 2016
Skynet Week 8 H4D Stanford 2016
 
Securing the Software Defined Car™ Using Artificial Intelligence and OTA Updates
Securing the Software Defined Car™ Using Artificial Intelligence and OTA UpdatesSecuring the Software Defined Car™ Using Artificial Intelligence and OTA Updates
Securing the Software Defined Car™ Using Artificial Intelligence and OTA Updates
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!
 
Cloud Computing in 3-D
Cloud Computing in 3-DCloud Computing in 3-D
Cloud Computing in 3-D
 
New recipes for the ever growing content cloud
New recipes for the ever growing content cloudNew recipes for the ever growing content cloud
New recipes for the ever growing content cloud
 
Keynote fx try harder 2 be yourself
Keynote fx   try harder 2 be yourselfKeynote fx   try harder 2 be yourself
Keynote fx try harder 2 be yourself
 
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
 
Delivering application happiness for you!
Delivering application happiness for you!Delivering application happiness for you!
Delivering application happiness for you!
 
Gartner: Top 10 Technology Trends 2015
Gartner: Top 10 Technology Trends 2015Gartner: Top 10 Technology Trends 2015
Gartner: Top 10 Technology Trends 2015
 
Triangulum - Ransomware Evolved - Why your backups arent good enough
Triangulum - Ransomware Evolved - Why your backups arent good enoughTriangulum - Ransomware Evolved - Why your backups arent good enough
Triangulum - Ransomware Evolved - Why your backups arent good enough
 
#w-cell-struc-security Wardley Maps: Cell Bases structures for Security
#w-cell-struc-security Wardley Maps: Cell Bases structures for Security#w-cell-struc-security Wardley Maps: Cell Bases structures for Security
#w-cell-struc-security Wardley Maps: Cell Bases structures for Security
 
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
Cisco Connect 2018 Thailand - Cisco aci delivering intent for data center net...
 
2012: The End of the World?
2012: The End of the World?2012: The End of the World?
2012: The End of the World?
 
EMEA10: Trepidation in Moving to the Cloud
EMEA10: Trepidation in Moving to the CloudEMEA10: Trepidation in Moving to the Cloud
EMEA10: Trepidation in Moving to the Cloud
 
Turn OTA Lookers into Direct Bookers
Turn OTA Lookers into Direct BookersTurn OTA Lookers into Direct Bookers
Turn OTA Lookers into Direct Bookers
 
DVX: Data Visualization Experiences
DVX: Data Visualization ExperiencesDVX: Data Visualization Experiences
DVX: Data Visualization Experiences
 
Park Inn Business Development Brief
Park  Inn  Business  Development  BriefPark  Inn  Business  Development  Brief
Park Inn Business Development Brief
 
Park inn business development brief
Park inn business development briefPark inn business development brief
Park inn business development brief
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 

Recently uploaded (20)

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 

Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time with Matt Fryer

  • 1. Confidential - donot distribute Hotels.com’sjourneyto becoming anAlgorithmicBusiness Matthew Fryer VP, Chief Data Science Officer mfryer@hotels.com
  • 2. Confidential - donot distribute Part of Expedia, Inc. family 385,000 properties 89 countries 39 languages >27m Hotels.com Rewards Members Home of Captain Obvious Billions of Recommendations, based on real-time Data per day Hotels.com
  • 3. Confidential - donot distribute
  • 4. Confidential - donot distribute
  • 5. Confidential - donot distribute 5 Data Science Engineering Front End Development
  • 6. Confidential - donot distribute “Artificial Intelligence Will Be Travel’s Next Big Thing” Barry Diller Chairman & Senior Executive, Expedia, Inc. 3M’s are disruptive technology Mobile Messaging / NLP Machine Learning
  • 7. Confidential - donot distribute
  • 8. Confidential - donot distribute Our overall ecosystem
  • 9. Confidential - donot distribute 9 Core Elementsof our Data ScienceCloud Platform Databricks Unified Platform Maestro – Our Internally Developed Platform on AWS (EMR, Spark, R-Studio, Intellij, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, Tensorflow) Proof of Concept on Google Cloud, Beam, Spark & Tensorflow
  • 10. Confidential - donot distribute DatabricksUnifiedPlatform Chart is in1hourblocks, y axis = numberof 32coreinstances 10 • Key asset to the success of data science at Hotels.com • Key in driving up data scientist productivity / efficiency / flexibility • Helps make our data science lifecycle operate much easier and faster driving speed to market • Reliable / secure + facilitates ‘Highly Elastic’ workflows exploiting cost effective spot instance on AWS.
  • 11. Confidential - donot distribute ALPs – AlgorithmLifecyclePipelineService 11
  • 12. Confidential - donot distribute Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour Imagesarean importantfactorwhilechoosinga hotel 12 0% 10% 20% 30% 40% 50% 60% 70% 80% Loyalty Program Reviews Hotel Brand Star Rating Destination Info Images Hotel Info Factors other than price/location Very Imporant/Important Important Very Important
  • 13. Confidential - donot distribute ComputerVisionproblemswetry to tackle 13 Near Duplicate Detection Scene Classification Image Ranking
  • 14. Confidential - donot distribute 14 Tagged as Bathroom
  • 15. Confidential - donot distribute 15 GPU’s quickly became key, took a large effort to optimize using Keras + Tensorflow (Inception v3 + ResNet) 493 67 20 7 4 1 10 100 1000 12-CPU 1-GPU 1-GPU + limited cache 16-GPU + limited cache 16-GPU + full cache Days CIFAR2 Expedia Small 15 2.5 0 10 20 16-GPU + full cache Optimized Days
  • 16. Confidential - donot distribute NearDuplicateDetection:Realworldexamples 16 Non-Duplicates – probability 100% Non-Duplicates – probability 95.91% Duplicates – probability 97.98% Duplicates – probability 98.43%
  • 17. Confidential - donot distribute ROOM/BATHROOM Usingthe model:Real worldexamples 17 EXTERIOR/HOTEL INTERIOR/SEATING_LO BBY ROOM/LIVING_ROOM ROOM/GUESTROOM FACILITIES/DINING INTERIOR/SEATING_LOBBY FACILITIES/POOL
  • 18. Confidential - donot distribute Accuracy& ConfusionMatrix 18 • After many manual / long winded iterations and regularization processes tuning hyperparameters • We achieved good accuracy and low confusion matrix
  • 19. Confidential - donot distribute Optimizingthe photo orderfor improvedcustomer experiences 19 Original Model Reference: Radisson Blu Edwardian Berkshire Hotel, London
  • 20. Confidential - donot distribute Findingthe right hotel in our marketplace is core to our customers needs.
  • 21. Confidential - donot distribute Kensington Bloomsbury Heathrow Canary Wharf Paddington Westminster London City Airport Chelsea Battersea Wimbledon Wembley City of London As an exampledifferentusersegmentsliketo stayin differentlocations
  • 22. Confidential - donot distribute 22 Utility Utility Utility just browsing! BOOK!Intent (click)
  • 23. Confidential - donot distribute Thank you mfryer@hotels.com https://uk.linkedin.com/in/matthewfryer @mattfryer