This document discusses challenges faced in implementing Presto, an open source distributed SQL query engine, for targeted audience delivery at TiVo. It describes choosing appropriate instance types for Presto worker nodes based on memory needs. It also addresses scaling the Presto cluster elastically to handle query concurrency and maturity issues with the Presto software. The document provides insights on testing Presto using Docker containers and connecting to mocked tables.
4. Targeted Audience Delivery
TV networks, programmers, and advertisers: “What are my target viewership segments?”
• Set-Top box data
• Purchasing Behavior
• Location-based Consumer Data
• Program Metadata
brought to you (in part) by
7. Similar Products at TiVo
[Architecture diagram: ETL, Amazon S3, ETL, Amazon Redshift, Java services on EC2, MySQL (RDS)]
8. Similar Products at TiVo
Data held in this stack:
• transactional and customer-configurable data
• semi-aggregated viewership data + sets of households (e.g., “18-24 years old”, “owns minivan”)
9. New Product, New Challenges…
[Same architecture diagram, now with several additional MySQL instances]
Many new data marts popping up in our tech stack
10. New Product, New Challenges…
More viewership data: OK, storage is cheap
11. New Product, New Challenges…
More viewership data: storage is not cheap…
12. New Product, New Challenges…
Storage is not cheap… Need finer-grain data!
13. New Product, New Challenges…
Need finer-grain data! Can’t aggregate as much
14. New Product, New Challenges…
Static, hard to scale
18. Experiment: join on two tables
• Small Joins: join small Redshift table with (filtered-down) large table on S3
• Join across ~1M rows
• Large Joins: join large Redshift table with (unfiltered) large table on S3
• Join across ~10M rows
Compare to: both tables on Redshift
How Does it Scale?
19. Redshift Spectrum for “Simple” Queries
[Chart, small joins: Latency (sec) vs. # Concurrent Requests; x-axis 1 to 15 concurrent queries, y-axis 0 to 70 sec; series: 1 day, 1 day (Spectrum)]
Spectrum faster when cluster loaded and can pre-filter/pre-aggregate data
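A latency-versus-concurrency measurement like the one charted above could be gathered with a small harness along these lines. This is only a sketch: `run_query` is a hypothetical stand-in (it just sleeps) for submitting one query to Redshift or Spectrum, and the default concurrency levels simply mirror the chart's x-axis.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(simulated_seconds=0.01):
    """Hypothetical stand-in for one query; replace with a real client call."""
    time.sleep(simulated_seconds)


def timed_call(query_fn):
    """Return the wall-clock latency of a single query, in seconds."""
    start = time.perf_counter()
    query_fn()
    return time.perf_counter() - start


def mean_latency(query_fn, concurrency):
    """Launch `concurrency` queries at once and average their latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, query_fn) for _ in range(concurrency)]
        latencies = [future.result() for future in futures]
    return sum(latencies) / len(latencies)


def latency_curve(query_fn, levels=(1, 3, 5, 7, 9, 11, 13, 15)):
    """Map each concurrency level to its mean latency."""
    return {level: mean_latency(query_fn, level) for level in levels}
```

Running `latency_curve` once per configuration (both tables on Redshift, then one table on S3 via Spectrum) would give the two series to compare.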
23. Amdahl’s Law in Effect
Memory for the broadcast join is a non-parallelizable resource in the cluster
24. Amdahl’s Law in Effect
“Operations that can't be pushed to the Redshift Spectrum layer include [JOIN], DISTINCT and ORDER BY. … When large amounts of data are returned from Amazon S3, the processing is limited by your cluster's resources.”
https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-performance.html
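For reference, Amdahl's Law bounds the speedup from adding workers when part of the work cannot be parallelized. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n); the parallel fraction of 0.8 in the usage note is an illustrative assumption, not a number measured in this talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

With an assumed parallel fraction of 0.8, four workers give a speedup of 2.5, and no number of workers can push it past 5. The join step that stays on the cluster plays the role of the serial fraction here.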
25. Wait, what about Redshift Spectrum?
Our queries won’t work well on Spectrum.
27. Our Choice: Presto
• Storage/Compute Separation
• Easy to add and remove worker nodes
• Query many different data sources (inside our VPC) without separate load
• Good performance for analytical queries. Not so good for transactional and simple queries…
• Managed (e.g., Qubole, Starburst)
28. How Presto Works
Coordinator
Worker Worker Worker
Connector: S3 / Hive metastore (hive catalog)
Connector: MySQL (mysql catalog)
SELECT SUM(v.seconds_viewed)
FROM hive.db.viewership v
JOIN mysql.db.audiences a ON a.hh_id = v.hh_id
WHERE a.audience_id = 42
mysql catalog →
SELECT …
FROM db.audiences
WHERE audience_id = 42
Data is streamed back to the workers
DRAFT - TiVo Confidential 2018
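The flow on this slide can be imitated in plain Python to make the division of labour concrete. This sketch is not Presto code: the two lists stand in for the hive and mysql catalogs, all rows are invented, and `scan_audiences` plays the role of the filtered query sent to MySQL.

```python
# Invented sample rows standing in for hive.db.viewership (data on S3).
VIEWERSHIP = [
    {"hh_id": 1, "seconds_viewed": 120},
    {"hh_id": 2, "seconds_viewed": 300},
    {"hh_id": 1, "seconds_viewed": 60},
    {"hh_id": 3, "seconds_viewed": 45},
]

# Invented sample rows standing in for mysql.db.audiences.
AUDIENCES = [
    {"audience_id": 42, "hh_id": 1},
    {"audience_id": 42, "hh_id": 3},
    {"audience_id": 7, "hh_id": 2},
]


def scan_audiences(audience_id):
    """Mimic the filtered query sent to MySQL: only matching rows stream back."""
    for row in AUDIENCES:
        if row["audience_id"] == audience_id:
            yield row


def total_seconds_viewed(audience_id):
    """Join streamed audience rows with viewership rows and sum seconds_viewed.

    Assumes each household appears at most once per audience, so collecting
    hh_id values into a set gives the same result as the SQL join.
    """
    households = {row["hh_id"] for row in scan_audiences(audience_id)}
    return sum(v["seconds_viewed"] for v in VIEWERSHIP if v["hh_id"] in households)
```

The point of the shape is that the filter on `audience_id` runs at the source, so only the small matching set of households travels to the workers that do the join.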
30. Presto Worker Memory
• System Memory: reserved-system-memory = 0.4 * JVM Max Memory
• Reserved Memory: max-memory-per-node
• General Memory: (the rest)
All queries start using memory from the General pool
32. Presto Worker Memory
Query needs more memory than is available in the General Pool -> switch to Reserved Pool
34. Presto Worker Memory
Query needs more memory than is available in the General Pool -> switch to Reserved Pool
Only one query allowed in the Reserved Pool!
35. Presto Worker Memory
Query needs more memory than is available in the Reserved Pool -> fail
36. Presto Worker Memory
Query needs more memory than is available in the Reserved Pool -> fail
But there’s available memory??
37. Presto Worker Memory
Query needs more memory than is available in the Reserved Pool -> keep allocating (resource overcommit)
38. Presto Worker Memory
But now a single query can hog the entire cluster!
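The pool behaviour walked through on these slides can be modelled in a few lines. This is a simplified sketch of the slides' description, not Presto's implementation: the setting names follow the slides, the sizes in the usage note are hypothetical, and the "blocked" outcome for a busy Reserved pool is an assumption about what "only one query allowed" means in practice. Resource overcommit is deliberately left out.

```python
def pool_sizes(jvm_max_gb, max_memory_per_node_gb, system_fraction=0.4):
    """Split JVM max memory into the three pools described on the slides."""
    system = system_fraction * jvm_max_gb       # reserved-system-memory
    reserved = max_memory_per_node_gb           # max-memory-per-node
    general = jvm_max_gb - system - reserved    # the rest
    if general < 0:
        raise ValueError("system and reserved pools exceed JVM max memory")
    return {"system": system, "reserved": reserved, "general": general}


def place_query(needed_gb, general_free_gb, reserved_gb, reserved_busy=False):
    """Decide where a query on one worker ends up (without overcommit)."""
    if needed_gb <= general_free_gb:
        return "general"    # every query starts, and ideally finishes, here
    if needed_gb > reserved_gb:
        return "fail"       # larger than the Reserved pool allows
    # Assumption: the Reserved pool serves a single query at a time.
    return "blocked" if reserved_busy else "reserved"
```

For a hypothetical 100 GB JVM with 20 GB for max-memory-per-node, the pools come out at 40 GB system, 20 GB reserved, and 40 GB general. A 30 GB query arriving when only 5 GB of the General pool is free fails even though memory exists elsewhere on the node, which is the "But there's available memory??" situation.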
44. Working With Reserved Memory Pool
Conceptually, reserved memory pool should be the “high water mark” while most queries complete in the general pool.
How do we achieve that?
• What if memory usage varies a lot between different queries?
• Use many inexpensive instances, or a few expensive instances?
• Compute optimized or memory optimized?
45. Working With Reserved Memory Pool
• What if memory usage varies a lot between different queries? Solution: multiple clusters based on workload
• Use many inexpensive instances, or a few expensive instances? Empirical testing found smaller cluster size was slightly faster
• Compute optimized or memory optimized? Solution: Cost/Benefit Analysis
46. Choosing the Right Instance Type
r4.4xlarge
• r: Instance Class
• 4: Generation
• 4xlarge: Multiplier for CPU and Mem
Other examples: t2.2xlarge, c5.16xlarge
47. Choosing the Right Instance Type
Over 100 to choose from!
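The naming scheme on the slide is regular enough to parse, which helps when comparing many candidates in a cost/benefit spreadsheet. The sketch below handles only names of the shape shown on the slide (class letters, generation digit, size); the `InstanceType` tuple and `parse_instance_type` function are illustrative names, and treating a bare `xlarge` as multiplier 1 is this sketch's reading of the scheme.

```python
import re
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    instance_class: str
    generation: int
    size: str
    multiplier: Optional[int]  # None for sizes below xlarge, e.g. "large"


_PATTERN = re.compile(
    r"^(?P<cls>[a-z]+)(?P<gen>\d+)\.(?P<size>(?P<mult>\d*)xlarge|[a-z]+)$"
)


def parse_instance_type(name):
    """Split a name such as 'r4.4xlarge' into class, generation, and size."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    mult = match.group("mult")
    if mult is None:
        multiplier = None              # size did not end in "xlarge"
    else:
        multiplier = int(mult) if mult else 1
    return InstanceType(
        match.group("cls"), int(match.group("gen")), match.group("size"), multiplier
    )
```

For example, `parse_instance_type("c5.16xlarge")` yields class `c`, generation 5, and multiplier 16.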
54. More Concurrency? Add More Nodes
[Diagram: Presto Coordinator with two Presto Workers; 1 Query]
When will queries complete at current rate?
55. More Concurrency? Add More Nodes
[Diagram: Presto Coordinator with two Presto Workers; 10 Queries]
When will queries complete at current rate? Not fast enough!
56. More Concurrency? Add More Nodes
[Diagram: Presto Coordinator with four Presto Workers; 10 Queries]
When will queries complete at current rate?
Qubole provisions more nodes up to a limit (around 3 minutes)
57. More Concurrency? Add More Nodes
[Diagram: Presto Coordinator with four Presto Workers; 1 Query]
When will queries complete at current rate? Too fast!
58. More Concurrency? Add More Nodes
[Diagram: Presto Coordinator with two Presto Workers; 1 Query]
When will queries complete at current rate?
Qubole decommissions more nodes up to a limit
61. Not so fast…
[Diagram: Presto Coordinator with four Presto Workers; 1 Query; two workers at 100% CPU, two workers idle]
When will queries complete at current rate? Not fast enough!
Upscaling only works for new queries
Maybe we should have sent this query to a more powerful cluster?
Autoscaling is for concurrency
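The scaling question on these slides ("when will queries complete at current rate?") can be phrased as a small calculation. The sketch below is an illustration of that idea only and is not Qubole's actual algorithm: the unit of "pending work", the target completion time, and both function names are assumptions.

```python
import math


def estimated_completion_seconds(pending_work, workers, work_per_worker_second=1.0):
    """Estimate how long the pending work takes at the current cluster size."""
    return pending_work / (workers * work_per_worker_second)


def desired_workers(pending_work, target_seconds, min_workers, max_workers,
                    work_per_worker_second=1.0):
    """Pick the worker count that meets the target, clamped to the cluster limits."""
    needed = math.ceil(pending_work / (work_per_worker_second * target_seconds))
    return max(min_workers, min(max_workers, needed))
```

If the estimate exceeds the target ("not fast enough"), the desired count rises toward `max_workers`; if it falls well below ("too fast"), the count drops toward `min_workers`. The sketch ignores the caveat above: workers added mid-flight only help queries that start after they join.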
63. Query History
Presto UI is nice for watching queries as they’re happening, but not for historical auditing
64. Presto Query Auditing
Service administration portal tracks Qubole commands (Presto queries) and links to the Qubole web site
View and download intermediate queries and results
65. Specific Technical Presto Issues
• Official Presto JDBC driver does not support Prepared Statements
• Worker loss not handled gracefully (if one task fails, all tasks fail; we accept that risk and mitigate it with retry logic)
• No support for upper-case table names in MySQL (Issue 2863)
• TIMESTAMP behavior does not match SQL standard (Issue 7122)
• Naïve query optimizer (talk to Starburst!)
66. Specific Technical Presto Issues
Moral: you may need to get creative with workarounds
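The retry logic mentioned under worker loss might look roughly like the following. This is a generic sketch, not the team's code: `QueryFailed` and `submit_query` are hypothetical names for whatever error and client call the application actually uses.

```python
import time


class QueryFailed(Exception):
    """Hypothetical error raised when a query fails, e.g. after losing a worker."""


def run_with_retry(submit_query, max_attempts=3, backoff_seconds=0.0):
    """Call submit_query until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_query()
        except QueryFailed as error:
            last_error = error
            if attempt < max_attempts:
                # Linear backoff gives a replacement worker time to come up.
                time.sleep(backoff_seconds * attempt)
    raise last_error
```

Because a failed query is rerun from the start, this only suits queries that are safe to repeat, which is usually true of read-only analytical queries.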
68. Testing
Presto Docker container using memory connectors
Declarative syntax allows us to mock tables in the Docker container
69. Testing
…so we can test our generated queries in isolation using Behavior-Driven Development.
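One way to mock a table declaratively is to describe its columns and rows as data and render the SQL that creates it in the memory connector. The sketch below is an assumption about how that could be done, not the talk's actual tooling: `mock_table_sql` is a hypothetical helper, and the catalog name `memory` and schema `default` depend on how the container is configured.

```python
def sql_literal(value):
    """Render a Python bool, number, or string as a SQL literal."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def mock_table_sql(table, columns, rows, catalog="memory", schema="default"):
    """Build a CREATE TABLE ... AS SELECT statement that seeds a mocked table."""
    if not rows:
        raise ValueError("at least one row is needed to infer column types")
    values = ", ".join(
        "(" + ", ".join(sql_literal(value) for value in row) + ")" for row in rows
    )
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} AS "
        f"SELECT * FROM (VALUES {values}) AS t ({', '.join(columns)})"
    )
```

A test scenario can then state "given an audiences table with these rows", run the generated statement against the container, and execute the query under test against the mocked table.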
72. Biggest (Positive) Surprise
Presto’s simplicity resulted in widespread adoption.
Providing one logical view of the data model across many databases is great!
For this reason, it became a favorite for many other workloads beyond its initial scope.
74. Provocative Ending
Presto feels like an API gateway, but for data.
Behavioral Services :: Data Applications
Interface (REST, WSDL, Thrift, etc.) :: Data Definition Language (DDL)
Requests (HTTP, SOAP, etc.) :: Data Manipulation Language (DML)
Service implementation language :: Database technology
Publishing an endpoint :: Exposing a table or view
Service handler :: CREATE VIEW, CREATE TRIGGER
Service endpoint configuration :: Catalog/connector configuration
75. Provocative Ending
What other engineering advancements can we push through the lens from microservices (behaviors) to databases (state)?