Incremental View Maintenance
with Coral, DBT, and Iceberg
May 2023
Modern Data Lake Architectures
• Compute Engines
• Process large amounts of data
• Orchestrators
• Execute jobs on a schedule
• Or on data availability
• ETL tools
• To implement, test, and build data
workflows
• Tables
• Continuously updated
Modern Data Lake Growth Pains
• Large number of jobs
• E.g., SQL workloads
• Workload scanning/computing
data from scratch each time
• Becomes more of a problem as the
data grows in volume.
SELECT posts.post_id,
       COUNT(likes.user_id) AS total_likes
FROM posts
LEFT JOIN likes ON posts.post_id = likes.post_id
GROUP BY posts.post_id;

SELECT AVG(num_comments) AS avg_comments_per_user
FROM (
  SELECT users.user_id,
         COUNT(comments.comment_id) AS num_comments
  FROM users
  INNER JOIN comments ON users.user_id = comments.user_id
  GROUP BY users.user_id
) AS user_comments;

SELECT COUNT(DISTINCT likes.user_id) AS num_users_liked_and_commented
FROM likes
INNER JOIN comments ON likes.post_id = comments.post_id
  AND likes.user_id = comments.user_id;

SELECT sender_id,
       COUNT(*) AS num_messages_sent
FROM messages
GROUP BY sender_id;

SELECT users.user_id,
       COUNT(friendships.friend_id) AS num_friends
FROM users
INNER JOIN friendships ON users.user_id = friendships.user_id
GROUP BY users.user_id
ORDER BY num_friends DESC
LIMIT 10;
What if we can maintain tables incrementally?
Update tables only with the changes!
• Lower compute cost
• Lower latency
• More up-to-date insights/models
• Improved UX
• Focus on writing the logic, not the
incremental mechanics
• Declare full DAG using just SQL
Incremental Compute Made Easy
With Coral, Iceberg, and DBT
• DBT
• For capturing
transformations
• Coral
• For incremental
maintenance logic
• Iceberg
• Snapshot APIs and
incremental scan
DBT Overview
What is DBT?
• Open-source data transformation tool (ETL) that enables teams to quickly build
complex data pipelines
Image from getdbt.com
DBT Overview
DBT Native Materialization Properties: Table
• Model rebuilt as table on each run
(using CREATE TABLE AS)
• Takes a long time to rebuild
my_dbt_model.sql
DBT Overview
DBT Native Materialization Properties: Incremental
• Inserts or updates records in the
built table on a manual run when
the source table changes
• Requires extra wrappers and
configurations, where users must
specify how to filter rows
• Described as an “advanced
usage” of DBT
my_dbt_model.sql
Desired User Experience
New Materialization Mode: Incremental Maintenance
• Incremental maintenance
functionality with no extra code
necessary
• One simple configuration
change from `table`
materialization mode
my_dbt_model.sql
Incremental View Maintenance
Calculating Incremental Queries
Simple Join Example
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
prices
id  product_price
2   $6
SELECT product_name, product_price
FROM inventory JOIN prices
ON inventory.id = prices.id
t1
product_name      product_price
LinkedIn Premium  $6
Calculating Incremental Queries
Simple Join Example: Drop and Rebuild
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
3   LinkedIn Recruiter
prices
id  product_price
2   $6
1   $3
3   $40
SELECT product_name, product_price
FROM inventory JOIN prices
ON inventory.id = prices.id
t1
product_name      product_price
LinkedIn Premium  $6
Calculating Incremental Queries
Simple Join Example: Drop and Rebuild
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
3   LinkedIn Recruiter
prices
id  product_price
2   $6
1   $3
3   $40
SELECT product_name, product_price
FROM inventory JOIN prices
ON inventory.id = prices.id
t2
product_name        product_price
LinkedIn Premium    $6
LinkedIn Learning   $3
LinkedIn Recruiter  $40
Calculating Incremental Queries
Simple Join Example: Incremental Maintenance
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
prices
id  product_price
2   $6
SELECT product_name, product_price
FROM inventory JOIN prices
ON inventory.id = prices.id
t1
product_name      product_price
LinkedIn Premium  $6
Calculating Incremental Queries
Simple Join Example: Incremental Maintenance
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
3   LinkedIn Recruiter
prices
id  product_price
2   $6
1   $3
3   $40
SELECT product_name, product_price
FROM inventory JOIN prices_delta
ON inventory.id = prices_delta.id
t1
product_name      product_price
LinkedIn Premium  $6
Δtα
product_name       product_price
LinkedIn Learning  $3
Calculating Incremental Queries
Simple Join Example: Incremental Maintenance
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
3   LinkedIn Recruiter
prices
id  product_price
2   $6
1   $3
3   $40
SELECT product_name, product_price
FROM inventory_delta JOIN prices
ON inventory_delta.id = prices.id
t1 + Δtα
product_name       product_price
LinkedIn Premium   $6
LinkedIn Learning  $3
Δtβ
product_name  product_price
(no rows)
Calculating Incremental Queries
Simple Join Example: Incremental Maintenance
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
3   LinkedIn Recruiter
prices
id  product_price
2   $6
1   $3
3   $40
SELECT product_name, product_price
FROM inventory_delta JOIN prices_delta
ON inventory_delta.id = prices_delta.id
t1 + Δtα + Δtβ
product_name       product_price
LinkedIn Premium   $6
LinkedIn Learning  $3
Δtγ
product_name        product_price
LinkedIn Recruiter  $40
Calculating Incremental Queries
Simple Join Example: Incremental Maintenance
Incremental Query
INSERT INTO t1
(SELECT product_name, product_price    -- Δtα
FROM inventory JOIN prices_delta
ON inventory.id = prices_delta.id
UNION ALL
SELECT product_name, product_price     -- Δtβ
FROM inventory_delta JOIN prices
ON inventory_delta.id = prices.id
UNION ALL
SELECT product_name, product_price     -- Δtγ
FROM inventory_delta JOIN prices_delta
ON inventory_delta.id = prices_delta.id)
t1 + Δtα + Δtβ + Δtγ
product_name        product_price
LinkedIn Premium    $6
LinkedIn Learning   $3
LinkedIn Recruiter  $40
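The walkthrough above can be checked end to end with a small script. This is a minimal sketch, assuming append-only changes and using an in-memory SQLite database as a stand-in for the data lake; the table and column names follow the slides, and storing prices as text is a simplification. It materializes t1, applies the three-branch incremental query, and compares the result with a full recompute over base plus delta rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# State as of the previous run (base tables), the newly appended rows
# (delta tables), and the materialized join output t1.
cur.executescript("""
CREATE TABLE inventory (id INTEGER, product_name TEXT);
CREATE TABLE prices (id INTEGER, product_price TEXT);
CREATE TABLE inventory_delta (id INTEGER, product_name TEXT);
CREATE TABLE prices_delta (id INTEGER, product_price TEXT);
CREATE TABLE t1 (product_name TEXT, product_price TEXT);

INSERT INTO inventory VALUES (1, 'LinkedIn Learning'), (2, 'LinkedIn Premium');
INSERT INTO prices VALUES (2, '$6');
INSERT INTO inventory_delta VALUES (3, 'LinkedIn Recruiter');
INSERT INTO prices_delta VALUES (1, '$3'), (3, '$40');

INSERT INTO t1
SELECT product_name, product_price
FROM inventory JOIN prices ON inventory.id = prices.id;
""")

# Incremental query from the slides: every branch reads at least one delta.
cur.execute("""
INSERT INTO t1
SELECT product_name, product_price
FROM inventory JOIN prices_delta ON inventory.id = prices_delta.id
UNION ALL
SELECT product_name, product_price
FROM inventory_delta JOIN prices ON inventory_delta.id = prices.id
UNION ALL
SELECT product_name, product_price
FROM inventory_delta JOIN prices_delta ON inventory_delta.id = prices_delta.id
""")
incremental_rows = sorted(
    cur.execute("SELECT product_name, product_price FROM t1").fetchall()
)

# Drop-and-rebuild equivalent: join the full (base + delta) tables from scratch.
full_rows = sorted(cur.execute("""
SELECT product_name, product_price
FROM (SELECT * FROM inventory UNION ALL SELECT * FROM inventory_delta) AS i
JOIN (SELECT * FROM prices UNION ALL SELECT * FROM prices_delta) AS p
ON i.id = p.id
""").fetchall())

print(incremental_rows == full_rows)
```

The equivalence relies on the inputs only receiving inserts; updates and deletes would need a different treatment.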
Coral
Overview
What is Coral?
• Translation, analysis, and query rewrite engine
• Open source since 2020
Coral IR
• Captures query semantics using standardized operators
• Based on Apache Calcite
• Two semantically equivalent representations:
❑ Coral IR – AST
o Captures query semantics at the syntax tree layer
o Extends Calcite's SqlNode representation
o Use cases: SQL translations
❑ Coral IR – Logical Plan
o Captures query semantics at the logical plan layer
o Extends Calcite's RelNode representation
o Use cases: Query optimization, query rewrites, dynamic data masking
Coral IR - AST
• Captures query semantics using standardized operators at syntax tree level
Image generated by Coral-Visualization
Trino SQL:
SELECT *
FROM test.foo JOIN test.bar ON a = c
WHERE array_element[1] = 1 AND strpos(a, 'foo') > 0
Spark SQL:
SELECT *
FROM test.foo JOIN test.bar ON a = c
WHERE b[0] = 1 AND instr(a, 'foo') > 0
Coral IR – Logical Plan
• Extends Apache Calcite’s Relational Algebra Expressions
• Captures query semantics using standardized operators at logical plan level
Image generated by Coral-Visualization
Trino SQL:
SELECT *
FROM test.foo JOIN test.bar ON a = c
WHERE array_element[1] = 1 AND strpos(a, 'foo') > 0
Spark SQL:
SELECT *
FROM test.foo JOIN test.bar ON a = c
WHERE b[0] = 1 AND instr(a, 'foo') > 0
Incremental Maintenance with Coral
Coral IR Transformation
Transformation Overview
Input Representation → Output Representation
Coral-Incremental
Transformation Overview
Input SQL → Incremental SQL
Coral-Incremental
Transformation Overview
Input SQL:
SELECT product_name, product_price
FROM inventory JOIN prices
ON inventory.id = prices.id
Incremental SQL:
SELECT product_name, product_price
FROM inventory JOIN prices_delta
ON inventory.id = prices_delta.id
UNION ALL
SELECT product_name, product_price
FROM inventory_delta JOIN prices
ON inventory_delta.id = prices.id
UNION ALL
SELECT product_name, product_price
FROM inventory_delta JOIN prices_delta
ON inventory_delta.id = prices_delta.id
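The shape of this transformation can be illustrated with a toy generator. This is only a sketch of the pattern for inner joins over append-only tables, working on a string template; it is not how Coral performs the rewrite, which operates on Coral IR. The `incremental_rewrite` function, the `TEMPLATE` placeholder format, and the `_delta` suffix handling are assumptions made for illustration.

```python
from itertools import product
from typing import List

# Hypothetical template: each table name appears as a {placeholder}.
TEMPLATE = (
    "SELECT product_name, product_price\n"
    "FROM {inventory} JOIN {prices}\n"
    "ON {inventory}.id = {prices}.id"
)


def incremental_rewrite(template: str, tables: List[str]) -> str:
    """Emit one branch per combination of base/delta inputs, skipping the
    all-base combination because its rows are already materialized."""
    branches = []
    for use_delta in product([False, True], repeat=len(tables)):
        if not any(use_delta):
            continue  # base JOIN base is the existing materialized result
        names = {
            table: (table + "_delta") if delta else table
            for table, delta in zip(tables, use_delta)
        }
        branches.append(template.format(**names))
    return "\nUNION ALL\n".join(branches)


incremental_sql = incremental_rewrite(TEMPLATE, ["inventory", "prices"])
print(incremental_sql)
```

For n joined tables this produces 2^n - 1 branches, which is one reason the cost of an incremental plan is worth modeling before choosing it over a full rebuild.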
Coral-Incremental
SQL to Coral IR
Input Query
SELECT product_name, product_price
FROM inventory JOIN prices
ON inventory.id = prices.id
Coral-Incremental
Coral Rewrite
Input Query Incremental Query
Coral-Incremental
Coral IR to SQL
Incremental Query
SELECT product_name, product_price
FROM inventory JOIN prices_delta
ON inventory.id = prices_delta.id
UNION ALL
SELECT product_name, product_price
FROM inventory_delta JOIN prices
ON inventory_delta.id = prices.id
UNION ALL
SELECT product_name, product_price
FROM inventory_delta JOIN prices_delta
ON inventory_delta.id = prices_delta.id
Coral-Service
Overview
• Spring Boot service that exposes REST APIs for interacting with
Coral directly, without going through a query engine
• /api/incremental/rewrite
• Endpoint that handles pre and post processing between query and
Coral IR representations
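A request to this endpoint might be built as in the sketch below. The endpoint path comes from the slide; the host and port (`localhost:8080`), the JSON field name `query`, and the helper `build_rewrite_request` are assumptions for illustration, so check the Coral-Service documentation for the actual request format. The request is only constructed here, not sent.

```python
import json
import urllib.request

INPUT_SQL = (
    "SELECT product_name, product_price "
    "FROM inventory JOIN prices ON inventory.id = prices.id"
)


def build_rewrite_request(
    sql: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the input SQL.

    The payload shape {"query": ...} is an assumption, not a confirmed API.
    """
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/api/incremental/rewrite",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rewrite_request(INPUT_SQL)
print(request.get_method(), request.full_url)

# With a Coral-Service instance running, the request could be sent with:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```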
Coral-Service Endpoint
CLI Example
Coral-Service Endpoint
Post Request
Coral-Service Endpoint
Endpoint Response
Desired State
• End-to-end framework to materialize frequently invoked views and efficiently
update records upon changes in base relations
✔️ Efficient Updates
Compute and apply incremental changes,
rather than re-computing on each
invocation.
Low Friction Adoption
Provide an end-to-end framework for users
to seamlessly adopt incremental
maintenance functionality while making
few modifications to their existing systems.
DBT Integration
Coral-Dbt
User Perspective
• Users can utilize incremental
maintenance functionality with their
models out-of-the-box with the
coral-dbt package
my_dbt_model.sql (initial configuration)
my_dbt_model.sql (with incremental maintenance)
Coral-Dbt
Inside the `incremental_maintenance` Materialization Mode
1. Makes a POST request to the Coral service endpoint /api/incremental/rewrite,
passing the input SQL
2. Generates Spark Scala code for incremental maintenance logic
3. Executes the generated Spark Scala code
Coral-Dbt: Leveraging Iceberg
Useful Iceberg Properties
• High-performance format for large analytics tables
• Table metadata tracks schema, partitioning configs, and snapshots
• Enables time travel and incremental reads via Spark Scala → ingredients for
incremental maintenance
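The time travel and incremental reads listed above map onto read options in Iceberg's Spark integration: `snapshot-id` for reading a table as of one snapshot, and `start-snapshot-id` (exclusive) with `end-snapshot-id` (inclusive) for reading only the data appended between two snapshots. The sketch below uses PySpark-style calls rather than the Scala the deck generates; the helper names and the `db.inventory` table name are hypothetical, and `read_iceberg` expects an existing SparkSession with the Iceberg runtime configured.

```python
from typing import Dict


def time_travel_options(snapshot_id: int) -> Dict[str, str]:
    """Options for reading the table as of a single snapshot."""
    return {"snapshot-id": str(snapshot_id)}


def incremental_read_options(
    start_snapshot_id: int, end_snapshot_id: int
) -> Dict[str, str]:
    """Options for reading rows appended after start (exclusive)
    up to and including end."""
    return {
        "start-snapshot-id": str(start_snapshot_id),
        "end-snapshot-id": str(end_snapshot_id),
    }


def read_iceberg(spark, table: str, options: Dict[str, str]):
    """Load an Iceberg table through a SparkSession with the given options."""
    return spark.read.format("iceberg").options(**options).load(table)


# Example usage (requires a configured SparkSession named `spark`):
#   base = read_iceberg(spark, "db.inventory", time_travel_options(start_id))
#   delta = read_iceberg(
#       spark, "db.inventory", incremental_read_options(start_id, end_id)
#   )
print(incremental_read_options(10, 20))
```

Iceberg's incremental read covers append snapshots only, which is consistent with updates and deletes being out of scope for the approach at this stage.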
Coral-Dbt: Code Generation
Retrieving Snapshot Ids
• For each table in the query:
• Grab timestamps tnow (end_snapshot_id) and tnow – 1 (start_snapshot_id)
inventory
id  product_name
1   LinkedIn Learning
2   LinkedIn Premium
3   LinkedIn Recruiter
Snapshots: tnow – 1 (start), tnow (end)
> val start_snapshot_id = grab_snapshot_id_from_previous_run()
> val end_snapshot_id = grab_latest_snapshot_id()
Coral-Dbt: Code Generation
Creating Temp Views
• For each table in the query:
• Create temporary views representing the original table and the additions
> val df = load("inventory")
> val inventory = df.snapshotTo(start_snapshot_id).createTempView()
> val inventory_delta = df.snapshotFrom(start_snapshot_id).snapshotTo(end_snapshot_id).createTempView()
inventory (table)
id  product_name
1   LinkedIn Learning    (view: inventory)
2   LinkedIn Premium     (view: inventory)
3   LinkedIn Recruiter   (view: inventory_delta)
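How the two views split a table can be shown with a toy in-memory model. This is a sketch only: `AppendOnlyTable` and its methods are hypothetical and are not the Iceberg API, and real Iceberg snapshot ids are opaque identifiers rather than consecutive integers. The model keeps one batch of rows per commit and derives the base view and the delta view from two snapshot ids.

```python
from typing import List, Tuple

Row = Tuple[int, str]


class AppendOnlyTable:
    """Toy table that records one batch of rows per commit (snapshot)."""

    def __init__(self) -> None:
        self._batches: List[List[Row]] = []

    def append(self, rows: List[Row]) -> int:
        """Commit a batch and return its snapshot id (1-based commit count)."""
        self._batches.append(list(rows))
        return len(self._batches)

    def latest_snapshot_id(self) -> int:
        return len(self._batches)

    def snapshot_to(self, snapshot_id: int) -> List[Row]:
        """Rows visible as of snapshot_id (inclusive)."""
        return [row for batch in self._batches[:snapshot_id] for row in batch]

    def appended_between(self, start_id: int, end_id: int) -> List[Row]:
        """Rows appended after start_id (exclusive) up to end_id (inclusive)."""
        return [row for batch in self._batches[start_id:end_id] for row in batch]


table = AppendOnlyTable()
start_snapshot_id = table.append([(1, "LinkedIn Learning"), (2, "LinkedIn Premium")])
end_snapshot_id = table.append([(3, "LinkedIn Recruiter")])

inventory = table.snapshot_to(start_snapshot_id)  # view of the original table
inventory_delta = table.appended_between(start_snapshot_id, end_snapshot_id)

print(inventory)
print(inventory_delta)
```

Together, the base view and the delta view reconstruct the table as of the end snapshot, which is what lets the incremental query avoid rescanning rows that were already joined.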
Coral-Dbt: Code Generation
Executing Incremental Query and Updating Materialized Table
> val query_response = spark.sql(incremental_maintenance_sql)
> query_response.appendToTable("my_join_output")
t1
product_name      product_price
LinkedIn Premium  $6
t2 = t1 + query_response
product_name        product_price
LinkedIn Premium    $6
LinkedIn Learning   $3
LinkedIn Recruiter  $40
Desired State
• End-to-end framework to materialize frequently invoked views and efficiently
update records upon changes in base relations
✔️ Efficient Updates
Compute and apply incremental changes,
rather than re-computing on each
invocation.
✔️ Low Friction Adoption
Provide an end-to-end framework for users
to seamlessly adopt incremental
maintenance functionality while making
few modifications to their existing systems.
Next Steps
• Expand supported queries
• Aggregates, outer joins
• Support updates and deletes
• Build cost-based model to identify optimal incremental maintenance plans
References
• Coral: https://github.com/linkedin/coral
• Incremental Maintenance Materialization Mode: https://github.com/linkedin/coral/tree/master/coral-dbt
• Incremental rewrite: https://github.com/linkedin/coral/tree/master/coral-incremental
Contributors
Thank you!

Invezz.com - Grow your wealth with trading signalsInvezz1
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 

Recently uploaded (20)

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 

Incremental View Maintenance with Coral, DBT, and Iceberg

  • 1. Incremental View Maintenance with Coral, DBT, and Iceberg May 2023
  • 2. Modern Data Lake Architectures • Compute Engines • Process large amounts of data • Orchestrators • Execute jobs on a schedule • Or on data availability • ETL tools • To implement, test, and build data workflows • Tables • Continuously updated
  • 3. Modern Data Lake Growth Pains • Large number of jobs • E.g., SQL workloads • Workload scanning/computing data from scratch each time • Becomes more of a problem as the data grows in volume. SELECT posts.post_id, COUNT(likes.user_id) AS total_likes FROM posts LEFT JOIN likes ON posts.post_id = likes.post_id GROUP BY posts.post_id; SELECT AVG(num_comments) AS avg_comments_per_user FROM ( SELECT users.user_id, COUNT(comments.comment_id) AS num_comments FROM users INNER JOIN comments ON users.user_id = comments.user_id GROUP BY users.user_id ) AS user_comments; SELECT COUNT(DISTINCT likes.user_id) AS num_users_liked_and_commented FROM likes INNER JOIN comments ON likes.post_id = comments.post_id AND likes.user_id = comments.user_id; SELECT sender_id, COUNT(*) AS num_messages_sent FROM messages GROUP BY sender_id; SELECT users.user_id, COUNT(friendships.friend_id) AS num_friends FROM users INNER JOIN friendships ON users.user_id = friendships.user_id GROUP BY users.user_id ORDER BY num_friends DESC LIMIT 10;
  • 4. What if we can maintain tables incrementally? Update tables only with the changes! • Lower compute cost • Lower latency • More up-to-date insights/models • Improved UX • Focus on writing the logic, not the incremental mechanics • Declare full DAG using just SQL
  • 5. Incremental Compute Made Easy With Coral, Iceberg, and DBT • DBT • For capturing transformations • Coral • For incremental maintenance logic • Iceberg • Snapshot APIs and incremental scan
  • 6. DBT Overview What is DBT? • Open-source data transformation tool (ETL) that enables teams to quickly build complex data pipelines Image from getdbt.com
  • 7. DBT Overview DBT Native Materialization Properties: Table • Model rebuilt as table on each run (using CREATE TABLE AS) • Takes a long time to rebuild my_dbt_model.sql
  • 8. DBT Overview DBT Native Materialization Properties: Incremental • Inserts or updates records in the built table on a manual run when the source table changes • Requires extra wrappers and configurations, where users must specify how to filter rows • Described as an “advanced usage” of DBT my_dbt_model.sql
  • 9. DBT Overview DBT Native Materialization Properties: Incremental • Inserts or updates records in the built table when the source table changes • Requires extra wrappers and configurations, where users must specify how to filter rows • Described as an “advanced usage” of DBT my_dbt_model.sql
  • 10. Desired User Experience New Materialization Mode: Incremental Maintenance • Incremental maintenance functionality with no extra code necessary • One simple configuration change from `table` materialization mode my_dbt_model.sql
  • 12. Calculating Incremental Queries Simple Join Example. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium). prices (id, product_price): (2, $6). Query: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id
  • 13. Calculating Incremental Queries Simple Join Example. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium). prices (id, product_price): (2, $6). Query: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id Result t1 (product_name, product_price): (LinkedIn Premium, $6).
  • 14. Calculating Incremental Queries Simple Join Example: Drop and Rebuild. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). prices (id, product_price): (2, $6), (1, $3), (3, $40). Query: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id t1 (product_name, product_price): (LinkedIn Premium, $6).
  • 15. Calculating Incremental Queries Simple Join Example: Drop and Rebuild. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). prices (id, product_price): (2, $6), (1, $3), (3, $40). Query: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id
  • 16. Calculating Incremental Queries Simple Join Example: Drop and Rebuild. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). prices (id, product_price): (2, $6), (1, $3), (3, $40). Query: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id Result t2 (product_name, product_price): (LinkedIn Premium, $6), (LinkedIn Learning, $3), (LinkedIn Recruiter, $40).
  • 17. Calculating Incremental Queries Simple Join Example: Incremental Maintenance. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium). prices (id, product_price): (2, $6). Query: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id t1 (product_name, product_price): (LinkedIn Premium, $6).
  • 18. Calculating Incremental Queries Simple Join Example: Incremental Maintenance. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). prices (id, product_price): (2, $6), (1, $3), (3, $40). Query: SELECT product_name, product_price FROM inventory JOIN prices_delta ON inventory.id = prices_delta.id t1 (product_name, product_price): (LinkedIn Premium, $6). Δtα (product_name, product_price): (LinkedIn Learning, $3).
  • 19. Calculating Incremental Queries Simple Join Example: Incremental Maintenance. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). prices (id, product_price): (2, $6), (1, $3), (3, $40). Query: SELECT product_name, product_price FROM inventory_delta JOIN prices ON inventory_delta.id = prices.id t1 + Δtα (product_name, product_price): (LinkedIn Premium, $6), (LinkedIn Learning, $3). Δtβ (product_name, product_price): no rows.
  • 20. Calculating Incremental Queries Simple Join Example: Incremental Maintenance. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). prices (id, product_price): (2, $6), (1, $3), (3, $40). Query: SELECT product_name, product_price FROM inventory_delta JOIN prices_delta ON inventory_delta.id = prices_delta.id t1 + Δtα + Δtβ (product_name, product_price): (LinkedIn Premium, $6), (LinkedIn Learning, $3). Δtγ (product_name, product_price): (LinkedIn Recruiter, $40).
  • 21. Calculating Incremental Queries Simple Join Example: Incremental Maintenance. Incremental Query, with branches labeled Δtα, Δtβ, Δtγ in order: INSERT INTO t1 (SELECT product_name, product_price FROM inventory JOIN prices_delta ON inventory.id = prices_delta.id UNION ALL SELECT product_name, product_price FROM inventory_delta JOIN prices ON inventory_delta.id = prices.id UNION ALL SELECT product_name, product_price FROM inventory_delta JOIN prices_delta ON inventory_delta.id = prices_delta.id) Result t1 + Δtα + Δtβ + Δtγ (product_name, product_price): (LinkedIn Premium, $6), (LinkedIn Learning, $3), (LinkedIn Recruiter, $40).
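The join example above can be checked end to end with a small script. The following is a minimal sketch, assuming an in-memory SQLite database stands in for the data lake, that the `*_delta` tables hold only rows appended since the previous run, and that prices are stored as plain integers. It builds t1 from the old state, applies the three-branch incremental query, and compares the result with a full rebuild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# State as of the previous run (the "old" inventory and prices tables).
cur.execute("CREATE TABLE inventory (id INTEGER, product_name TEXT)")
cur.execute("CREATE TABLE prices (id INTEGER, product_price INTEGER)")
cur.executemany("INSERT INTO inventory VALUES (?, ?)",
                [(1, "LinkedIn Learning"), (2, "LinkedIn Premium")])
cur.executemany("INSERT INTO prices VALUES (?, ?)", [(2, 6)])

# Rows appended since the previous run (assumption: inserts only).
cur.execute("CREATE TABLE inventory_delta (id INTEGER, product_name TEXT)")
cur.execute("CREATE TABLE prices_delta (id INTEGER, product_price INTEGER)")
cur.executemany("INSERT INTO inventory_delta VALUES (?, ?)",
                [(3, "LinkedIn Recruiter")])
cur.executemany("INSERT INTO prices_delta VALUES (?, ?)", [(1, 3), (3, 40)])

# Materialize t1 from the old state.
cur.execute("""
    CREATE TABLE t1 AS
    SELECT product_name, product_price
    FROM inventory JOIN prices ON inventory.id = prices.id
""")

# Incremental maintenance: append the three delta branches to t1.
# (SQLite does not accept parentheses around the SELECT here.)
cur.execute("""
    INSERT INTO t1
    SELECT product_name, product_price
    FROM inventory JOIN prices_delta ON inventory.id = prices_delta.id
    UNION ALL
    SELECT product_name, product_price
    FROM inventory_delta JOIN prices ON inventory_delta.id = prices.id
    UNION ALL
    SELECT product_name, product_price
    FROM inventory_delta JOIN prices_delta
      ON inventory_delta.id = prices_delta.id
""")
incremental = sorted(cur.execute("SELECT * FROM t1").fetchall())

# Drop-and-rebuild equivalent: join the full new state of both tables.
full = sorted(cur.execute("""
    SELECT product_name, product_price
    FROM (SELECT * FROM inventory UNION ALL SELECT * FROM inventory_delta) AS i
    JOIN (SELECT * FROM prices UNION ALL SELECT * FROM prices_delta) AS p
      ON i.id = p.id
""").fetchall())

assert incremental == full
```

Note that `inventory` and `prices` in the incremental branches refer to the tables as of the previous run, not the current state; joining the current state against the deltas would count the delta-delta matches twice.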
  • 22. Coral
  • 23. Overview What is Coral? • Translation, analysis, and query rewrite engine • Open source since 2020 [Figure labels: WIP, Future Dialect]
  • 24. Coral IR • Captures query semantics using standardized operators • Based on Apache Calcite • Two semantically equivalent representations: ❑ Coral IR – AST o Captures query semantics at the syntax tree layer o Extends Calcite's SqlNode representation o Use cases: SQL translations ❑ Coral IR – Logical Plan o Captures query semantics at the logical plan layer o Extends Calcite's RelNode representation o Use cases: Query optimization, query rewrites, dynamic data masking
  • 25. Coral IR - AST • Captures query semantics using standardized operators at syntax tree level Image generated by Coral-Visualization Trino SQL: SELECT * FROM test.foo JOIN test.bar ON a = c WHERE array_element[1] = 1 AND strpos(a, 'foo') > 0 Spark SQL: SELECT * FROM test.foo JOIN test.bar ON a = c WHERE b[0] = 1 AND instr(a, 'foo') > 0
  • 26. Coral IR – Logical Plan • Extends Apache Calcite’s Relational Algebra Expressions • Captures query semantics using standardized operators at logical plan level Image generated by Coral-Visualization Trino SQL: SELECT * FROM test.foo JOIN test.bar ON a = c WHERE array_element[1] = 1 AND strpos(a, 'foo') > 0 Spark SQL: SELECT * FROM test.foo JOIN test.bar ON a = c WHERE b[0] = 1 AND instr(a, 'foo') > 0
  • 30. Coral-Incremental Transformation Overview. Input: SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id Output: SELECT product_name, product_price FROM inventory JOIN prices_delta ON inventory.id = prices_delta.id UNION ALL SELECT product_name, product_price FROM inventory_delta JOIN prices ON inventory_delta.id = prices.id UNION ALL SELECT product_name, product_price FROM inventory_delta JOIN prices_delta ON inventory_delta.id = prices_delta.id
  • 31. Coral-Incremental SQL to Coral IR Input Query SELECT product_name, product_price FROM inventory JOIN prices ON inventory.id = prices.id
  • 36. Coral-Incremental Coral IR to SQL Incremental Query SELECT product_name, product_price FROM inventory JOIN prices_delta ON inventory.id = prices_delta.id UNION ALL SELECT product_name, product_price FROM inventory_delta JOIN prices ON inventory_delta.id = prices.id UNION ALL SELECT product_name, product_price FROM inventory_delta JOIN prices_delta ON inventory_delta.id = prices_delta.id
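The shape of the rewritten query follows a simple rule for insert-only inner joins: emit one branch for every combination of old and delta inputs, except the all-old combination, which the materialized table already holds. The sketch below illustrates that rule with plain string templating. It is not how Coral does it (per the slides, Coral rewrites its logical-plan IR); `incremental_join_sql` and its parameters are hypothetical names chosen for this illustration.

```python
from itertools import product


def incremental_join_sql(select_list, tables, on_clauses):
    """Return a UNION ALL of join branches that read at least one *_delta table.

    tables:     table names in join order, e.g. ["inventory", "prices"]
    on_clauses: one ON template per join, using {table} placeholders,
                e.g. ["{inventory}.id = {prices}.id"]
    Assumes inner joins and append-only changes.
    """
    branches = []
    for use_delta in product([False, True], repeat=len(tables)):
        if not any(use_delta):
            continue  # all-old branch: already present in the materialized table
        # Map each logical table to either its old view or its delta view.
        names = {t: (f"{t}_delta" if d else t) for t, d in zip(tables, use_delta)}
        from_clause = names[tables[0]]
        for table, on in zip(tables[1:], on_clauses):
            from_clause += f" JOIN {names[table]} ON {on.format(**names)}"
        branches.append(f"SELECT {select_list} FROM {from_clause}")
    return "\nUNION ALL\n".join(branches)


sql = incremental_join_sql(
    "product_name, product_price",
    ["inventory", "prices"],
    ["{inventory}.id = {prices}.id"],
)
print(sql)
```

For n joined tables this produces 2^n - 1 branches, which is one reason a cost-based choice between incremental maintenance and a full rebuild becomes relevant as queries grow.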
  • 37. Coral-Service Overview • Spring Boot service that exposes REST APIs so Coral can be used directly, without going through a query engine • /api/incremental/rewrite • Endpoint that handles pre- and post-processing between query and Coral IR representations
  • 43. Desired State • End-to-end framework to materialize frequently invoked views and efficiently update records upon changes in base relations ✔️ Efficient Updates Compute and apply incremental changes, rather than re-computing on each invocation. Low Friction Adoption Provide an end-to-end framework for users to seamlessly adopt incremental maintenance functionality while making few modifications to their existing systems.
  • 45. Coral-Dbt User Perspective • Users can utilize incremental maintenance functionality with their models out-of-the-box with the coral-dbt package my_dbt_model.sql (initial configuration)
  • 46. Coral-Dbt User Perspective • Users can utilize incremental maintenance functionality with their models out-of-the-box with the coral-dbt package my_dbt_model.sql (with incremental maintenance)
  • 47. Coral-Dbt Inside the `incremental_maintenance` Materialization Mode 1. Makes a POST request to the Coral service endpoint /api/incremental/rewrite, passing the input SQL 2. Generates Scala code for incremental maintenance logic 3. Executes the generated Spark Scala code
  • 48. Coral-Dbt Inside the `incremental_maintenance` Materialization Mode 1. Makes a POST request to the Coral service endpoint /api/incremental/rewrite, passing the input SQL 2. Generates Spark Scala code for incremental maintenance logic 3. Executes the generated Spark Scala code
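Step 1 of the materialization can be sketched as a plain HTTP call. The endpoint path comes from the slides; everything else here is an assumption made for illustration: the host and port, the JSON body with a `query` field, and the helper name `build_rewrite_request`. Check the coral-service documentation for the actual request and response shapes.

```python
import json
import urllib.request

# Hypothetical location of a locally running Coral service.
CORAL_SERVICE_URL = "http://localhost:8080"


def build_rewrite_request(sql, base_url=CORAL_SERVICE_URL):
    """Build (but do not send) a POST request carrying the model's SQL."""
    # Assumption: the service accepts a JSON body with a "query" field.
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/incremental/rewrite",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rewrite_request(
    "SELECT product_name, product_price "
    "FROM inventory JOIN prices ON inventory.id = prices.id"
)

# Sending the request needs a running service, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rewritten_sql = response.read().decode("utf-8")
```

Steps 2 and 3 then wrap the rewritten SQL in generated Spark Scala code and execute it, as the following slides describe.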
  • 49. Coral-Dbt: Leveraging Iceberg Useful Iceberg Properties • High-performance format for large analytics tables • Table metadata tracks schema, partitioning configs, and snapshots • Enables time travel and incremental reads via Spark Scala → ingredients for incremental maintenance
  • 50. Coral-Dbt: Code Generation Retrieving Snapshot Ids • For each table in the query: • Grab timestamps t_now (end_snapshot_id) and t_now-1 (start_snapshot_id) inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). Snapshot markers: t_now (end), t_now-1 (start). > val start_snapshot_id = grab_snapshot_id_from_previous_run() > val end_snapshot_id = grab_latest_snapshot_id()
  • 51. Coral-Dbt: Code Generation Creating Temp Views • For each table in the query: • Create temporary views representing the original table and the additions. inventory (id, product_name): (1, LinkedIn Learning), (2, LinkedIn Premium), (3, LinkedIn Recruiter). Views: inventory, inventory_delta. > val df = load("inventory") > val inventory = df.snapshotTo(start_snapshot_id).createTempView() > val inventory_delta = df.snapshotFrom(start_snapshot_id).snapshotTo(end_snapshot_id).createTempView()
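The two temporary views correspond to two kinds of snapshot reads: a time-travel read as of the start snapshot, and an incremental read of what was appended between the start and end snapshots. The toy model below shows that split in plain Python. It is not the Iceberg API: `SnapshotTable`, `as_of`, and `appended_between` are hypothetical names, and snapshot ids are simplified to consecutive integers. In Spark, Iceberg documents read options such as `snapshot-id`, `start-snapshot-id`, and `end-snapshot-id` for these reads; the `snapshotTo` and `snapshotFrom` calls on the slide appear to be shorthand for them.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy append-only table: every commit creates the next snapshot id."""

    # commits[i] holds the rows added by snapshot id i + 1.
    commits: list = field(default_factory=list)

    def append(self, rows):
        """Commit new rows and return the id of the snapshot just created."""
        self.commits.append(list(rows))
        return len(self.commits)

    def as_of(self, snapshot_id):
        """Time travel: all rows visible at snapshot_id."""
        return [row for commit in self.commits[:snapshot_id] for row in commit]

    def appended_between(self, start_id, end_id):
        """Incremental read: rows added after start_id, up to and including end_id."""
        return [row for commit in self.commits[start_id:end_id] for row in commit]


table = SnapshotTable()
start_snapshot_id = table.append([(1, "LinkedIn Learning"), (2, "LinkedIn Premium")])
end_snapshot_id = table.append([(3, "LinkedIn Recruiter")])

# The two views used by the incremental query.
inventory = table.as_of(start_snapshot_id)
inventory_delta = table.appended_between(start_snapshot_id, end_snapshot_id)
```

Here `inventory` holds the two original products and `inventory_delta` holds only LinkedIn Recruiter, so the old view plus the delta equals the table as of the end snapshot.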
  • 52. Coral-Dbt: Code Generation Executing Incremental Query and Updating Materialized Table > val query_response = spark.sql(incremental_maintenance_sql) > query_response.appendToTable("my_join_output") t1 (product_name, product_price): (LinkedIn Premium, $6). t2 = t1 + query_response (product_name, product_price): (LinkedIn Premium, $6), (LinkedIn Learning, $3), (LinkedIn Recruiter, $40).
  • 53. Desired State • End-to-end framework to materialize frequently invoked views and efficiently update records upon changes in base relations ✔️ Efficient Updates Compute and apply incremental changes, rather than re-computing on each invocation. ✔️ Low Friction Adoption Provide an end-to-end framework for users to seamlessly adopt incremental maintenance functionality while making few modifications to their existing systems.
  • 54. Next Steps • Expand supported queries • Aggregates, outer joins • Support updates and deletes • Build cost-based model to identify optimal incremental maintenance plans
  • 55. References • Coral: https://github.com/linkedin/coral • Incremental Maintenance Materialization Mode: https://github.com/linkedin/coral/tree/master/coral-dbt • Incremental rewrite: https://github.com/linkedin/coral/tree/master/coral-incremental