The Future of Data Engineering
Chris Riccomini / WePay / @criccomini / QCon SF / 2019-11-12
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/data-engineering-pipelines-warehouses/
Presented at QCon San Francisco
www.qconsf.com
This talk
• Context
• Stages
• Architecture
Context
Me
• WePay, LinkedIn, PayPal
• Data infrastructure, data engineering, service infrastructure, data science
• Kafka, Airflow, BigQuery, Samza, Hadoop, Azkaban, Teradata
Data engineering?
A data engineer’s job is to help an organization
move and process data
“…data engineers build tools, infrastructure, frameworks, and
services.”
-- Maxime Beauchemin, The Rise of the Data Engineer
Why?
Six stages of data pipeline maturity
• Stage 0: None
• Stage 1: Batch
• Stage 2: Realtime
• Stage 3: Integration
• Stage 4: Automation
• Stage 5: Decentralization
You might be ready for a data warehouse if…
• You have no data warehouse
• You have a monolithic architecture
• You need a data warehouse up and running yesterday
• Data engineering isn’t your full-time job
Stage 0: None
[Diagram: a monolith and its DB]
WePay circa 2014
[Diagram: PHP monolith and MySQL]
Problems
• Queries began timing out
• Users were impacting each other
• MySQL was missing complex analytical SQL functions
• Report generation was breaking
You might be ready for batch if…
• You have a monolithic architecture
• Data engineering is your part-time job
• Queries are timing out
• Exceeding DB capacity
• Need complex analytical SQL functions
• Need reports, charts, and business intelligence
Stage 1: Batch
[Diagram: monolith, DB, scheduler, DWH]
WePay circa 2016
[Diagram: PHP monolith, MySQL, Airflow, BQ]
Problems
• Large number of Airflow jobs for loading all tables
• Missing and inaccurate create_time and modify_time
• DBA operations impacting pipeline
• Hard deletes weren’t propagating
• MySQL replication latency was causing data quality issues
• Periodic loads cause occasional MySQL timeouts
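The modify_time and hard-delete problems above can be illustrated with a minimal sketch of a watermark-based incremental load. It assumes a hypothetical `payments` table with an integer `modify_time` column, and it uses in-memory SQLite as a stand-in for both MySQL and the warehouse; none of these names come from the talk.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical payments table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE payments ("
        "id INTEGER PRIMARY KEY, amount INTEGER, modify_time INTEGER)"
    )
    return db


def incremental_load(source, warehouse, last_watermark):
    """Copy rows changed since last_watermark; return the new watermark."""
    rows = source.execute(
        "SELECT id, amount, modify_time FROM payments WHERE modify_time > ?",
        (last_watermark,),
    ).fetchall()
    for row in rows:
        # Upsert by primary key so a re-loaded row overwrites the older copy.
        warehouse.execute(
            "INSERT OR REPLACE INTO payments (id, amount, modify_time) "
            "VALUES (?, ?, ?)",
            row,
        )
    return max((r[2] for r in rows), default=last_watermark)


source, warehouse = make_db(), make_db()
source.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)", [(1, 100, 10), (2, 250, 11)]
)
watermark = incremental_load(source, warehouse, last_watermark=0)

# An update bumps modify_time, so the next load picks it up.
source.execute("UPDATE payments SET amount = 300, modify_time = 12 WHERE id = 2")
# A hard delete leaves no row behind, so the query has nothing to see.
source.execute("DELETE FROM payments WHERE id = 1")
watermark = incremental_load(source, warehouse, watermark)

warehouse_rows = warehouse.execute(
    "SELECT id, amount FROM payments ORDER BY id"
).fetchall()
print(warehouse_rows)  # row 1 is still in the warehouse after the source delete
```

The same query also misses any row whose modify_time was never updated, which is why inaccurate timestamps silently drop changes in this style of load.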
You might be ready for realtime if…
• Loads are taking too long
• Pipeline is no longer stable
• Many complicated workflows
• Data latency is becoming an issue
• Data engineering is your full-time job
• You already have Apache Kafka in your organization
Stage 2: Realtime
[Diagram: monolith, DB, streaming platform, DWH]
WePay circa 2017
[Diagram: PHP monolith and two services, each with MySQL and Debezium; Kafka; KCBQ; BQ]
Change data capture?
…an approach to data integration that is based on
the identification, capture and delivery of the
changes made to enterprise data sources.
https://en.wikipedia.org/wiki/Change_data_capture
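As a rough illustration of change data capture, the sketch below replays a stream of change events against an in-memory replica. The event shape (`op`, `before`, `after`) is loosely modeled on Debezium's change event envelope; the `id` key and the sample rows are hypothetical, and real events carry more fields than shown here.

```python
def apply_change(replica, event):
    """Apply one change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":  # delete: only the "before" image is populated
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {
        "op": "u",
        "before": {"id": 1, "status": "new"},
        "after": {"id": 1, "status": "paid"},
    },
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Because deletes arrive as explicit events, the replica drops row 2, which a periodic query against the source table cannot detect.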
Debezium sources
• MongoDB
• MySQL
• PostgreSQL
• SQL Server
• Oracle (Incubating)
• Cassandra (Incubating)
Kafka Connect BigQuery
• Open source connector that WePay wrote
• Stream data from Apache Kafka to Google BigQuery
• Supports GCS loads
• Supports realtime streaming inserts
• Automatic table schema updates
Problems
• Pipeline for Datastore was still on Airflow
• No pipeline at all for Cassandra or Bigtable
• BigQuery needed logging data
• Elasticsearch needed data
• Graph DB needed data
https://www.confluent.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/
You might be ready for integration if…
• You have microservices
• You have a diverse database ecosystem
• You have many specialized derived data systems
• You have a team of data engineers
• You have a mature SRE organization
Stage 3: Integration
[Diagram: services backed by DB, NoSQL, and NewSQL; streaming platform; DWH; graph DB; search]
WePay circa 2019
[Diagram: PHP monolith with MySQL, a service with Cassandra, and a service with MySQL, each with Debezium; Kafka; KCBQ; BQ; KCW; Waltz; graph DB; further services]
Metcalfe’s law
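Metcalfe's law says the value of a network grows roughly with the square of the number of connected nodes. As a back-of-the-envelope sketch, assuming each system attached to the pipeline counts as one node, the number of possible pairwise links can be computed as follows (the function name is made up for illustration):

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n connected systems: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (2, 5, 10, 20):
    print(n, pairwise_links(n))
```

Doubling the number of integrated systems from 10 to 20 roughly quadruples the possible links (45 to 190).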
Problems
• Add new channel to replica MySQL DB
• Create and configure Kafka topics
• Add new Debezium connector to Kafka Connect
• Create destination dataset in BigQuery
• Add new KCBQ connector to Kafka Connect
• Create BigQuery views
• Configure data quality checks for new tables
• Grant access to BigQuery dataset
• Deploy stream processors or workflows
You might be ready for automation if…
• Your SREs can’t keep up
• You’re spending a lot of time on manual toil
• You don’t have time for the fun stuff
Stage 4: Automation
[Diagram: realtime data integration (services backed by DB, NoSQL, and NewSQL; streaming platform; DWH; graph DB; search) on top of two new layers]
Automated Operations: Orchestration, Monitoring, Configuration, …
Automated Data Management: Data Catalog, RBAC/IAM/ACL, DLP, …
Automated Operations
“If a human operator needs to touch your system
during normal operations, you have a bug.”
-- Carla Geisser, Google SRE
Normal operations?
• Add new channel to replica MySQL DB
• Create and configure Kafka topics
• Add new Debezium connector to Kafka Connect
• Create destination dataset in BigQuery
• Add new KCBQ connector to Kafka Connect
• Create BigQuery views
• Configure data quality checks for new tables
• Grant access
• Deploy stream processors or workflows
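One way to picture automating the steps above is to derive them from a single declarative description of the table being onboarded. The sketch below is only an illustration: `TableSpec`, the `database.table` topic naming, and the step wording are all hypothetical, and a real setup would hand each step to a tool such as Terraform rather than print it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one source table to onboard."""
    database: str
    table: str
    readers: tuple  # principals that need read access in the warehouse


def provisioning_steps(spec):
    """Expand one spec into the ordered list of operations it implies."""
    topic = f"{spec.database}.{spec.table}"
    steps = [
        f"create Kafka topic {topic}",
        f"add Debezium connector for {spec.database}",
        f"create BigQuery dataset {spec.database}",
        f"add KCBQ connector for topic {topic}",
        f"create BigQuery view for {spec.table}",
        f"configure data quality checks for {spec.table}",
    ]
    steps += [
        f"grant {reader} read access to {spec.database}"
        for reader in spec.readers
    ]
    return steps


spec = TableSpec(database="payments", table="refunds", readers=("analyst", "risk"))
steps = provisioning_steps(spec)
for step in steps:
    print(step)
```

The point of the design is that a human edits only the spec; everything else is derived, so no operator has to touch the system during normal operations.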
Automated operations
• Terraform
• Ansible
• Helm
• Salt
• CloudFormation
• Chef
• Puppet
• Spinnaker
Terraform
provider "kafka" {
  bootstrap_servers = ["localhost:9092"]
}

# A compacted topic with 100 partitions and a replication factor of 2.
resource "kafka_topic" "logs" {
  name               = "systemd_logs"
  replication_factor = 2
  partitions         = 100

  config = {
    "segment.ms"     = "20000"
    "cleanup.policy" = "compact"
  }
}
Terraform
provider "kafka-connect" {
  url = "http://localhost:8083" # Kafka Connect REST endpoint
}

# A JDBC sink connector that writes the "orders" topic to SQLite.
resource "kafka-connect_connector" "sqlite-sink" {
  name = "test-sink"

  config = {
    "name"            = "test-sink"
    "connector.class" = "io.confluent.connect.jdbc.JdbcSinkConnector"
    "tasks.max"       = "1"
    "topics"          = "orders"
    "connection.url"  = "jdbc:sqlite:test.db"
    "auto.create"     = "true"
  }
}
But we were doing this… why so much toil?
• We had Terraform and Ansible
• We were on the cloud
• We had BigQuery scripts and tooling
Spending time on data management
• Who gets access to this data?
• How long can this data be persisted?
• Is this data allowed in this system?
• Which geographies must data be persisted in?
• Should columns be masked?
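Questions like these can be written down as policy and checked by code instead of by people. The sketch below is a minimal illustration under assumed names: the `POLICY` keys, the dataset fields, and the `***` mask are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical policy: retention limit, permitted regions, columns to mask.
POLICY = {
    "max_retention_days": 365,
    "allowed_regions": {"us", "eu"},
    "masked_columns": {"ssn", "card_number"},
}


def mask_row(row, policy=POLICY):
    """Replace the values of masked columns with a fixed placeholder."""
    return {
        key: ("***" if key in policy["masked_columns"] else value)
        for key, value in row.items()
    }


def violations(dataset, today, policy=POLICY):
    """List the policy rules that a dataset description breaks."""
    problems = []
    if dataset["region"] not in policy["allowed_regions"]:
        problems.append("region not allowed")
    age = today - dataset["oldest_record"]
    if age > timedelta(days=policy["max_retention_days"]):
        problems.append("retention exceeded")
    return problems


masked = mask_row({"name": "Alice", "ssn": "123-45-6789"})
found = violations(
    {"region": "apac", "oldest_record": date(2018, 1, 1)},
    today=date(2019, 11, 12),
)
print(masked)
print(found)
```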
Regulation is coming here
GDPR, CCPA, PCI, HIPAA, SOX, SHIELD, …
Photo by Darren Halstead
Automated Data Management
Set up a data catalog
• Location
• Schema
• Ownership
• Lineage
• Encryption
• Versioning
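A catalog entry covering the attributes above might look like the following sketch. The class and field names are assumptions for illustration, not the API of any particular catalog product; lineage is modeled as a list of upstream dataset names.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str                                  # where the data lives
    schema: dict                                   # column name -> type
    owner: str                                     # owning team
    upstream: list = field(default_factory=list)   # lineage: direct sources
    encrypted: bool = False
    version: int = 1


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def lineage(self, name):
        """Return all transitive upstream dataset names, nearest first."""
        seen, queue = [], list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


catalog = Catalog()
catalog.register(CatalogEntry(
    "mysql.payments", "mysql://payments-db", {"id": "INT"}, "payments-team"))
catalog.register(CatalogEntry(
    "kafka.payments", "kafka://payments", {"id": "INT"}, "data-eng",
    upstream=["mysql.payments"]))
catalog.register(CatalogEntry(
    "bq.payments", "bq://warehouse.payments", {"id": "INT64"}, "data-eng",
    upstream=["kafka.payments"], encrypted=True))

print(catalog.lineage("bq.payments"))
```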
Configure your access
• RBAC
• IAM
• ACL
Configure your policies
• Role-based access control (RBAC)
• Identity and access management (IAM)
• Access control lists (ACLs)
Kafka ACLs with Terraform
provider "kafka" {
  bootstrap_servers = ["localhost:9092"]
  ca_cert           = file("../secrets/snakeoil-ca-1.crt")
  client_cert       = file("../secrets/kafkacat-ca1-signed.pem")
  client_key        = file("../secrets/kafkacat-raw-private-key.pem")
  skip_tls_verify   = true # disables certificate verification; local testing only
}

# Deny User:Alice write access to the "syslog" topic from any host.
resource "kafka_acl" "test" {
  resource_name       = "syslog"
  resource_type       = "Topic"
  acl_principal       = "User:Alice"
  acl_host            = "*"
  acl_operation       = "Write"
  acl_permission_type = "Deny"
}
Automate management
• New user access
• New data access
• Service account access
• Temporary access
• Unused access
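Temporary and unused access are the easiest cases to automate, because both reduce to a date comparison. The sketch below assumes a hypothetical `Grant` record and a 90-day idle threshold; neither comes from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Grant:
    principal: str
    dataset: str
    expires_at: Optional[datetime]  # None means the grant is permanent
    last_used: Optional[datetime]   # None means the grant was never used


def grants_to_revoke(grants, now, unused_after=timedelta(days=90)):
    """Return grants that have expired or have been idle for too long."""
    revoke = []
    for grant in grants:
        expired = grant.expires_at is not None and grant.expires_at <= now
        unused = grant.last_used is None or now - grant.last_used > unused_after
        if expired or unused:
            revoke.append(grant)
    return revoke


now = datetime(2019, 11, 12)
grants = [
    # Permanent and recently used: keep.
    Grant("alice", "payments", None, datetime(2019, 11, 1)),
    # Temporary grant whose expiry has passed: revoke.
    Grant("bob", "payments", datetime(2019, 11, 1), datetime(2019, 10, 31)),
    # Permanent but idle since January: revoke.
    Grant("svc-report", "payments", None, datetime(2019, 1, 15)),
]

revoked = [g.principal for g in grants_to_revoke(grants, now)]
print(revoked)
```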
Detect violations
• Auditing
• Data loss prevention
Detecting sensitive data
Request:
{
  "item": {
    "value": "My phone number is (415) 555-0890"
  },
  "inspectConfig": {
    "includeQuote": true,
    "minLikelihood": "POSSIBLE",
    "infoTypes": [
      { "name": "PHONE_NUMBER" }
    ]
  }
}
Response:
{
  "result": {
    "findings": [
      {
        "quote": "(415) 555-0890",
        "infoType": { "name": "PHONE_NUMBER" },
        "likelihood": "VERY_LIKELY",
        "location": {
          "byteRange": { "start": "19", "end": "33" }
        }
      }
    ]
  }
}
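To make the shape of such a finding concrete, here is a toy detector that produces a similar structure with a single regular expression. It is a stand-in for illustration only, not the DLP service's detection logic: the pattern matches just the `(NNN) NNN-NNNN` format, there is no likelihood scoring, and the offsets are returned as integers.

```python
import re

# Toy pattern: matches only US-style "(415) 555-0890" phone numbers.
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def find_phone_numbers(text):
    """Return findings with the matched quote and its UTF-8 byte range."""
    findings = []
    for match in PHONE_RE.finditer(text):
        quote = match.group()
        # Convert character offsets to byte offsets for non-ASCII safety.
        start = len(text[:match.start()].encode("utf-8"))
        end = start + len(quote.encode("utf-8"))
        findings.append({
            "quote": quote,
            "infoType": {"name": "PHONE_NUMBER"},
            "location": {"byteRange": {"start": start, "end": end}},
        })
    return findings


findings = find_phone_numbers("My phone number is (415) 555-0890")
print(findings)
```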
Progress
• Users can find the data that they need
• Automated data management and operations
Problems
• Data engineering still manages configuration and deployment
You might be ready for decentralization if…
• You have a fully automated realtime data pipeline
• People still come to you to get data loaded
If we have an automated data pipeline and data warehouse,
do we need a single team to manage this?
Stage 5: Decentralization
[Diagram: realtime data integration (services backed by DB, NoSQL, and NewSQL; streaming platform; graph DB; search) feeding multiple DWHs]
Automated Operations: Orchestration, Monitoring, Configuration, …
Automated Data Management: Data Catalog, RBAC/IAM/ACL, DLP, …
From monolith to microwarehouses (by analogy with microservices)
Partial decentralization
• Raw tools are exposed to other engineering teams
• Requires Git, YAML, JSON, pull requests, Terraform commands, etc.
Full decentralization
• Polished tools are exposed to everyone
• Security and compliance manage access and policy
• Data engineering manages data tooling and infrastructure
• Everyone manages data pipelines and data warehouses
Modern Data Pipeline
[Diagram: realtime data integration (services backed by DB, NoSQL, and NewSQL; streaming platform; graph DB; search) feeding multiple DWHs]
Automated Operations: Orchestration, Monitoring, Configuration, …
Automated Data Management: Data Catalog, RBAC/IAM/ACL, DLP, …
Thanks!
(…and we’re hiring)
🙏
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 

Recently uploaded

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

The Future of Data Engineering Pipelines and Architectures

  • 1. The Future of Data Engineering Chris Riccomini / WePay / @criccomini / QCon SF / 2019-11-12
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/data-engineering-pipelines-warehouses/
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 4. This talk • Context • Stages • Architecture
  • 6. Me • WePay, LinkedIn, PayPal • Data infrastructure, data engineering, service infrastructure, data science • Kafka, Airflow, BigQuery, Samza, Hadoop, Azkaban, Teradata
  • 11. A data engineer’s job is to help an organization move and process data
  • 12. “…data engineers build tools, infrastructure, frameworks, and services.” -- Maxime Beauchemin, The Rise of the Data Engineer
  • 13. Why?
  • 19. Six stages of data pipeline maturity • Stage 0: None • Stage 1: Batch • Stage 2: Realtime • Stage 3: Integration • Stage 4: Automation • Stage 5: Decentralization
  • 21. You might be ready for a data warehouse if… • You have no data warehouse • You have a monolithic architecture • You need a data warehouse up and running yesterday • Data engineering isn’t your full-time job
  • 25. Problems • Queries began timing out • Users were impacting each other • MySQL was missing complex analytical SQL functions • Report generation was breaking
  • 26. Six stages of data pipeline maturity • Stage 0: None • Stage 1: Batch • Stage 2: Realtime • Stage 3: Integration • Stage 4: Automation • Stage 5: Decentralization
  • 27. You might be ready for batch if… • You have a monolithic architecture • Data engineering is your part-time job • Queries are timing out • Exceeding DB capacity • Need complex analytical SQL functions • Need reports, charts, and business intelligence
  • 28. Stage 1: Batch (diagram labels: Monolith, DB, Scheduler, DWH)
  • 30. Problems • Large number of Airflow jobs for loading all tables • Missing and inaccurate create_time and modify_time • DBA operations impacting pipeline • Hard deletes weren’t propagating • MySQL replication latency was causing data quality issues • Periodic loads cause occasional MySQL timeouts
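Two of the problems above, the dependence on modify_time and the hard deletes that never propagate, can be illustrated with a minimal sketch. It assumes a hypothetical `payments` table and uses in-memory SQLite in place of MySQL and the warehouse; it is not WePay's Airflow job, only an illustration of a watermark-based incremental load.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical payments table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE payments ("
        "id INTEGER PRIMARY KEY, amount INTEGER, modify_time INTEGER)"
    )
    return db


def incremental_load(source, warehouse, last_watermark):
    """Copy rows modified after last_watermark; return the new watermark."""
    rows = source.execute(
        "SELECT id, amount, modify_time FROM payments WHERE modify_time > ?",
        (last_watermark,),
    ).fetchall()
    warehouse.executemany(
        "INSERT OR REPLACE INTO payments (id, amount, modify_time) "
        "VALUES (?, ?, ?)",
        rows,
    )
    # With no changed rows, the watermark stays where it was.
    return max((row[2] for row in rows), default=last_watermark)


source, warehouse = make_db(), make_db()
source.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)", [(1, 100, 10), (2, 250, 11)]
)
watermark = incremental_load(source, warehouse, last_watermark=0)

# A hard delete leaves no row behind, so the next load has nothing to copy
# and the warehouse keeps the deleted row.
source.execute("DELETE FROM payments WHERE id = 2")
watermark = incremental_load(source, warehouse, watermark)

source_ids = [r[0] for r in source.execute("SELECT id FROM payments ORDER BY id")]
warehouse_ids = [r[0] for r in warehouse.execute("SELECT id FROM payments ORDER BY id")]
```

After the second load the source holds only row 1 while the warehouse still holds rows 1 and 2, which is the drift that a query on modify_time cannot detect.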
  • 31. Six stages of data pipeline maturity • Stage 0: None • Stage 1: Batch • Stage 2: Realtime • Stage 3: Integration • Stage 4: Automation • Stage 5: Decentralization
  • 32. You might be ready for realtime if… • Loads are taking too long • Pipeline is no longer stable • Many complicated workflows • Data latency is becoming an issue • Data engineering is your full-time job • You already have Apache Kafka in your organization
  • 35. WePay circa 2017 (diagram labels: PHP Monolith, Service ×2, MySQL ×3, Debezium ×3, Kafka, KCBQ, BQ)
  • 39. …an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources. https://en.wikipedia.org/wiki/Change_data_capture
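To make the definition above concrete, here is a minimal sketch of the "delivery" side of change data capture: replaying captured changes against a downstream copy. The event shape (`op`, `before`, `after`) is loosely modeled on Debezium's change event envelope but heavily simplified, and the `id` and `status` fields are hypothetical.

```python
def apply_change(replica, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":  # delete: only the "before" image is populated
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)
```

Because deletes arrive as explicit events, the replica ends up with only row 1 in its updated state, without any periodic query against the source database.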
  • 40. Debezium sources • MongoDB • MySQL • PostgreSQL • SQL Server • Oracle (Incubating) • Cassandra (Incubating)
  • 43. Kafka Connect BigQuery • Open source connector that WePay wrote • Stream data from Apache Kafka to Google BigQuery • Supports GCS loads • Supports realtime streaming inserts • Automatic table schema updates
  • 44. Problems • Pipeline for Datastore was still on Airflow • No pipeline at all for Cassandra or Bigtable • BigQuery needed logging data • Elasticsearch needed data • Graph DB needed data
  • 46. Six stages of data pipeline maturity • Stage 0: None • Stage 1: Batch • Stage 2: Realtime • Stage 3: Integration • Stage 4: Automation • Stage 5: Decentralization
  • 47. You might be ready for integration if… • You have microservices • You have a diverse database ecosystem • You have many specialized derived data systems • You have a team of data engineers • You have a mature SRE organization
  • 48. Stage 3: Integration (diagram labels: Service/DB, Service/NoSQL, Service/New SQL, Streaming Platform, DWH, Graph DB, Search)
  • 49. WePay circa 2019 (diagram labels: PHP Monolith, Service ×5, MySQL ×2, Cassandra, Debezium ×3, Kafka, KCBQ, BQ, KCW, Waltz, Graph DB)
  • 56. Problems • Add new channel to replica MySQL DB • Create and configure Kafka topics • Add new Debezium connector to Kafka Connect • Create destination dataset in BigQuery • Add new KCBQ connector to Kafka Connect • Create BigQuery views • Configure data quality checks for new tables • Grant access to BigQuery dataset • Deploy stream processors or workflows
  • 58. Six stages of data pipeline maturity • Stage 0: None • Stage 1: Batch • Stage 2: Realtime • Stage 3: Integration • Stage 4: Automation • Stage 5: Decentralization
  • 59. You might be ready for automation if… • Your SREs can’t keep up • You’re spending a lot of time on manual toil • You don’t have time for the fun stuff
  • 60. Stage 4: Automation (diagram labels: Realtime Data Integration: Service/DB, Service/NoSQL, Service/New SQL, Streaming Platform, DWH, Graph DB, Search; Automated Operations: Orchestration, Monitoring, Configuration, …; Automated Data Management: Data Catalog, RBAC/IAM/ACL, DLP, …)
  • 62. “If a human operator needs to touch your system during normal operations, you have a bug.” -- Carla Geisser, Google SRE
  • 63. Normal operations? • Add new channel to replica MySQL DB • Create and configure Kafka topics • Add new Debezium connector to Kafka Connect • Create destination dataset in BigQuery • Add new KCBQ connector to Kafka Connect • Create BigQuery views • Configure data quality checks for new tables • Grant access • Deploy stream processors or workflows
  • 64. Automated operations • Terraform • Ansible • Helm • Salt • CloudFormation • Chef • Puppet • Spinnaker
  • 65. Terraform
        # Point the provider at the Kafka brokers.
        provider "kafka" {
          bootstrap_servers = ["localhost:9092"]
        }

        # Declare a compacted topic as code.
        resource "kafka_topic" "logs" {
          name               = "systemd_logs"
          replication_factor = 2
          partitions         = 100

          config = {
            "segment.ms"     = "20000"
            "cleanup.policy" = "compact"
          }
        }
  • 66. Terraform
        # Point the provider at the Kafka Connect REST endpoint.
        provider "kafka-connect" {
          url = "http://localhost:8083"
        }

        # Declare a JDBC sink connector that reads the "orders" topic.
        resource "kafka-connect_connector" "sqlite-sink" {
          name = "test-sink"

          config = {
            "name"            = "test-sink"
            "connector.class" = "io.confluent.connect.jdbc.JdbcSinkConnector"
            "tasks.max"       = "1"
            "topics"          = "orders"
            "connection.url"  = "jdbc:sqlite:test.db"
            "auto.create"     = "true"
          }
        }
  • 67. But we were doing this… why so much toil? • We had Terraform and Ansible • We were on the cloud • We had BigQuery scripts and tooling
  • 68. Spending time on data management • Who gets access to this data? • How long can this data be persisted? • Is this data allowed in this system? • Which geographies must data be persisted in? • Should columns be masked?
  • 69. Regulation is coming Photo by Darren Halstead
  • 70. Regulation is coming here GDPR, CCPA, PCI, HIPAA, SOX, SHIELD, … Photo by Darren Halstead
  • 72. Set up a data catalog • Location • Schema • Ownership • Lineage • Encryption • Versioning
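The catalog attributes listed above can be sketched as a small record type. This is a hypothetical in-memory model for illustration only, not the schema of any particular catalog product; the dataset names and the `lineage` helper are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset in a hypothetical data catalog."""
    name: str
    location: str                                  # where the data lives
    schema: dict                                   # column name -> type
    owner: str                                     # owning team
    upstream: list = field(default_factory=list)   # direct lineage parents
    encrypted: bool = False
    version: int = 1


def lineage(catalog, name):
    """Return every upstream dataset reachable from name, nearest first."""
    seen, queue = [], list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(catalog[current].upstream)
    return seen


columns = {"id": "INT64", "amount": "INT64"}
catalog = {
    "mysql.payments": CatalogEntry(
        "mysql.payments", "mysql", columns, "payments-team"),
    "kafka.payments": CatalogEntry(
        "kafka.payments", "kafka", columns, "data-eng",
        upstream=["mysql.payments"]),
    "bq.payments": CatalogEntry(
        "bq.payments", "bigquery", columns, "data-eng",
        upstream=["kafka.payments"], encrypted=True),
}

parents = lineage(catalog, "bq.payments")
```

Here `parents` lists the Kafka topic first and the MySQL table second, which is the kind of question ("where did this table come from?") a catalog is meant to answer.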
  • 79. Configure your access • RBAC • IAM • ACL
  • 80. Configure your policies • Role-based access control • Identity and access management • Access control lists
  • 82. Kafka ACLs with Terraform
        # Connect to Kafka over TLS using client certificates.
        provider "kafka" {
          bootstrap_servers = ["localhost:9092"]
          ca_cert           = file("../secrets/snakeoil-ca-1.crt")
          client_cert       = file("../secrets/kafkacat-ca1-signed.pem")
          client_key        = file("../secrets/kafkacat-raw-private-key.pem")
          skip_tls_verify   = true
        }

        # Deny user Alice write access to the "syslog" topic from any host.
        resource "kafka_acl" "test" {
          resource_name       = "syslog"
          resource_type       = "Topic"
          acl_principal       = "User:Alice"
          acl_host            = "*"
          acl_operation       = "Write"
          acl_permission_type = "Deny"
        }
  • 83. Automate management • New user access • New data access • Service account access • Temporary access • Unused access
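As one example of the items above, finding unused access can be reduced to comparing grants against an access log. The sketch below is a hypothetical illustration: the grant tuples, the `last_access` mapping, and the 90-day threshold are all assumptions rather than anything prescribed in the talk.

```python
from datetime import date, timedelta


def unused_grants(grants, last_access, today, max_idle_days=90):
    """Return grants never used, or not used within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = []
    for grant in grants:
        last = last_access.get(grant)
        if last is None or last < cutoff:
            stale.append(grant)
    return stale


# (principal, dataset) pairs that currently hold access.
grants = [
    ("alice", "payments"),
    ("bob", "payments"),
    ("svc-report", "ledger"),
]

# Most recent access per grant, e.g. derived from audit logs.
last_access = {
    ("alice", "payments"): date(2019, 11, 1),
    ("bob", "payments"): date(2019, 5, 2),
}

stale = unused_grants(grants, last_access, today=date(2019, 11, 12))
```

In this run `stale` contains bob's grant (idle for more than 90 days) and the service account's grant (never used), which could then feed an automated revocation or review step.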
  • 84. Detect violations • Auditing • Data loss prevention
  • 86. Detecting sensitive data
        Request:
        {
          "item": {
            "value": "My phone number is (415) 555-0890"
          },
          "inspectConfig": {
            "includeQuote": true,
            "minLikelihood": "POSSIBLE",
            "infoTypes": {
              "name": "PHONE_NUMBER"
            }
          }
        }

        Response:
        {
          "result": {
            "findings": [
              {
                "quote": "(415) 555-0890",
                "infoType": {
                  "name": "PHONE_NUMBER"
                },
                "likelihood": "VERY_LIKELY",
                "location": {
                  "byteRange": {
                    "start": "19",
                    "end": "33"
                  }
                }
              }
            ]
          }
        }
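The idea behind the request and response above can be sketched locally with a regular expression. This is a toy stand-in and not the DLP service itself: the pattern only matches the `(NNN) NNN-NNNN` format, and the `inspect` function and its output shape are assumptions that merely echo the fields shown on the slide.

```python
import re

# Matches US-style numbers written as "(415) 555-0890" and nothing else.
PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def inspect(text):
    """Return one finding per phone-number match, with UTF-8 byte offsets."""
    findings = []
    for match in PHONE.finditer(text):
        start = len(text[:match.start()].encode("utf-8"))
        end = start + len(match.group().encode("utf-8"))
        findings.append({
            "quote": match.group(),
            "infoType": "PHONE_NUMBER",
            "byteRange": {"start": start, "end": end},
        })
    return findings


findings = inspect("My phone number is (415) 555-0890")
```

For the sample sentence this yields a single finding whose quote and byte range (19 to 33) agree with the response on the slide; a real detector also has to handle other formats and score likelihood.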
  • 87. Progress • Users can find the data that they need • Automated data management and operations
  • 88. Problems • Data engineering still manages configuration and deployment
  • 89. Six stages of data pipeline maturity • Stage 0: None • Stage 1: Batch • Stage 2: Realtime • Stage 3: Integration • Stage 4: Automation • Stage 5: Decentralization
  • 90. You might be ready for decentralization if… • You have a fully automated realtime data pipeline • People still come to you to get data loaded
  • 91. If we have an automated data pipeline and data warehouse, do we need a single team to manage this?
  • 92. Stage 5: Decentralization (diagram labels: Realtime Data Integration: Service/DB, Service/NoSQL, Service/New SQL, Streaming Platform, Graph DB, Search; Automated Operations: Orchestration, Monitoring, Configuration, …; Automated Data Management: Data Catalog, RBAC/IAM/ACL, DLP, …; DWH ×2)
  • 93. From monolith to microservices microwarehouses
  • 96. Partial decentralization • Raw tools are exposed to other engineering teams • Requires Git, YAML, JSON, pull requests, Terraform commands, etc.
  • 97. Full decentralization • Polished tools are exposed to everyone • Security and compliance manage access and policy • Data engineering manages data tooling and infrastructure • Everyone manages data pipelines and data warehouses
  • 98. Modern Data Pipeline (diagram labels: Realtime Data Integration: Service/DB, Service/NoSQL, Service/New SQL, Streaming Platform, Graph DB, Search; Automated Operations: Orchestration, Monitoring, Configuration, …; Automated Data Management: Data Catalog, RBAC/IAM/ACL, DLP, …; DWH ×2)