Building Event Streaming Architectures on Scylla and Confluent with Kafka
Presenters
Tim Berglund, Senior Director of Developer Advocacy
Alexys Jacob, CTO
Maheedhar Gunturu, Director of Technical Alliances
Othmane El Metioui, Chief Data Officer
Agenda
ᐩ Brief Intro to Scylla
ᐩ Scylla + Kafka at Numberly
ᐩ Change Data Capture in Scylla
ᐩ Streaming Data from Scylla to Kafka
About ScyllaDB
• Reimagined the NoSQL database
• Close-to-the-hardware design, written in C++
• Open source, enterprise & DBaaS
• From the creators of the KVM hypervisor
Winner: InfoWorld Technology of the Year
Grows with your business & your data
• Volume: multi-petabyte
• Throughput: 1 billion OPS
• Horizontal scalability: 1,000-node cluster
• Availability: 1 to 10+ replicas within a datacenter
• Consistent latencies: low single-digit millisecond p99s
• Vertical scalability: 1 to 416 vCPUs
• Unlimited: cell sizes and partition width
• Consistency options: eventual consistency to linearizability
Used across industries
AdTech/MarTech, Multimedia, Finance/FinTech, Security, Ride-hailing/Food Delivery, Social, Retail, Travel, IoT, Logistics/Transportation
Deployment options
On-Prem: Install in Your Datacenter
➔ Scylla Open Source
➔ Scylla Enterprise
➔ AWS Outposts
Cloud Hosted: Deploy at a Cloud Provider
➔ Scylla Open Source
➔ Scylla Enterprise
Scylla Cloud: Database as a Service
➔ Fully managed Scylla clusters
➔ Bring Your Own Account (BYOA) option
Kubernetes: Run on Kubernetes
➔ Manage with Scylla Operator
Scylla + Kafka at Numberly
Architectural choices and overview
At Numberly, we run bare-metal clusters
Scylla
3 clusters, with multi-datacenter topology
• Staging
• Production web facing
• Production OLAP+OLTP
• RF=3 per DC
DELL hardware
• RAID0 NVMe
• up to 96 AMD cores per node
• up to 512GB RAM per node
Confluent Kafka
2 clusters, with active-active multi-datacenter topology
• Staging
• Production
DELL hardware
• 6 brokers: 12 TB SSD (RAID0), 2x 24 cores, 64GB RAM
• 12 other nodes: Connect cluster, Schema Registry, Zookeepers...
Scylla Cloud & Confluent Cloud
TL;DR: The people behind the technology know better!
Cloud hosted solutions should be considered depending on your infrastructure maturity and hosting constraints.
Our experience shows that cloud providers such as AWS always lag behind on versions and provide poor monitoring & alerting capabilities.
Scylla + Kafka at Numberly
Stack usage overview
Combining Scylla and Confluent Kafka powers
Scylla
• Scylla Manager
• Scylla Monitoring
• Easy data expiration (TTL) on large time windows (6+ months)
Confluent Kafka
• Kafka Connect & Exporter
• Schema Registry
• KSQL
• Home-made control center interface + Grafana
Started with in-house Kafka Streams and Python pipelines to propagate data changes between Scylla & Kafka
The Confluent certified CDC connector will simplify our pipelines!
Scylla + Kafka at Numberly
Scylla is used as a low-latency remote state store providing easy data expiry capabilities to Kafka Streams and pipelines (in & out)
Use case #1
Data pipeline enrichment
Scylla to the rescue in overcoming a JOIN window too large for Kafka
Use case #1: how we did it before
[Diagram components: Numberly’s web tracking; RabbitMQ exchange; beanstalkd; Python programs (write + read); Scylla with 13+ months retention (high throughput writes + low latency reads, expiring data)]
Use case #1: our first attempt
[Diagram components: Numberly’s web tracking; Kafka Streams; compacted topic; KTable; Kafka Connect; Redis; arrows labeled read and write]
Scaling limitations of Kafka JOIN windows
• The retention of our source data enriched from Scylla is long (13+ months)
  ○ Data set size averages 150+GB per table, totaling 1.2+TB of source data
• Multiple successive JOINs are heavy on Kafka with large datasets
  ○ Memory issues with the large RocksDB state store caused Kubernetes pod OOM kills
  ○ Rebuilding the state store after a Kafka Streams (pod) restart took too long
  ○ Standby replicas come at a cost for a large state store
We turned to Scylla to be a remote, highly available, distributed state store!
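The remote state store idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TtlStateStore` is a hypothetical in-memory stand-in for a Scylla table written with a TTL, and `enrich` stands for the lookup a stream processor would make per event; neither name comes from the talk.

```python
import time


class TtlStateStore:
    """Hypothetical in-memory stand-in for a remote table whose rows expire."""

    def __init__(self, clock=time.monotonic):
        self._rows = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def put(self, key, value, ttl_seconds):
        # Store the value together with its expiry deadline.
        self._rows[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._rows.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired rows behave as if they were never written.
            del self._rows[key]
            return None
        return value


def enrich(event, store):
    """Attach the stored profile (or None) to an incoming event."""
    return {**event, "profile": store.get(event["user_id"])}
```

The stream processor keeps no local JOIN state here: each event triggers one keyed read, and expired data simply stops matching.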
Use case #1: how we do it today
[Diagram components: Numberly’s web tracking; Kafka Streams; Scylla with 13+ months retention (high throughput writes + low latency reads, expiring data); arrows labeled read and write]
Use case #1: takeaways
• Metrics
  ○ Metrics are key to successful tuning (query response times, dataset size)
  ○ Use a Prometheus client instead of implementing Kafka Streams metrics
• Tuning
  ○ Size the number of partitions according to your query metrics
  ○ Mind your time to recovery: max throughput capacity should be at least 3x the average
  ○ Add query caching that covers your average query time and no more, to maximize consistency
  ○ Make sure you use a shard-aware client for Scylla
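The "3x the average" rule of thumb can be made concrete with a little arithmetic. The sketch below assumes a simple model that is not from the talk: events keep arriving at the average rate during and after an outage, and the pipeline drains the backlog at its maximum rate. The function name and parameters are hypothetical.

```python
def catch_up_seconds(outage_seconds, avg_rate, max_rate):
    """Seconds needed to drain the backlog built up during an outage.

    Assumes events arrive at avg_rate throughout and are processed
    at max_rate once the pipeline is back.
    """
    if max_rate <= avg_rate:
        raise ValueError("max_rate must exceed avg_rate, or the backlog never drains")
    backlog = outage_seconds * avg_rate   # events queued while down
    spare = max_rate - avg_rate           # capacity left for catching up
    return backlog / spare
```

Under this model, a pipeline sized at 3x its average rate recovers from a 10-minute outage in 5 minutes, while one sized at 1.5x needs 20 minutes.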
Use case #2
Scylla “most innovative use case” award winning Synapse platform
Real time user segmentation
Kafka to the rescue in overcoming large partitions on Scylla for an OLAP statistical workload
Use case #2: Synapse platform
[Diagram components: Numberly’s web tracking; Synapse services; business rules; partners; calculation; segmentation store; distribution; configuration]
Kafka & Scylla: a complementary match
Where we chose Scylla over native Kafka
● Large number of tables with different sizes
○ Would create 10000+ topics if compacted topics were used instead of Scylla tables
● TTL management on Kafka compacted topics adds custom processing logic and complexity
○ Propagating Scylla expired data events still adds complexity
○ We crave expiration events in CDC (https://github.com/scylladb/scylla/issues/8380)
● Leverage Scylla's low latency read capability to consume or enrich data at scale
Where Kafka saved the day for Scylla
● Computing real time stats on high cardinality data generated large partitions on Scylla
○ A user (partition key) is part of multiple segments (clustering key) = counting OK
○ A segment (partition key) has a great many users (clustering key) = large partition = counting KO
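The two data models above can be compared with a toy example. The sketch below is illustrative only: the membership data is made up, `partition_sizes` just counts rows per partition key, and `streaming_segment_counts` is a plain-Python stand-in for the kind of running aggregation a stream processor would maintain, not Kafka Streams code.

```python
from collections import Counter, defaultdict

# Hypothetical (user, segment) memberships.
MEMBERSHIPS = [("u1", "s1"), ("u2", "s1"), ("u3", "s1"), ("u1", "s2")]


def partition_sizes(rows, key_index):
    """Rows per partition when the column at key_index is the partition key."""
    return dict(Counter(row[key_index] for row in rows))


def streaming_segment_counts(events):
    """Running per-segment counts from (user, segment, delta) events.

    delta is +1 when a user enters a segment and -1 when the user leaves it.
    """
    counts = defaultdict(int)
    for _user, segment, delta in events:
        counts[segment] += delta
    return dict(counts)
```

Keyed by user, partitions stay small; keyed by segment, one popular segment holds every one of its users in a single partition. Keeping a running count per segment avoids scanning that partition at all.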
Use case #2: takeaways
Define your table models to suit your queries
Forecast data volume on your model before using it
• Will it fit at scale in the technology you plan to use?
Mind large partitions on Scylla, as they can damage your cluster performance
Kafka Streams is great for on-the-fly aggregations
Sink your aggregated data to an external store to address lookups over multiple time spans
• Interactive queries = hot real time
Scylla + Kafka at Numberly
They play (very) well together
Change Data Capture (CDC) in Scylla
Maheedhar Gunturu
Change Data Capture (CDC)
• Lets you query the history of changes made to your database.
• Asynchronously readable by downstream consumers.
• Available since Scylla Open Source 4.0 and now available in Scylla Enterprise 2021.1.1
Use cases
• Applications propagating state across various microservices for use cases like IoT, retail, security, fraud detection, customer 360
• ETL
• Integrations, migrations and streaming transformations
• Alerting and monitoring
CDC in Scylla: enabled per table
• Single CDC log table per enabled table
• CDC log is co-located with base table
• Partitioning matches the base table
• Mirrored columns for preimage/delta records
• Every column record contains information about modification operation and TTL
• Rows ordered by operation timestamp and batch sequence
• CDC data is TTL:ed to 24h (configurable)
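As a rough illustration of the per-table options, the Python helper below only builds a CQL statement string; it does not talk to a cluster. The function name is hypothetical, and the option names `preimage`, `postimage` and `ttl` (in seconds, with 86400 matching the 24h default mentioned above) are taken from my reading of the Scylla CDC documentation, so check them against your version.

```python
def cdc_options_cql(table, enabled=True, preimage=False, postimage=False,
                    ttl_seconds=86400):
    """Build an ALTER TABLE statement that sets CDC options on a table."""
    options = {
        "enabled": enabled,
        "preimage": preimage,
        "postimage": postimage,
        "ttl": ttl_seconds,
    }

    def literal(value):
        # CQL booleans are lowercase; numbers are written as-is.
        return str(value).lower() if isinstance(value, bool) else str(value)

    body = ", ".join(f"'{name}': {literal(value)}" for name, value in options.items())
    return f"ALTER TABLE {table} WITH cdc = {{{body}}};"


print(cdc_options_cql("ks.t", preimage=True, ttl_seconds=3600))
```

The resulting string can be pasted into cqlsh or passed to a driver session.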
Scylla’s CDC write path
+ Coordinator creates the CDC log table writes
+ Log writes piggyback on the base table writes
+ Writes go to the same replica nodes
+ While the data size written is larger, the number of write requests does not change.
[Diagram: CQL INSERT INTO base_table(...)... with the accompanying CDC write]
CDC log rows
• Each mutation event generates one or more rows
  ○ Row keys
  ○ Changes per non-key column (delta), optional
  ○ Pre-image (prior state), optional
  ○ Post-image (current state of row), optional
• CDC log write uses the same consistency level as the base write
  ○ Same data guarantees
Consume CDC streams aka read path
• CDC data is available through normal CQL
  ○ Easy to read raw streams
  ○ Already de-duplicated
  ○ All delta and pre-image values are normal CQL data
  ○ Can consume without knowledge of server internals
• Layered approach
  ○ CDC core functionality is relatively simple, which allows for more sophisticated adaptors
    ■ Push models etc.
Consume CDC streams aka read path
+ CDC data is grouped into streams
+ Divides the token ring space
+ Each stream represents a tokenization “slot” in current topology
+ Stream is log partition key
+ Stream chosen for given write based on base table PK tokenization
+ CDC is also the basis for Alternator Streams (DynamoDB API)
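The idea of picking a stream from the token of the base table partition key can be sketched as a lookup over sorted slot boundaries. This is a simplified model, not Scylla's implementation: the hash below is a SHA-256 stand-in rather than the real partitioner, slots are equal-sized, and all function names are hypothetical.

```python
import bisect
import hashlib

RING_MIN, RING_MAX = -2**63, 2**63 - 1


def token_for(partition_key: bytes) -> int:
    """Stand-in token function (NOT the partitioner Scylla actually uses)."""
    digest = hashlib.sha256(partition_key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_slots(slot_count):
    """Split the ring into equal slots, each identified by its upper bound."""
    span = (RING_MAX - RING_MIN) // slot_count
    bounds = [RING_MIN + span * (i + 1) for i in range(slot_count - 1)]
    return bounds + [RING_MAX]


def stream_for(bounds, token):
    """Index of the slot (stream) that owns the given token."""
    return bisect.bisect_left(bounds, token)
```

Because the stream depends only on the token, every change to the same base partition lands in the same stream, which is what lets a consumer read one stream in order.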
CDC in Scylla
+ Easy to integrate and consume
  + Plain CQL tables
+ Robust
  + Replicated in same way as the base data
+ Reasonable overhead
  + Coalesced writes and reads to same replica ranges
  + Overhead is comparable to adding/reading from a table
+ Does not overflow if consumer fails to act
  + Data is TTL:ed
Quick Poll
Streaming Data from
Scylla to Kafka
Tim Berglund
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Confluent
Our goal is to create an event streaming platform and put it at the heart of every company.
We do this with a platform that builds on Apache Kafka, available on-prem and in Confluent Cloud.
Writing to Kafka
[Diagram: a topic split into Partition 0, Partition 1 and Partition 2]
Reading from Kafka
[Diagram: partitioned topic (Partitions 0 to 2) read by Consumer A and Consumer B]
Reading from Kafka
[Diagram: partitioned topic (Partitions 0 to 2) read by three instances of Consumer A and by Consumer B]
Kafka Connect
Scylla Source Connector for Kafka
Kafka, Confluent, and Scylla
Scylla Source connector for Kafka is built on open source Debezium (debezium.io)
Source Connector
Sink Connector
Syncing Scylla Clusters with Kafka
Use the Source and Sink connectors to exchange data between separate Scylla clusters
How it Works
1. Set up a Scylla Table with CDC
cqlsh> CREATE KEYSPACE ks WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)) WITH cdc = {'enabled': true};
2. Configure Kafka Scylla CDC Connector
name=ScyllaCDCConnector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.name=MyCluster
scylla.cluster.ip.addresses=127.0.0.2:9042
scylla.table.names=ks.t
tasks.max=10
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=false
transforms.unwrap.delete.handling.mode=none
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
heartbeat.interval.ms=1000
auto.create.topics.enable=true
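When the connector is registered through the Kafka Connect REST interface rather than a properties file, the same settings are sent as JSON. The sketch below only builds that payload and prints it; nothing is sent. It assumes the usual `{"name": ..., "config": {...}}` body for `POST /connectors`, uses a subset of the properties above, and the helper names are hypothetical.

```python
import json

PROPERTIES = """\
name=ScyllaCDCConnector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.name=MyCluster
scylla.cluster.ip.addresses=127.0.0.2:9042
scylla.table.names=ks.t
tasks.max=10
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def connect_payload(config):
    """Shape a flat config into a {"name": ..., "config": {...}} request body."""
    settings = {k: v for k, v in config.items() if k != "name"}
    return {"name": config["name"], "config": settings}


payload = connect_payload(parse_properties(PROPERTIES))
print(json.dumps(payload, indent=2))
```

The printed JSON could then be posted to a Connect worker with any HTTP client.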
3. Test the Connector
cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (1, 5, 10);
cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (2, 6, 12);
4. How it looks
cqlsh> SELECT * FROM ks.t_scylla_cdc_log ;
cdc$stream_id | cdc$time | cdc$batch_seq_no | cdc$deleted_v | cdc$end_of_batch | cdc$operation | cdc$ttl | ck | pk | v
------------------------------------+--------------------------------------+------------------+---------------+------------------+---------------+---------+----+----+----
0xc72400000000000045715fd9dc0004c1 | a2130246-4048-11eb-5b81-9b458669aa11 | 0 | null | True | 2 | null | 5 | 1 | 10
0xd049555555555556e69dc1b6b4000581 | a6723136-4048-11eb-a309-3e76e3b340e7 | 0 | null | True | 2 | null | 6 | 2 | 12
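The `cdc$operation` column is a numeric code. The sketch below decodes it for the two rows shown above; the code-to-name mapping reflects my reading of the Scylla CDC documentation and should be treated as an assumption to verify, and the rows are hand-copied dictionaries rather than a live query result.

```python
from enum import IntEnum


class CdcOperation(IntEnum):
    """Assumed meaning of cdc$operation values (verify against the Scylla docs)."""
    PRE_IMAGE = 0
    ROW_UPDATE = 1
    ROW_INSERT = 2
    ROW_DELETE = 3
    PARTITION_DELETE = 4
    RANGE_DELETE_START_INCLUSIVE = 5
    RANGE_DELETE_START_EXCLUSIVE = 6
    RANGE_DELETE_END_INCLUSIVE = 7
    RANGE_DELETE_END_EXCLUSIVE = 8
    POST_IMAGE = 9


# The two log rows from the cqlsh output, reduced to the columns used here.
LOG_ROWS = [
    {"cdc$operation": 2, "pk": 1, "ck": 5, "v": 10},
    {"cdc$operation": 2, "pk": 2, "ck": 6, "v": 12},
]


def describe(row):
    """Render one log row as '<OPERATION> pk=.. ck=.. v=..'."""
    operation = CdcOperation(row["cdc$operation"])
    return f"{operation.name} pk={row['pk']} ck={row['ck']} v={row['v']}"


for row in LOG_ROWS:
    print(describe(row))
```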
5. Connector correctly replicates as JSON:
Kafka message number 1 (key):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": true,
        "field": "ck"
      },
      {
        "type": "int32",
        "optional": true,
        "field": "pk"
      }
    ],
    "optional": false,
    "name": "ks.t.Key"
  },
  "payload": {
    "ck": 5,
    "pk": 1
  }
}
Kafka message number 1 (value):
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "int32",
        "optional": true,
        "field": "ck"
      },
      {
        "type": "int32",
        "optional": true,
        "field": "pk"
      },
      {
        "type": "struct",
        "fields": [
          {
            "type": "int32",
[*snip* Etc.]
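With `schemas.enable=true`, each JSON message carries a `schema` and a `payload` section, as in the key shown above. A consumer that only wants the data can unwrap it as sketched below. The function name is hypothetical, and the message text is the key from the slide, condensed onto fewer lines.

```python
import json

KEY_MESSAGE = """
{
  "schema": {
    "type": "struct",
    "fields": [
      {"type": "int32", "optional": true, "field": "ck"},
      {"type": "int32", "optional": true, "field": "pk"}
    ],
    "optional": false,
    "name": "ks.t.Key"
  },
  "payload": {"ck": 5, "pk": 1}
}
"""


def unwrap(message_text):
    """Return (payload, field types) from a schema-plus-payload JSON envelope."""
    envelope = json.loads(message_text)
    field_types = {f["field"]: f["type"] for f in envelope["schema"]["fields"]}
    return envelope["payload"], field_types


payload, field_types = unwrap(KEY_MESSAGE)
print(payload, field_types)
```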
Deltas Only...for now
• Currently only provides delta operations
• Preimage and postimage will be added in the future
• Will match nicely with “before” & “after” fields of Debezium
Confluent Developer
developer.confluent.io
Learn Kafka!
Q&A
United States
545 Faber Place
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzelia, Israel
www.scylladb.com
@scylladb
Thank you

 
Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way
Strategies For Migrating From SQL to NoSQL — The Apache Kafka WayStrategies For Migrating From SQL to NoSQL — The Apache Kafka Way
Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way
 
APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 

Streaming Data from Scylla to Kafka with CDC

  • 1. Building Event Streaming Architectures on Scylla and Confluent with Kafka
  • 2. Presenters: Tim Berglund (Senior Director of Developer Advocacy), Alexys Jacob (CTO), Maheedhar Gunturu (Director of Technical Alliances), Othmane El Metioui (Chief Data Officer)
  • 3. Agenda: Brief Intro to Scylla; Scylla + Kafka at Numberly; Change Data Capture in Scylla; Streaming Data from Scylla to Kafka
  • 4. About ScyllaDB • Reimagined the NoSQL database • Close-to-the-hardware design, written in C++ • Open source, enterprise & DBaaS • From the creators of the KVM hypervisor • Winner of InfoWorld Technology of the Year
  • 5. Grows with your business & your data • Volume: multi-petabyte • Throughput: 1 billion OPS • Horizontal scalability: 1,000-node cluster • Availability: 1 to 10+ replicas within a datacenter • Consistent latencies: low single-digit millisecond p99s • Vertical scalability: 1 to 416 vCPUs • Unlimited: cell sizes and partition width • Consistency options: eventual consistency to linearizability
  • 6. Used across industries: AdTech/MarTech, Multimedia, Finance/FinTech, Security, Ride-hailing/Food Delivery, Social, Retail, Travel, IoT, Logistics/Transportation
  • 7. Deployment options • On-Prem (install in your datacenter): Scylla Open Source, Scylla Enterprise, AWS Outposts • Cloud Hosted (deploy at a cloud provider): Scylla Open Source, Scylla Enterprise • Scylla Cloud (database as a service): fully managed Scylla clusters, Bring Your Own Account (BYOA) option • Kubernetes (run on Kubernetes): manage with Scylla Operator
  • 8. Scylla + Kafka at Numberly: architectural choices and overview
  • 9. At Numberly, we run bare-metal clusters. Scylla: 3 clusters with multi-datacenter topology (Staging, Production web facing, Production OLAP+OLTP), RF=3 per DC, on DELL hardware (RAID0 NVMe, up to 96 AMD cores per node, up to 512GB RAM per node). Confluent Kafka: 2 clusters with active-active multi-datacenter topology (Staging, Production), on DELL hardware: 6 brokers (12 TB SSD in RAID0, 2x 24 cores, 64GB RAM) and 12 other nodes (Connect cluster, Schema Registry, Zookeepers...)
  • 10. Scylla Cloud & Confluent Cloud. TL;DR: The people behind the technology know better! Cloud-hosted solutions should be considered depending on your infrastructure maturity and hosting constraints. Our experience shows that cloud providers such as AWS always lag behind on versions and provide poor monitoring & alerting capabilities.
  • 11. Scylla + Kafka at Numberly: stack usage overview
  • 12. Combining Scylla and Confluent Kafka powers. Scylla: • Scylla Manager • Scylla Monitoring • Easy data expiration (TTL) on large time windows (6+ months). Confluent Kafka: • Kafka Connect & Exporter • Schema Registry • KSQL • Home-made control center interface + Grafana. We started with in-house Kafka Streams and Python pipelines to propagate data changes between Scylla & Kafka.
  • 13. Combining Scylla and Confluent Kafka powers. Scylla: • Scylla Manager • Scylla Monitoring • Easy data expiration (TTL) on large time windows (6+ months). Confluent Kafka: • Kafka Connect & Exporter • Schema Registry • KSQL • Home-made control center interface + Grafana. The Confluent-certified CDC connector will simplify our pipelines!
  • 14. Scylla + Kafka at Numberly: Scylla is used as a low-latency remote state store providing easy data expiry capabilities to Kafka Streams and pipelines (in & out)
  • 15. Use case #1: data pipeline enrichment. Scylla to the rescue in overcoming a JOIN window too large for Kafka
  • 16. Use case #1: how we did it before. Diagram labels: Numberly’s web tracking; RabbitMQ exchange; Scylla (13+ months retention; high-throughput writes + low-latency reads, expiring data); beanstalkd; Python programs; write + read
  • 17. Use case #1: our first attempt. Diagram labels: Numberly’s web tracking; Kafka Streams; compacted topic; read; Kafka Streams; write; Kafka Connect; KTable; Redis
  • 18. Scaling limitations of Kafka JOIN windows • The retention of our source data enriched from Scylla is long (13+ months): data set size averages 150+GB per table, totaling 1.2+TB of source data • Multiple successive JOINs are heavy on Kafka with large datasets: the large state store on RocksDB caused memory issues and Kubernetes pod OOM kills; rebuilding the state store after a Kafka Streams (pod) restart took too long; standby replicas come at a cost for a large state store • We turned to Scylla to be a remote, highly available, distributed state store!
  • 19. Use case #1: how we do it today. Diagram labels: Numberly’s web tracking; Kafka Streams; Scylla (13+ months retention; high-throughput writes + low-latency reads, expiring data); read; Kafka Streams; write
  • 20. Use case #1: takeaways • Metrics: metrics are important to successful tuning (query response times, dataset size); use the Prometheus client instead of implementing Kafka Streams metrics • Tuning: size the number of partitions according to your query metrics; mind your time to recovery (max throughput capacity should be at least 3x the average); add query caching that covers your average query time, no more, to maximize consistency; make sure you use a shard-aware client for Scylla
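The query-caching takeaway can be illustrated with a small read-through cache whose entries expire after a short, fixed time. This is only a sketch under stated assumptions: `TTLCache`, `fetch`, and the key names are hypothetical, the remote lookup is simulated by a plain function rather than a real Scylla query, and nothing here reflects Numberly's actual code.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds.

    Keeping the TTL close to the average query time limits how long a
    stale value can be served, which is the consistency trade-off the
    slide mentions.
    """

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical remote lookup (e.g. a Scylla read)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: skip the remote call
        value = self._fetch(key)     # missing or expired: go to the store
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(fetch=lambda k: f"profile-{k}", ttl_seconds=0.005)
    print(cache.get("user-1"))
```

Injecting the clock keeps the expiry logic easy to check in isolation; a production cache would also need a size bound and eviction.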
  • 21. Use case #2: the Synapse platform, winner of the Scylla “most innovative use case” award. Real-time user segmentation. Kafka to the rescue in overcoming large partitions on Scylla for an OLAP statistical workload
  • 22. Use case #2: Synapse platform. Diagram labels: Numberly’s web tracking; Synapse services; Business rules; Partners; calculation; Segmentation; store; distribution; configuration
  • 23. Kafka & Scylla: a complementary match. Where we chose Scylla over native Kafka: ● Large number of tables with different sizes ○ Would create 10,000+ topics if compact tables were used instead of Scylla ● TTL management on a Kafka compact table adds custom processing logic and complexity ○ Propagating Scylla expired-data events still adds complexity ○ We crave expiration events in CDC (https://github.com/scylladb/scylla/issues/8380) ● Leverage Scylla's low-latency read capability to consume or enrich data at scale. Where Kafka saved the day for Scylla: ● Computing real-time stats on high-cardinality data generated large partitions on Scylla ○ A user (partition key) is part of multiple segments (clustering key) = counting OK ○ A segment (partition key) has a great many users (clustering key) = large partition = counting KO
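To make the large-partition point concrete, the sketch below counts rows per partition for the same (user, segment) memberships under the two key choices named on the slide. The data is synthetic and the function name `partition_sizes` is hypothetical; it only models row counts, not Scylla storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def partition_sizes(
    memberships: Iterable[Tuple[str, str]], partition_by: str
) -> Dict[str, int]:
    """Count rows per partition for (user, segment) pairs.

    partition_by="user": user is the partition key, segment the clustering key.
    partition_by="segment": segment is the partition key, user the clustering key.
    """
    sizes: Dict[str, int] = defaultdict(int)
    for user, segment in memberships:
        key = user if partition_by == "user" else segment
        sizes[key] += 1
    return dict(sizes)


# Synthetic data: 10,000 users, each belonging to the same 3 segments.
memberships = [(f"user-{u}", f"seg-{s}") for u in range(10_000) for s in range(3)]

by_user = partition_sizes(memberships, "user")
by_segment = partition_sizes(memberships, "segment")

print(max(by_user.values()))     # 3 rows in the largest user partition
print(max(by_segment.values()))  # 10000 rows in the largest segment partition
```

Partitioning by user keeps every partition small, while partitioning by segment makes partition size grow with the user base, which is the case where the slide reports moving the counting to Kafka.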
  • 24. Use case #2: takeaways • Define your table models to suit your queries • Forecast data volume on your model before using it: will it fit at scale in the technology you plan to use? • Mind large partitions on Scylla, as they can damage your cluster performance • Kafka Streams are great for on-the-fly aggregations • Sink your aggregated data to an external store to address lookups over multiple time spans • Interactive queries = hot real time
  • 25. Scylla + Kafka at Numberly: they play (very) well together
  • 26. Change Data Capture (CDC) in Scylla. Maheedhar Gunturu
  • 27. Change Data Capture (CDC) • Queries the history of changes made to your database • Asynchronously readable by downstream consumers • Available since Scylla Open Source 4.0 and now available in Scylla Enterprise 2021.1.1
  • 28. Use cases • Applications propagating state across various microservices, for use cases like IoT, retail, security, fraud detection, customer 360 • ETL • Integrations, migrations and streaming transformations • Alerting and monitoring
  • 29. CDC in Scylla: enabled per table • Single CDC log table per enabled table • CDC log is co-located with the base table • Partitioning matches the base table • Mirrored columns for preimage/delta records • Every column record contains information about the modification operation and TTL • Rows ordered by operation timestamp and batch sequence • CDC data is TTL'd to 24h (configurable)
  • 30. Scylla’s CDC write path (diagram: a CQL write, INSERT INTO base_table(...)..., triggers a CDC write) + Coordinator creates CDC log table + Writes and piggybacks on base table + Writes to same replica nodes + While the data size written is larger, the number of write requests does not change
  • 31. CDC log rows • Each mutation event generates one or more rows: row keys; changes per non-key column (delta), optional; pre-image (prior state), optional; post-image (current state of row), optional • CDC log write uses the same consistency level as the base write: same data guarantees
  • 32. Consume CDC streams (aka read path) • CDC data is available through normal CQL: easy to read raw streams; already de-duplicated; all delta and pre-image values are normal CQL data; can be consumed without knowledge of server internals • Layered approach: CDC core functionality is relatively simple and allows for more sophisticated adaptors (push models etc.)
  • 33. Consume CDC streams (aka read path) + CDC data is grouped into streams + Streams divide the token ring space + Each stream represents a tokenization “slot” in the current topology + The stream is the log partition key + The stream for a given write is chosen based on base table PK tokenization + CDC is also the basis for Alternator Streams (DynamoDB API)
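The idea that a write lands in the stream owning its partition key's token can be sketched as a lookup over a ring split into contiguous slots. This is an illustration only: `token_of` uses MD5 as a stand-in and is not Scylla's partitioner, the even split in `make_streams` is an assumption, and real stream IDs are derived from the cluster topology rather than a simple index.

```python
import bisect
import hashlib
from typing import List

RING_MIN, RING_MAX = -(2**63), 2**63 - 1


def token_of(pk: bytes) -> int:
    """Stand-in token function (assumption: NOT Scylla's real partitioner)."""
    digest = hashlib.md5(pk).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def make_streams(n: int) -> List[int]:
    """Split the token ring into n contiguous slots; return each slot's upper bound."""
    span = (RING_MAX - RING_MIN) // n
    bounds = [RING_MIN + span * (i + 1) for i in range(n - 1)]
    bounds.append(RING_MAX)  # the last slot always closes the ring
    return bounds


def stream_for(pk: bytes, bounds: List[int]) -> int:
    """Index of the slot (stream) whose token range contains the key's token."""
    return bisect.bisect_left(bounds, token_of(pk))


if __name__ == "__main__":
    bounds = make_streams(8)
    for key in (b"pk-1", b"pk-2", b"pk-3"):
        print(key, "-> stream", stream_for(key, bounds))
```

Because the stream is a pure function of the key's token, all changes for one base-table partition end up in the same stream, which is what lets a consumer read a partition's changes in order.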
  • 34. CDC in Scylla + Easy to integrate and consume: plain CQL tables + Robust: replicated in the same way as the base data + Reasonable overhead: coalesced writes and reads to the same replica ranges; overhead is comparable to adding/reading from a table + Does not overflow if the consumer fails to act: data is TTL'd
  • 36. Streaming Data from Scylla to Kafka Tim Berglund
  • 38. Confluent: Our goal is to create an event streaming platform and put it at the heart of every company. We do this with a platform that builds on Apache Kafka, available on-prem and in Confluent Cloud. (Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.)
  • 39. Writing to Kafka: Partition 0, Partition 1, Partition 2
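A minimal sketch of how a keyed write ends up in one of the three partitions on this slide, assuming the usual "hash of the key modulo partition count" rule. `zlib.crc32` is a stand-in (Kafka's default partitioner uses murmur2), and `partition_for` and the sample keys are hypothetical names for illustration.

```python
import zlib

NUM_PARTITIONS = 3  # matches the three partitions drawn on the slide


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition: hash(key) mod partition count.

    crc32 is a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key) % num_partitions


# Each partition is an append-only list of (key, value) records.
log = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [(b"pk=1", "v=10"), (b"pk=2", "v=12"), (b"pk=1", "v=11")]:
    log[partition_for(key)].append((key, value))

# Both events for pk=1 sit in one partition, in the order they were written.
pk1_events = [v for k, v in log[partition_for(b"pk=1")] if k == b"pk=1"]
```

Ordering is only guaranteed within a partition, so choosing the key decides which events stay ordered relative to each other.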
  • 40. Reading from Kafka Partitioned Topic: Partition 0, Partition 1, Partition 2 Consumers: A, B
  • 41. Reading from Kafka Partitioned Topic: Partition 0, Partition 1, Partition 2 Consumers: A, B, A, A
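The diagram appears to show Consumer A scaled out to three instances while Consumer B stays at one; that reading is an assumption, since only the labels survive here. Under that assumption, partition assignment within a consumer group can be sketched as a simple round-robin. Kafka's real assignors are more involved, and `assign` and the instance names are hypothetical.

```python
def assign(partitions, members):
    """Round-robin partitions across the members of one consumer group."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2]

# Group "A" runs three instances, so each instance reads one partition.
group_a = assign(partitions, ["A-1", "A-2", "A-3"])
# Group "B" runs a single instance, which reads every partition.
group_b = assign(partitions, ["B-1"])
```

Each group receives the full topic independently of the other, so adding instances to one group spreads its load without affecting what the other group reads.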
  • 42. Kafka Connect
  • 44. Kafka, Confluent, and Scylla: the Scylla Source connector for Kafka is built on open source Debezium (debezium.io)
  • 45. Source Connector
  • 46. Sink Connector
  • 47. Syncing Scylla Clusters with Kafka: use the Source and Sink connectors to exchange data between separate Scylla clusters
  • 48. How it Works 1. Set up a Scylla Table with CDC
      cqlsh> CREATE KEYSPACE ks WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
      cqlsh> CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck)) WITH cdc = {'enabled': true};
  • 49. How it Works 2. Configure Kafka Scylla CDC Connector
      name=ScyllaCDCConnector
      connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
      scylla.name=MyCluster
      scylla.cluster.ip.addresses=127.0.0.2:9042
      scylla.table.names=ks.t
      tasks.max=10
      transforms=unwrap
      transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
      transforms.unwrap.drop.tombstones=false
      transforms.unwrap.delete.handling.mode=none
      key.converter=org.apache.kafka.connect.json.JsonConverter
      value.converter=org.apache.kafka.connect.json.JsonConverter
      key.converter.schemas.enable=true
      value.converter.schemas.enable=true
      heartbeat.interval.ms=1000
      auto.create.topics.enable=true
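When the connector is registered through the Kafka Connect REST API rather than a properties file, the same settings go into a JSON body with a `name` and a `config` map. The sketch below converts a subset of the properties from this slide into that shape; `to_connect_payload` is a hypothetical helper, and only six of the slide's properties are included to keep it short.

```python
import json

# Subset of the connector properties shown on the slide.
PROPERTIES = """\
name=ScyllaCDCConnector
connector.class=com.scylladb.cdc.debezium.connector.ScyllaConnector
scylla.name=MyCluster
scylla.cluster.ip.addresses=127.0.0.2:9042
scylla.table.names=ks.t
tasks.max=10
"""


def to_connect_payload(text):
    """Turn key=value lines into the {"name", "config"} body that POST /connectors accepts."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    name = config.pop("name")
    return {"name": name, "config": config}


payload = to_connect_payload(PROPERTIES)
print(json.dumps(payload, indent=2))
```

All values stay strings, which is how Kafka Connect expects connector configuration to be submitted.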
  • 50. How it Works 3. Test the Connector
      cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (1, 5, 10);
      cqlsh> INSERT INTO ks.t(pk, ck, v) VALUES (2, 6, 12);
  • 51. How it Works 4. How it looks
      cqlsh> SELECT * FROM ks.t_scylla_cdc_log ;
       cdc$stream_id                      | cdc$time                             | cdc$batch_seq_no | cdc$deleted_v | cdc$end_of_batch | cdc$operation | cdc$ttl | ck | pk | v
      ------------------------------------+--------------------------------------+------------------+---------------+------------------+---------------+---------+----+----+----
       0xc72400000000000045715fd9dc0004c1 | a2130246-4048-11eb-5b81-9b458669aa11 |                0 |          null |             True |             2 |    null |  5 |  1 | 10
       0xd049555555555556e69dc1b6b4000581 | a6723136-4048-11eb-a309-3e76e3b340e7 |                0 |          null |             True |             2 |    null |  6 |  2 | 12
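A consumer reading these log rows directly usually wants two things from the metadata columns: when the change happened and what kind of change it was. The sketch below decodes both for the first row above. The operation-code table is an assumed subset based on my reading of Scylla's CDC documentation and should be verified there; `timeuuid_to_datetime` is a hypothetical helper that reads the timestamp embedded in a time-based UUID.

```python
import uuid
from datetime import datetime, timezone

# Assumed subset of cdc$operation codes; check the Scylla CDC docs before relying on it.
OPERATIONS = {0: "pre-image", 1: "update", 2: "insert", 3: "row delete", 9: "post-image"}

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Extract the UTC timestamp embedded in a time-based UUID."""
    ticks = uuid.UUID(value).time - UUID_EPOCH_OFFSET
    return datetime.fromtimestamp(ticks / 1e7, tz=timezone.utc)


# First row of the cqlsh output, reduced to the columns used here.
row = {"cdc$time": "a2130246-4048-11eb-5b81-9b458669aa11", "cdc$operation": 2,
       "pk": 1, "ck": 5, "v": 10}

written_at = timeuuid_to_datetime(row["cdc$time"])
operation = OPERATIONS.get(row["cdc$operation"], "unknown")
print(written_at.isoformat(), operation, row["pk"], row["ck"], row["v"])
```

Both rows in the output carry operation code 2, which matches the two INSERT statements issued in step 3.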
  • 52. How it Works 5. Connector correctly replicates as JSON:
      Kafka message number 1 (key):
      { "schema": { "type": "struct", "fields": [ { "type": "int32", "optional": true, "field": "ck" }, { "type": "int32", "optional": true, "field": "pk" } ], "optional": false, "name": "ks.t.Key" }, "payload": { "ck": 5, "pk": 1 } }
      Kafka message number 1 (value):
      { "schema": { "type": "struct", "fields": [ { "type": "int32", "optional": true, "field": "ck" }, { "type": "int32", "optional": true, "field": "pk" }, { "type": "struct", "fields": [ { "type": "int32", [*snip* Etc.]
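With `schemas.enable=true`, each message carries a `schema` section alongside the `payload`. A downstream consumer that only needs the data can unwrap it as sketched below, using the key message from this slide as input. `unwrap` is a hypothetical helper; the value message is not used because the slide truncates it.

```python
import json

# Key of Kafka message number 1, as shown on the slide.
raw_key = """
{"schema": {"type": "struct",
            "fields": [{"type": "int32", "optional": true, "field": "ck"},
                       {"type": "int32", "optional": true, "field": "pk"}],
            "optional": false, "name": "ks.t.Key"},
 "payload": {"ck": 5, "pk": 1}}
"""


def unwrap(message):
    """Return (field names declared in the schema, payload dict) for one JSON message."""
    envelope = json.loads(message)
    declared = [f["field"] for f in envelope["schema"]["fields"]]
    return declared, envelope["payload"]


fields, payload = unwrap(raw_key)
# Rebuild the base table's primary key (pk, ck) from the message key.
primary_key = tuple(payload[name] for name in ("pk", "ck"))
```

The resulting `(pk, ck)` pair identifies the row in `ks.t` that the change applies to.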
  • 53. Deltas Only...for now • The connector currently provides only delta operations • Preimage and postimage will be added in the future • They will match nicely with the “before” & “after” fields of Debezium
  • 54. Confluent Developer: Learn Kafka! developer.confluent.io
  • 55. Q&A
  • 56. Thank you United States: 545 Faber Place, Palo Alto, CA 94303 Israel: 11 Galgalei Haplada, Herzelia, Israel www.scylladb.com @scylladb