#PulsarSummit Asia 2020
Structured Data Stream with Pulsar
Shivji Kumar Jha
Who am I ?
https://www.linkedin.com/in/shivjijha/
https://twitter.com/ShivjiJha
Catalogue
• Background: Apache Pulsar
• Background: Schema
• Why Schema
• Introducing Pulsar Schema
• Learnings
• Q&A
Background: Apache Pulsar
Pulsar: cloud-native, distributed messaging and streaming platform
Highlights:
1. Modular design
2. Horizontally scalable
3. Low latency with durability
4. Multi-tenancy
5. Geo Replication
Background : Schema
Background - schema : serialization
Definitions
1. Imagine you have to send an employee record over the network.
2. You can't send the in-memory object as is.
3. An employee encoder converts the employee record to a stream of bytes.
4. Formally, this is encoding / serialization.
5. Send the bytes over the network.
https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
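The encoding step above can be sketched in a few lines. This is only an illustration: the `Employee` fields and the byte layout (4-byte id, 2-byte name length, UTF-8 name, 8-byte salary) are assumptions made up for this example, not a format from the talk.

```python
import struct
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical record; the field names are assumptions for this sketch.
    emp_id: int
    name: str
    salary: float


def encode_employee(emp: Employee) -> bytes:
    """Serialize an Employee into bytes using a fixed, agreed layout."""
    name_bytes = emp.name.encode("utf-8")
    # ">" means big-endian (network byte order) with no padding:
    # 4-byte signed int for the id, 2-byte unsigned int for the name length.
    header = struct.pack(">iH", emp.emp_id, len(name_bytes))
    # The salary follows the name as an 8-byte double.
    return header + name_bytes + struct.pack(">d", emp.salary)


payload = encode_employee(Employee(emp_id=7, name="Asha", salary=1250.5))
print(len(payload), payload.hex())
```

The receiver can only make sense of these bytes if it knows the same layout, which is exactly the role a schema plays.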
Background - schema : de-serialization
Definitions
1. When reading from the network, turn the stream of bytes back into an employee record.
2. A decoder converts the bytes to an employee instance.
3. Formally, this is decoding / de-serialization.
https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
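A matching decoder might look like the sketch below. It assumes a made-up layout (4-byte id, 2-byte name length, UTF-8 name, 8-byte salary); both the layout and the `Employee` fields are hypothetical and chosen only for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical record; the field names are assumptions for this sketch.
    emp_id: int
    name: str
    salary: float


def decode_employee(data: bytes) -> Employee:
    """Turn bytes in the assumed layout back into an Employee instance."""
    emp_id, name_len = struct.unpack_from(">iH", data, 0)
    offset = struct.calcsize(">iH")  # size of the fixed header
    name = data[offset:offset + name_len].decode("utf-8")
    (salary,) = struct.unpack_from(">d", data, offset + name_len)
    return Employee(emp_id, name, salary)


# Bytes as they might arrive from the network, built here by hand.
wire_bytes = struct.pack(">iH", 7, 4) + b"Asha" + struct.pack(">d", 1250.5)
employee = decode_employee(wire_bytes)
print(employee)
```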
Background - schema : Schema?
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
1. Encoding can be done with the native serialization of a programming language. Examples:
a. Java Serialization
b. Python’s pickle
c. Ruby’s Marshal
2. Locked in to one programming language - oops!
3. Maybe JSON or XML would work, like web APIs?
a. too verbose
b. stores keys over and over
c. no way to fix types; you guess types by looking at the data. Yuck!
4. Need to save space with each data instance.
5. Also, people stuff in random types which other people don't understand.
a. Document well?
6. OK, let's agree on some protocols and document well what's allowed and what's NOT.
7. Well, that is what Avro, Protobuf, Thrift etc. are!
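To make the "storing keys over and over" point concrete, the sketch below compares the size of some records encoded as JSON with the same records packed into a fixed binary layout. The record fields and the layout are assumptions invented for this comparison; real Avro or Protobuf encodings differ in detail.

```python
import json
import struct

# Hypothetical records; every JSON message repeats the keys "id", "age", "active".
records = [{"id": i, "age": 30 + i % 40, "active": True} for i in range(1000)]

json_size = sum(len(json.dumps(r).encode("utf-8")) for r in records)

# With a schema agreed out of band (int32 id, uint8 age, bool active),
# only the values need to travel.
RECORD_LAYOUT = struct.Struct(">iB?")
binary_size = sum(
    len(RECORD_LAYOUT.pack(r["id"], r["age"], r["active"])) for r in records
)

print("JSON bytes:", json_size)
print("binary bytes:", binary_size)
```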
Background - schema : Evolution
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
1. The schema is defined and documented. Great!
2. Someone wants to quickly add a new data type.
a. How does the decoder know which schema to use: old or new?
b. Among all schemas, how does the decoder know that two are connected?
i. That is schema versioning for you!
3. Avro, Protobuf, JSON Schema, Thrift etc. support schema evolution with versioning.
4. It is possible for the sender (producer) and the reader (consumer) to use different versions of the schema at the same time.
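A toy version of this idea is sketched below: the writer tags each message with its schema version, and the reader resolves the written fields against its own version, filling in defaults for fields it expects but the writer did not send. The schemas, field names, and JSON wire format are assumptions for illustration; this is a simplification loosely modelled on Avro-style resolution, not the actual algorithm of any library.

```python
import json

# Two hypothetical versions of the same schema; version 2 adds "email"
# with a default so that readers on version 2 can still read version 1 data.
SCHEMAS = {
    1: {"fields": ["name", "age"], "defaults": {}},
    2: {"fields": ["name", "age", "email"], "defaults": {"email": ""}},
}


def encode(record: dict, writer_version: int) -> bytes:
    """Write only the values, in schema order, tagged with the schema version."""
    fields = SCHEMAS[writer_version]["fields"]
    message = {"v": writer_version, "values": [record[f] for f in fields]}
    return json.dumps(message).encode("utf-8")


def decode(data: bytes, reader_version: int) -> dict:
    """Resolve the writer's fields against the reader's schema."""
    message = json.loads(data)
    writer_fields = SCHEMAS[message["v"]]["fields"]
    written = dict(zip(writer_fields, message["values"]))
    reader = SCHEMAS[reader_version]
    # Fields unknown to the reader are dropped; missing ones take the default.
    return {
        f: written[f] if f in written else reader["defaults"][f]
        for f in reader["fields"]
    }


old_message = encode({"name": "Asha", "age": 31}, writer_version=1)
new_message = encode({"name": "Ravi", "age": 40, "email": "r@example.com"}, writer_version=2)

print(decode(old_message, reader_version=2))  # new reader, old data
print(decode(new_message, reader_version=1))  # old reader, new data
```

In this sketch a reader field with neither a written value nor a default raises a `KeyError`, which is the kind of failure a compatibility check is meant to catch before deployment.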
Background - schema : Avro
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
1. Encode data with a schema.
2. Ship schema to consumer(?)
a. or Keep schema in a central place keyed with schemaId.
b. Ship schemaId with binary message.
3. While decoding:
a. Get schemaId from beginning of message (always Long?)
b. Fetch schema by schemaId from central schema store.
4. Decode using the schema and the binary data together.
5. Example: Schema tells decoder to expect 4 bytes to convert to an int
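The schemaId flow above can be sketched as follows. The 8-byte id prefix, the in-memory `SCHEMA_STORE`, and schema id 42 are all assumptions for this example; a real schema store is a separate service and the exact framing depends on the system.

```python
import struct

# Stand-in for a central schema store, keyed by schemaId (hypothetical ids).
# Schema 42 says: the payload is one 4-byte signed int.
SCHEMA_STORE = {
    42: struct.Struct(">i"),
}


def frame(schema_id: int, value: int) -> bytes:
    """Prefix the encoded payload with the schemaId as an 8-byte long."""
    layout = SCHEMA_STORE[schema_id]
    return struct.pack(">q", schema_id) + layout.pack(value)


def unframe(message: bytes):
    """Read the schemaId, fetch the schema, then decode the payload with it."""
    (schema_id,) = struct.unpack_from(">q", message, 0)
    layout = SCHEMA_STORE[schema_id]  # lookup in the central store
    (value,) = layout.unpack_from(message, 8)
    return schema_id, value


message = frame(42, 2020)
print(len(message), unframe(message))
```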
Why Schema?
Schema : no schema?
[Figure labels: Apache Pulsar, BYTES, BYTES]
1. Your data in the Pulsar store is plain binary (0s and 1s).
2. Pulsar supports several schema types for encoding & decoding.
3. You can encode data using a schema.
4. You can decode data given the schema & the binary data.
Schema : no schema?
1. Schema or no schema?
2. How do you encode / decode the bytes of Pulsar data?
3. If you don’t have a schema, your schema is implicit in your app code!
Schema : no schema?
Add custom fields for UI etc.
Different attributes depending on kind of event
Obviously, easy for schemaless, still needs care!
https://martinfowler.com/articles/schemaless/#non-uniform-types
Introducing Pulsar Schema
Introducing Pulsar Schema : bytes
Domain Object
Byte schema serialized with Java
Introducing Pulsar Schema : String
Producer
Consumer
Introducing Pulsar Schema : All Primitive types
Introducing Pulsar Schema : Structs
(JSON schema)
Domain Object
Producer with JSON schema serialization
Introducing Pulsar Schema : Structs
(AVRO schema)
Application “knows” which types go to which topic.
Pulsar Schema : Schema Store
(Client side)
1. In the previous examples, the schema was held in the producer and consumer objects.
2. This is the client-side schema storage approach.
Problems:
1. Client responsible for:
a. “serializing” data objects (user instance) into bytes
b. “de-serializing” bytes to data object (user instance)
c. “knowing” which types go to which topic.
2. With consumers spread across several microservices, “knowing” and “evolving” the schema is challenging!
Pulsar Schema : Schema Store
(Server side)
Solution:
1. Store schema on a central server.
2. When producing, upload schema to central server.
3. Add schemaId (Long) to message.
4. When consuming, fetch schema with schemaId.
5. Schema management server manages evolution (versioning).
Pulsar has a built-in schema registry service!
Pulsar Schema : Schema Registry
1. Entity for the schema registry service: schemaInfo.
Pulsar Schema : Schema Registry
1. Each schemaInfo stored with a topic has a version.
2. SchemaVersion manages schema changes happening within a topic.
3. Messages produced with a schemaInfo are tagged with its version.
4. A consumer can use the schemaVersion to fetch the schemaInfo, then decode the message with that schemaInfo.
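The per-topic versioning described above can be modelled with a small in-memory registry. This is a sketch under assumptions: the class name, the method names, the topic name, and the choice to number versions from 0 are all made up for illustration and are not Pulsar's API.

```python
class TopicSchemaRegistry:
    """Toy registry: each topic keeps an ordered list of schema definitions."""

    def __init__(self):
        # topic -> list of schema definitions; the list index is the version
        self._versions = {}

    def register(self, topic: str, schema_definition: str) -> int:
        """Store a schema for a topic and return its version."""
        versions = self._versions.setdefault(topic, [])
        if schema_definition in versions:
            # Re-registering a known schema returns the existing version.
            return versions.index(schema_definition)
        versions.append(schema_definition)
        return len(versions) - 1

    def lookup(self, topic: str, version: int) -> str:
        """Fetch the schema a message was produced with."""
        return self._versions[topic][version]


registry = TopicSchemaRegistry()
topic = "employees"  # hypothetical topic name
v_first = registry.register(topic, "name:string,age:int")
v_second = registry.register(topic, "name:string,age:int,email:string")

# A produced message carries the version of the schema it was encoded with.
message = {"schema_version": v_second, "payload": b"..."}
print(registry.lookup(topic, message["schema_version"]))
```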
Pulsar Schema : Schema Registry
Schema payload structure:
SchemaType schemaType;
Boolean isDeleted;
Long timestamp;
String user;
byte[] data;
HashMap<String, String> props;
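For readers who think in code, the payload fields listed above can be mirrored in a small data class. This is only a model of the structure for illustration: the enum members shown, the example Avro definition, the user name, and the property are assumptions, and storing the definition as UTF-8 JSON bytes in `data` is one plausible choice rather than a statement about Pulsar's internals.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class SchemaType(Enum):
    # A few illustrative schema types; not an exhaustive list.
    STRING = "STRING"
    JSON = "JSON"
    AVRO = "AVRO"


@dataclass
class SchemaPayload:
    schema_type: SchemaType
    is_deleted: bool
    timestamp: int
    user: str
    data: bytes
    props: Dict[str, str] = field(default_factory=dict)


# Hypothetical Avro record definition for an Employee.
avro_definition = {
    "type": "record",
    "name": "Employee",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

payload = SchemaPayload(
    schema_type=SchemaType.AVRO,
    is_deleted=False,
    timestamp=1606348800000,  # example value only
    user="hypothetical-user",
    data=json.dumps(avro_definition).encode("utf-8"),
    props={"owner": "hr-team"},
)

decoded_definition = json.loads(payload.data)
print(payload.schema_type.value, decoded_definition["name"])
```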
Pulsar Schema : Schema Registry
Admin CLI commands and REST APIs to manage schemas:
Learnings
1. Struct schemas (JSON, Avro, Protobuf) model domain objects well.
2. Use the byte schema only if really needed.
3. We have been using Avro schemas with Pulsar for over a year in production:
a. JSON schema is too verbose.
b. Protobuf is awesome, but still being adopted by sources / sinks.
c. Avro saves space per message compared with JSON schema.
d. Avro is very well adopted among sources / sinks.
Learnings
1. It is always a good idea to think hard and set the compatibility strategy on the namespace.
2. Decide on the compatibility strategy depending on the use case and the expected evolution.
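To give a feel for what a compatibility setting checks, here is a deliberately simplified sketch. It treats a schema as a mapping from field name to default value and only looks at added or removed fields; real compatibility rules (for Avro, for example) also cover type changes, aliases, and more. The function names and the `NO_DEFAULT` marker are assumptions for this example.

```python
NO_DEFAULT = object()  # marker for "this field has no default value"


def can_read(reader: dict, writer: dict) -> bool:
    """True if every reader field was written or has a default to fall back on."""
    return all(
        name in writer or default is not NO_DEFAULT
        for name, default in reader.items()
    )


def is_backward_compatible(new: dict, old: dict) -> bool:
    """Consumers on the new schema can read data produced with the old one."""
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    """Consumers still on the old schema can read data produced with the new one."""
    return can_read(reader=old, writer=new)


v1 = {"name": NO_DEFAULT, "age": NO_DEFAULT}
add_with_default = {"name": NO_DEFAULT, "age": NO_DEFAULT, "email": ""}
add_without_default = {"name": NO_DEFAULT, "age": NO_DEFAULT, "email": NO_DEFAULT}
remove_field = {"name": NO_DEFAULT}

for label, candidate in [
    ("add field with default", add_with_default),
    ("add field without default", add_without_default),
    ("remove field", remove_field),
]:
    print(label, is_backward_compatible(candidate, v1), is_forward_compatible(candidate, v1))
```

Which direction matters depends on whether producers or consumers are upgraded first, which is why the choice belongs with the use case.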
Learnings
1. Let ordering requirements decide what goes on which topic.
a. One domain => one topic!
b. Use AUTO_CONSUME as the consumer schema type.
c. schema-autoupdate-strategy = NONE.
2. The schema management process that works for us is:
a. Keep a GitHub repo with schemas.
b. Use code reviews to review schema changes.
c. Generate POJOs from Avro using a Maven plugin (Java).
d. Add the POJO library as a dependency to microservices to import domain objects.
References
1. Pulsar docs: https://pulsar.apache.org/docs/en/schema-get-started/
2. Schema auto update strategy: https://pulsar.apache.org/docs/en/pulsar-admin/#set-schema-autoupdate-strategy
3. Schema Evolution in Avro, Thrift, Protobuf: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
4. Topic design per domain: https://www.confluent.io/blog/put-several-event-types-kafka-topic/
5. Schema Compatibility Design: https://docs.confluent.io/platform/current/schema-registry/avro.html#compatibility-types
Staying Connected:
○ https://twitter.com/ShivjiJha
○ https://www.linkedin.com/in/shivjijha/
Q & A
45

More Related Content

What's hot

Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkStreamNative
 
Elastic Data Processing with Apache Flink and Apache Pulsar
Elastic Data Processing with Apache Flink and Apache PulsarElastic Data Processing with Apache Flink and Apache Pulsar
Elastic Data Processing with Apache Flink and Apache PulsarStreamNative
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingStreamNative
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of MillionsErik Onnen
 
Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021
Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021
Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021StreamNative
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...JinfengHuang3
 
Kafka Summit SF 2017 - Kafka and the Polyglot Programmer
Kafka Summit SF 2017 - Kafka and the Polyglot ProgrammerKafka Summit SF 2017 - Kafka and the Polyglot Programmer
Kafka Summit SF 2017 - Kafka and the Polyglot Programmerconfluent
 
Apache Pulsar and Github
Apache Pulsar and GithubApache Pulsar and Github
Apache Pulsar and GithubStreamNative
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionStreamNative
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemStreamNative
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
Spark streaming and Kafka
Spark streaming and KafkaSpark streaming and Kafka
Spark streaming and KafkaIraj Hedayati
 
Getting Pulsar Spinning_Addison Higham
Getting Pulsar Spinning_Addison HighamGetting Pulsar Spinning_Addison Higham
Getting Pulsar Spinning_Addison HighamStreamNative
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupKarthik Ramasamy
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentHostedbyConfluent
 

What's hot (20)

Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
Elastic Data Processing with Apache Flink and Apache Pulsar
Elastic Data Processing with Apache Flink and Apache PulsarElastic Data Processing with Apache Flink and Apache Pulsar
Elastic Data Processing with Apache Flink and Apache Pulsar
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and Processing
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
 
Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021
Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021
Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 
Kafka Summit SF 2017 - Kafka and the Polyglot Programmer
Kafka Summit SF 2017 - Kafka and the Polyglot ProgrammerKafka Summit SF 2017 - Kafka and the Polyglot Programmer
Kafka Summit SF 2017 - Kafka and the Polyglot Programmer
 
Apache Pulsar and Github
Apache Pulsar and GithubApache Pulsar and Github
Apache Pulsar and Github
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless Evolution
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data Ecosystem
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Spark streaming and Kafka
Spark streaming and KafkaSpark streaming and Kafka
Spark streaming and Kafka
 
Getting Pulsar Spinning_Addison Higham
Getting Pulsar Spinning_Addison HighamGetting Pulsar Spinning_Addison Higham
Getting Pulsar Spinning_Addison Higham
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - Meetup
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
 

Similar to Pulsar Summit Asia - Structured Data Stream with Apache Pulsar

Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022   apache pulsar development 101 with python (f li-...Python web conference 2022   apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...Timothy Spann
 
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Timothy Spann
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamDataWorks Summit
 
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...HostedbyConfluent
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...HostedbyConfluent
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsGuido Schmutz
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APIshareddatamsft
 
EUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analytics
EUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analyticsEUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analytics
EUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analyticsEUBra BIGSEA
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Big Data Spain
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceAntonio García-Domínguez
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasDataWorks Summit
 
Realizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache BeamRealizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache BeamJ On The Beach
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamDataWorks Summit
 

Similar to Pulsar Summit Asia - Structured Data Stream with Apache Pulsar (20)

Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022   apache pulsar development 101 with python (f li-...Python web conference 2022   apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...
 
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
 
Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
 
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
ITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 
EUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analytics
EUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analyticsEUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analytics
EUBra-BIGSEA: Cloud services with QoS guarantees for Big Data analytics
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
 
Realizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache BeamRealizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache Beam
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
AMD It's Time to ROC
AMD It's Time to ROCAMD It's Time to ROC
AMD It's Time to ROC
 

More from Shivji Kumar Jha

Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesShivji Kumar Jha
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesShivji Kumar Jha
 
pulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxpulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxShivji Kumar Jha
 
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Shivji Kumar Jha
 
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with PulsarPulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with PulsarShivji Kumar Jha
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationShivji Kumar Jha
 
Event sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event StoreEvent sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event StoreShivji Kumar Jha
 
Pulsar Summit Asia - Running a secure pulsar cluster
Pulsar Summit Asia -  Running a secure pulsar clusterPulsar Summit Asia -  Running a secure pulsar cluster
Pulsar Summit Asia - Running a secure pulsar clusterShivji Kumar Jha
 
FOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group ReplicationFOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group ReplicationShivji Kumar Jha
 
MySQL High Availability with Replication New Features
MySQL High Availability with Replication New FeaturesMySQL High Availability with Replication New Features
MySQL High Availability with Replication New FeaturesShivji Kumar Jha
 
MySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and ScalabilityMySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and ScalabilityShivji Kumar Jha
 
MySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL ClusterMySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL ClusterShivji Kumar Jha
 
Open source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source ReplicationOpen source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source ReplicationShivji Kumar Jha
 
MySQL User Camp: Multi-threaded Slaves
MySQL User Camp: Multi-threaded SlavesMySQL User Camp: Multi-threaded Slaves
MySQL User Camp: Multi-threaded SlavesShivji Kumar Jha
 

More from Shivji Kumar Jha (16)

Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern Databases
 
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
 
pulsar-platformatory-meetup-2.pptx
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar

  • 1. #PulsarSummit Asia 2020: Structured Data Stream with Pulsar, by Shivji Kumar Jha
  • 2. Who am I? https://www.linkedin.com/in/shivjijha/ https://twitter.com/ShivjiJha
  • 3. Catalogue
      • Background: Apache Pulsar
      • Background: Schema
      • Why Schema
      • Introducing Pulsar Schema
      • Learnings
      • Q&A
  • 5-7. Background: Apache Pulsar
      Pulsar: cloud-native, distributed messaging and streaming platform.
      Highlights:
      1. Modular design
      2. Horizontally scalable
      3. Low latency with durability
      4. Multi-tenancy
      5. Geo-replication
  • 9. Background - schema: serialization
      Definitions:
      1. Imagine you have to send an employee record over the network.
      2. You can’t write the in-memory record to the wire as is.
      3. An employee encoder converts the employee record to a stream of bytes.
      4. Formally, this is encoding / serialization.
      5. Send the bytes over the network.
      Source: https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
  • 10. Background - schema: de-serialization
      Definitions:
      1. When reading from the network, turn the stream of bytes back into an employee record.
      2. A decoder converts the bytes to an employee instance.
      3. Formally, this is decoding / de-serialization.
      Source: https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
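
The encode / decode round trip from slides 9 and 10 can be sketched in a few lines. This is only an illustration using Python's standard `struct` module: the `Employee` fields and the wire layout (a 4-byte id, a 2-byte name length, then the UTF-8 name bytes) are assumptions made for the example and do not come from the talk.

```python
import struct
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical record; the talk does not specify the fields.
    emp_id: int
    name: str


def encode(emp: Employee) -> bytes:
    """Serialize: 4-byte big-endian id, 2-byte name length, UTF-8 name."""
    name_bytes = emp.name.encode("utf-8")
    return struct.pack(">iH", emp.emp_id, len(name_bytes)) + name_bytes


def decode(data: bytes) -> Employee:
    """Deserialize bytes produced by encode() back into an Employee."""
    emp_id, name_len = struct.unpack_from(">iH", data, 0)
    offset = struct.calcsize(">iH")  # size of the fixed-width header
    name = data[offset:offset + name_len].decode("utf-8")
    return Employee(emp_id, name)


wire = encode(Employee(42, "Asha"))   # bytes that could go over the network
restored = decode(wire)               # the receiver rebuilds the record
```

Note that both sides must agree on the layout out of band. That shared agreement is exactly the implicit schema the later slides argue should be made explicit.
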
  • 11. Background - schema: Schema?
      1. Encoding can be done with the native serialization of a programming language. Examples:
         a. Java Serialization
         b. Python’s pickle
         c. Ruby’s marshal
      2. That locks you into one programming language - oops!
      3. Maybe JSON or XML would work, as in web APIs?
         a. Too verbose.
         b. Keys are stored over and over.
         c. No way to fix types; readers guess types by looking at the data. Yuck!
      4. We need to save space with each data instance.
      5. Also, people stuff in random types which other people don’t understand.
         a. Document well?
      6. OK, let’s agree on some protocols and document clearly what’s allowed and what’s NOT.
      7. Well, that is what Avro, Protobuf, Thrift etc. are!
      Source: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  • 12. Background - schema: Evolution
      1. The schema is defined and documented. Great!
      2. Someone wants to quickly add a new data type.
         a. How does the decoder know which schema to use: old or new?
         b. Among all schemas, how does the decoder know that two are connected?
            i. That is schema versioning for you!
      3. Avro, Protobuf, JSON Schema, Thrift etc. support schema evolution with versioning.
      4. The sender (producer) and the reader (consumer) can be on different versions of the schema at the same time.
      Source: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
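
To make point 4 concrete, here is a minimal sketch of a reader and a writer on different schema versions. It is loosely modeled on the way Avro resolves a writer schema against a reader schema (missing fields take the reader's default, unknown fields are ignored), but the dictionary representation, the `REQUIRED` marker and the `resolve` helper are all hypothetical simplifications.

```python
# Hypothetical schemas: each maps a field name to its default value.
# REQUIRED marks a field that has no default.
REQUIRED = object()

SCHEMA_V1 = {"name": REQUIRED}
SCHEMA_V2 = {"name": REQUIRED, "email": ""}  # v2 adds a field with a default


def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a record written with any version onto the reader's schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not REQUIRED:
            resolved[field] = default  # field unknown to the writer
        else:
            raise ValueError(f"missing required field: {field}")
    return resolved  # fields only the writer knows about are dropped


old_record = {"name": "Asha"}                       # written with v1
new_record = {"name": "Ravi", "email": "r@x.test"}  # written with v2

read_by_new_consumer = resolve(old_record, SCHEMA_V2)
read_by_old_consumer = resolve(new_record, SCHEMA_V1)
```

A consumer already upgraded to v2 sees the default for `email` on old data, while a consumer still on v1 simply never sees the new field.
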
  • 13. Background - schema: Avro
      1. Encode data with a schema.
      2. Ship the schema to the consumer(?)
         a. Or keep the schema in a central place, keyed by schemaId.
         b. Ship the schemaId with the binary message.
      3. While decoding:
         a. Get the schemaId from the beginning of the message (always Long?).
         b. Fetch the schema by schemaId from the central schema store.
      4. Decode using the schema and the binary data together.
      5. Example: the schema tells the decoder to expect 4 bytes and convert them to an int.
      Source: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
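
The schemaId flow above can be sketched as follows. This is not Avro's actual binary format (Avro encodes ints as variable-length values); it is a fixed-width illustration built on Python's `struct` module, with a hypothetical in-memory `SCHEMA_STORE` standing in for the central schema store and an assumed 8-byte schemaId prefix.

```python
import struct

# Hypothetical central schema store: schemaId -> ordered (field, struct format).
# "i" is a 4-byte int, "h" is a 2-byte int.
SCHEMA_STORE = {
    1: [("emp_id", "i"), ("age", "h")],
}


def encode(schema_id: int, record: dict) -> bytes:
    """Prefix the payload with an 8-byte schemaId, then pack fields in order."""
    schema = SCHEMA_STORE[schema_id]
    fmt = ">" + "".join(f for _, f in schema)
    payload = struct.pack(fmt, *(record[name] for name, _ in schema))
    return struct.pack(">q", schema_id) + payload


def decode(message: bytes) -> dict:
    """Read the schemaId, fetch the schema, and let it drive the decoding."""
    (schema_id,) = struct.unpack_from(">q", message, 0)
    schema = SCHEMA_STORE[schema_id]
    fmt = ">" + "".join(f for _, f in schema)
    values = struct.unpack_from(fmt, message, 8)  # payload starts after the id
    return dict(zip((name for name, _ in schema), values))


message = encode(1, {"emp_id": 42, "age": 30})
record = decode(message)
```

The payload carries no field names or type tags; the schema alone tells the decoder that the first 4 bytes are an int. That is where the space saving over JSON comes from.
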
  • 15-16. Schema: no schema?
      [Diagram labels: BYTES, APACHE PULSAR, BYTES]
      1. Your data in the Pulsar store is plain binary (0s and 1s).
      2. Pulsar supports several schema types for encoding and decoding.
      3. You can encode data using a schema.
      4. You can decode data given the schema and the binary data.
  • 17-18. Schema: no schema?
      [Diagram labels: BYTES, APACHE PULSAR, BYTES]
      1. Schema or no schema?
      2. How do you encode / decode the bytes of Pulsar data?
      3. If you don’t have a schema, your schema is implicit in your app code!
  • 19. Schema: no schema? https://martinfowler.com/articles/schemaless/#non-uniform-types
  • 20. Schema: no schema?
      • Add custom fields for the UI, etc.
      • Different attributes depending on the kind of event.
      • Obviously easy when schemaless, but it still needs care!
      Source: https://martinfowler.com/articles/schemaless/#non-uniform-types
  • 22-23. Introducing Pulsar Schema: bytes. Domain object; byte schema serialized with Java.
  • 24. Introducing Pulsar Schema: String. Producer; consumer.
  • 25-26. Introducing Pulsar Schema: all primitive types.
  • 27-28. Introducing Pulsar Schema: structs (JSON schema). Domain object; producer with JSON schema serialization.
  • 29-31. Introducing Pulsar Schema: structs (Avro schema). The application “knows” which types go to which topic.
  • 32. Pulsar Schema: Schema Store (client side)
      1. In the previous examples, the schema was stored in the producer and consumer objects.
      2. This is the client-side schema storage approach.
  • 33. Pulsar Schema: Schema Store (client side)
      Problems:
      1. The client is responsible for:
         a. “serializing” data objects (user instance) into bytes
         b. “de-serializing” bytes into data objects (user instance)
         c. “knowing” which types go to which topic.
      2. With consumers spread across several micro-services, “knowing” and “evolving” the schema is challenging!
  • 34. Pulsar Schema: Schema Store (server side)
      Solution:
      1. Store the schema on a central server.
      2. When producing, upload the schema to the central server.
      3. Add the schemaId (Long) to the message.
      4. When consuming, fetch the schema by schemaId.
      5. The schema management server manages evolution (versioning).
      Pulsar has a built-in schema registry service!
  • 35-36. Pulsar Schema: Schema Registry
      1. The entity stored by the schema registry service is schemaInfo.
  • 37. Pulsar Schema: Schema Registry
      1. Each schemaInfo stored with a topic has a version.
      2. SchemaVersion manages the schema changes happening within a topic.
      3. Messages produced with a schemaInfo are tagged with its version.
      4. A consumer can use the schemaVersion to fetch the schemaInfo, then decode the message with it.
  • 38. Pulsar Schema: Schema Registry
      Schema payload structure:
         SchemaType schemaType;
         Boolean isDeleted;
         Long timestamp;
         String user;
         byte[] data;
         HashMap<String, String> props;
  • 39. Pulsar Schema: Schema Registry
      Admin CLI commands and REST APIs are available to manage schemas.
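
The versioning rules from slide 37 and the payload structure from slide 38 can be combined into a small in-memory model. This is a sketch only and not Pulsar's implementation: `SchemaPayload` mirrors the field names on the slide with assumed Python types, and `TopicSchemaRegistry` is a hypothetical stand-in that assigns versions as list positions per topic.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SchemaPayload:
    # Field names follow the slide; the Python types are an assumption.
    schema_type: str
    data: bytes  # the schema definition itself
    user: str = ""
    is_deleted: bool = False
    timestamp: int = field(default_factory=lambda: int(time.time() * 1000))
    props: dict = field(default_factory=dict)


class TopicSchemaRegistry:
    """Hypothetical in-memory registry: one list of versions per topic."""

    def __init__(self):
        self._by_topic = {}

    def register(self, topic: str, payload: SchemaPayload) -> int:
        versions = self._by_topic.setdefault(topic, [])
        versions.append(payload)
        return len(versions) - 1  # version assigned to this schema

    def fetch(self, topic: str, version: int) -> SchemaPayload:
        return self._by_topic[topic][version]


registry = TopicSchemaRegistry()
v0 = registry.register("employees", SchemaPayload("AVRO", b'{"fields": ["name"]}'))
v1 = registry.register("employees", SchemaPayload("AVRO", b'{"fields": ["name", "email"]}'))

# A message is tagged with the version it was produced with ...
message = {"schema_version": v0, "value": b"payload-bytes"}
# ... so the consumer can fetch the matching schema before decoding.
schema = registry.fetch("employees", message["schema_version"])
```
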
  • 41. Learnings
      1. Struct schemas (JSON, Avro, Protobuf) model domain objects well.
      2. Use the byte schema only if really needed.
      3. We have used Avro schemas with Pulsar in production for over a year:
         a. JSON schema is too verbose.
         b. Protobuf is great, but support for it among sources / sinks is still being adopted.
         c. Avro uses less data per message than JSON schema.
         d. Avro is very well adopted among sources / sinks.
  • 42. Learnings
      1. It is always a good idea to think hard about compatibility and set it on the namespace.
      2. Decide on the compatibility setting depending on the use case and the expected evolution.
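
As a rough illustration of what a compatibility setting checks, the sketch below tests backward and forward compatibility for schemas that differ only in which fields they have. It is a deliberate simplification: the dictionary representation and the helper names are hypothetical, and field types are ignored entirely.

```python
REQUIRED = object()  # marker for "this field has no default"


def can_read(reader: dict, writer: dict) -> bool:
    """True if every reader field is either written or has a default."""
    return all(
        name in writer or default is not REQUIRED
        for name, default in reader.items()
    )


def is_backward_compatible(new: dict, old: dict) -> bool:
    # Consumers on the new schema must be able to read data written with the old one.
    return can_read(reader=new, writer=old)


def is_forward_compatible(new: dict, old: dict) -> bool:
    # Consumers on the old schema must be able to read data written with the new one.
    return can_read(reader=old, writer=new)


old = {"name": REQUIRED}
add_optional = {"name": REQUIRED, "email": ""}        # new field with a default
add_required = {"name": REQUIRED, "email": REQUIRED}  # new field without one

optional_is_backward = is_backward_compatible(add_optional, old)
required_is_backward = is_backward_compatible(add_required, old)
required_is_forward = is_forward_compatible(add_required, old)
```

In this model, adding a field with a default is safe in both directions, while adding a field without one breaks consumers that upgrade before all old data has been consumed. That trade-off is what the use case and expected evolution should decide.
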
  • 43. Learnings
      1. Let ordering requirements decide what goes on which topic.
         a. One domain => one topic!
         b. Use AUTO_CONSUME as the consumer schema type.
         c. schema-autoupdate-strategy = NONE.
      2. The schema management process that works for us is:
         a. Keep a GitHub repo with the schemas.
         b. Use code reviews to review schema changes.
         c. Generate POJOs from Avro using a Maven plugin (Java).
         d. Add the POJO library as a dependency to micro-services to import the domain objects.
  • 44. References
      1. Pulsar docs: https://pulsar.apache.org/docs/en/schema-get-started/
      2. Schema auto update strategy: https://pulsar.apache.org/docs/en/pulsar-admin/#set-schema-autoupdate-strategy
      3. Schema Evolution in Avro, Thrift, Protobuf: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
      4. Topic design per domain: https://www.confluent.io/blog/put-several-event-types-kafka-topic/
      5. Schema Compatibility Design: https://docs.confluent.io/platform/current/schema-registry/avro.html#compatibility-types