SlideShare a Scribd company logo
1 of 70
Download to read offline
Building a Real-Time
Data Processing Pipeline
Using Apache Kafka, Kafka Connect,
Elasticsearch, and Kibana
Paul Brebner
Instaclustr—Technology Evangelist
Instaclustr Sponsored Booth Presentation
30 September ApacheCon 2020
©Instaclustr Pty Limited, 2020
Blogs (54): www.instaclustr.com/paul-brebner/
Who Am I? What do I do?
1 year ago (ApacheCon Europe 2019)—Look it’s a light!
©Instaclustr Pty Limited, 2020
Who Is Instaclustr?
©Instaclustr Pty Limited, 2020
A complete ecosystem to support mission
critical applications.
Instaclustr Managed Platform
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
©Instaclustr Pty Limited, 2020
Open Source
Big Data
Technologies
on the
Instaclustr
Managed
Platform
on multiple
cloud providers
This talk…
Focuses on three
recent additions
to our managed
platform:
• Kafka Connect
• Elasticsearch
• Kibana
©Instaclustr Pty Limited, 2020
• Technology Overview
• What’s the Story?
• Data sources
• Provisioning clusters
• Configuring Kafka source and sink connectors
• Elasticsearch mappings
• Kibana Visualizations
• Elasticsearch Ingest Pipeline
• Kibana Maps
• Handling failure
Overview
©Instaclustr Pty Limited, 2020
In general integration can be—complicated…
©Instaclustr Pty Limited, 2020
● Zero-code integration
● High availability
● Elastic scaling independent of Kafka
Source Connectors Sink Connectors
Kafka Connect cluster
Syslog
Kafka Cluster
and many more.. and many more..
Sources Sinks
Or—Easy, with Kafka Connect
What?
Why?
● Distributed solution to integrate Kafka with
other heterogeneous data sources/stores.
● Connectors (source or sink) handle
specifics of particular integrations
o Source Kafka
o Kafka Sink
©Instaclustr Pty Limited, 2020
Elasticsearch—scalable
search of indexed
documents
Kibana—visualization
Open Distro for
Elasticsearch—100% Apache
2.0 licensed
Documents
Indices
Managed Elasticsearch + Kibana
©Instaclustr Pty Limited, 2020
What’s The Story?
©Instaclustr Pty Limited, 2020
What’s The Story?
©Instaclustr Pty Limited, 2020
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
What’s The Story?
©Instaclustr Pty Limited, 2020
Instaclustr consultants built
an integration demo using
public climate change data
via REST connectors running
on docker
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
What’s The Story?
©Instaclustr Pty Limited, 2020
Instaclustr consultants built
an integration demo using
public climate change data
via REST connectors running
on docker
Idea: Use streaming REST
public data sources
AND deploy on Instaclustr
managed platform
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
What’s The Story?
©Instaclustr Pty Limited, 2020
Instaclustr consultants built
an integration demo using
public climate change data
via REST connectors running
on docker
Idea: Use streaming REST
public data sources
AND deploy on Instaclustr
managed platform
Look for public streaming
REST APIs with easy to use
JSON data format, complete
data, interesting domain,
not political or apocalyptic…
Impossible?
Kafka Summit—CDC
built Kafka COVID-19
pipeline in < 30 days
https://oceanservice.noaa.gov/
Success! Tides follow Lunar Day
USA Tidal Data
National Oceanic and Atmospheric Administration
©Instaclustr Pty Limited, 2020
Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/
©Instaclustr Pty Limited, 2020
Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/
What’s here?
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
API description https://api.tidesandcurrents.noaa.gov/api/prod/
©Instaclustr Pty Limited, 2020
REST Example
Specify station ID, data type and datum
(I used water level, mean sea level), latest data point, JSON
Call
https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8724580&
product=water_level&datum=msl&units=metric&time_zone=gmt&application=instaclustr&
format=json
Returns
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597",
"s":"0.005", "f":"1,0,0,0", "q":"p"}]}
©Instaclustr Pty Limited, 2020
REST call
JSON result
Let’s start the pipeline using this
REST API for data sources…
©Instaclustr Pty Limited, 2020
What Else Do We Need?
The Instaclustr Console
Provision Kafka and
Kafka Connect clusters
©Instaclustr Pty Limited, 2020
What Else Do We Need?
Select cloud
provider, region,
instance size and
number, security etc.
©Instaclustr Pty Limited, 2020
What Else Do We Need?
Tell Kafka connect
cluster which Kafka
cluster to use, then
provision
©Instaclustr Pty Limited, 2020
Your IP
Now we have a Kafka and Kafka Connect clusters
©Instaclustr Pty Limited, 2020
Next, find a REST connector, deploy to S3 bucket, tell connect cluster
which bucket, configure connector and run
REST source
connector
Tides Topic
REST call
JSON result
(Automatically created)
©Instaclustr Pty Limited, 2020
BYO connectors instructions
https://www.instaclustr.com/support/documentation/kafka-
connect/accessing-and-using-kafka-connect/updating-custom-
connectors/
curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d '
{
"name": "source_rest_tide_1",
"config": {
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"connector.class": "com.tm.kafka.connect.rest.RestSourceConnector",
"tasks.max": "1",
"rest.source.poll.interval.ms": "600000",
"rest.source.method": "GET",
"rest.source.url":
"https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8454000&product=water_level&datum=
msl&units=metric&time_zone=gmt&application=instaclustr&format=json",
"rest.source.headers": "Content-Type:application/json,Accept:application/json",
"rest.source.topic.selector": "com.tm.kafka.connect.rest.selector.SimpleTopicSelector",
"rest.source.destination.topics": "tides-topic"
}
}'
REST source connector configuration including connector
name, class, URL, topic
©Instaclustr Pty Limited, 2020
Polls every 10 minutes, writes result to Kafka topic, picked 5 sensors
to use, so 5 connector instances running.
Now have tidal data coming into the tides topic, what next?
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
©Instaclustr Pty Limited, 2020
Next - Provision Elasticsearch+Kibana clusters
©Instaclustr Pty Limited, 2020
And configure the included Elasticsearch sink connector
to send data to Elasticsearch
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
Elastic sink connector Tides Index
©Instaclustr Pty Limited, 2020
curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d '
{
"name" : "elastic-sink-tides",
"config" :
{
"connector.class" : "com.datamountaineer.streamreactor.connect.elastic7.ElasticSinkConnector",
"tasks.max" : 3,
"topics" : "tides",
"connect.elastic.hosts" : ”ip",
"connect.elastic.port" : 9201,
"connect.elastic.kcql" : "INSERT INTO tides-index SELECT * FROM tides-topic",
"connect.elastic.use.http.username" : ”elasticName",
"connect.elastic.use.http.password" : ”elasticPassword"
}
}'
Configure sink connector name, class, index and topic.
The index is created with default mappings if it doesn’t already exist.
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
Elastic sink connector Tides Index
Great! It’s All Working!? Sort Of!
Tide data arriving in Tides Index!
But, in default index mappings, everything is a String.
To graph them as time series by name need a custom mapping.
©Instaclustr Pty Limited, 2020
{"metadata": {
"id":”String",
"name":”String",
"lat":”String”,
"lon":”String"},
"data":[{
"t":”String",
"v":”String"}]}
curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d'
{
"mappings" : {
"properties" : {
"data" : {
"properties" : {
"t" : { "type" : "date",
"format" : "yyyy-MM-dd HH:mm"
},
"v" : { "type" : "double" },
"f" : { "type" : "text" },
"q" : { "type" : "text" },
"s" : { "type" : "text" }
}
},
"metadata" : {
"properties" : {
"id" : { "type" : "text" },
"lat" : { "type" : "text" },
"long" : { "type" : "text" },
"name" : { "type" : ”keyword" } }}}} }'
Custom mapping “t” is a date, “v” is a double, and “name” is a keyword.
©Instaclustr Pty Limited, 2020
• Every time you
• Change an Elasticsearch index mapping, you have to
• Delete the index
• Index all the data again
• But where does the data come from?
• Two options:
• Using a Kafka sink connector the data is already in the
Kafka topic, so just replay it, or,
• Use Elasticsearch reindex operation
• The hard part is over, now…
Reindexing!
©Instaclustr Pty Limited, 2020
Start Kibana With A Single Click
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
Visualization Steps
1: Index Pattern (to get data from Elasticsearch)
Settings -> Index Patterns -> Create Index Pattern -> Define ->
Configure with “t” as timefilter field
2. Create Visualization (to create a graph type)
Visualizations -> Create Visualization -> New Visualization ->
Line -> Choose Source = pattern from 1
3. Configure Graph Settings (to display data correctly)
Select time range, select aggregation for y-axis = average ->
data.v -> select Buckets -> Split series metadata.name -> X-axis
-> Data Histogram = data.t
©Instaclustr Pty Limited, 2020
Time (x axis) vs. average (over 30m) tide level (relative to
average level) in meters for the 5 sample stations
©Instaclustr Pty Limited, 2020
Showing Lunar Day (24 hours 50 minutes)
Lunar Day (24h 50m)
©Instaclustr Pty Limited, 2020
Tidalrange
Showing Tidal Range (high tide – low tide)
©Instaclustr Pty Limited, 2020
By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon:
Revealing Hidden Tidal Energy, Public Domain
Tide range varies depending on moon, sun, local geography, and weather!
©Instaclustr Pty Limited, 2020
By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon:
Revealing Hidden Tidal Energy, Public Domain
Neah Bay is near here
©Instaclustr Pty Limited, 2020
By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon:
Revealing Hidden Tidal Energy, Public Domain
Australia’s Biggest Tide is here
©Instaclustr Pty Limited, 2020
Tides of over 11 meters are forced through two narrow passes
creating the popular tourist attraction known as the Horizontal
Waterfalls in the Kimberley's Talbot Bay.
Next, a map to show the sensor locations to understand tidal ranges
(Photo by Richard Costin)
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
But, there are no geo-points in the data!
Mapping Steps
1. Add geo-point field to index mapping
2. Create Elasticsearch ingest pipeline to construct new field
3. Add as default ingest pipeline to index
Problem:
Elasticsearch doesn’t
recognize separate lat
lon fields as geo-points
Solution:
Add an Elasticsearch
ingest pipeline to pre-
process documents
before they are
indexed
(Need to reindex again)
©Instaclustr Pty Limited, 2020
curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d'
{
"mappings" : {
"properties" : {
"data" : {
"properties" : {
"t" : { "type" : "date",
"format" : "yyyy-MM-dd HH:mm"
},
"v" : { "type" : "double" },
"f" : { "type" : "text" },
"q" : { "type" : "text" },
"s" : { "type" : "text" }
}
},
"metadata" : {
"properties" : {
"id" : { "type" : "text" },
"lat" : { "type" : "text" },
"long" : { "type" : "text" },
"location" : { "type" : "geo_point" },
"name" : { "type" : ”keyword" } }}}} }'
1. Add a new “location” field with a geo_point data type to the mapping and index
©Instaclustr Pty Limited, 2020
curl -u elasticName:elasticPassword ”elasticURL:9201/ _ingest/pipeline/locationPipe" -X PUT -H 'Content-Type:
application/json' -d'
{
"description" : ”construct geo-point String field",
"processors" : [
{
"set" : {
"field": "metadata.location",
"value": "{{metadata.lat}},{{metadata.lon}}"
}
}
]
}
'
2. Create new ingest pipeline to construct new location geo-point
String from existing lat lon fields
©Instaclustr Pty Limited, 2020
3. Add locationPipe as default pipeline to the index
curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index/_settings?pretty" -X PUT -H 'Content-Type:
application/json' -d'
{
"index" : {
"default_pipeline" : ”locationPipe"
}
}
'
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
Elastic sink connector Tides Index
Now we have a pipeline transforming the raw data and adding
geo-point location data in Elasticsearch
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508”,
"lon":"-81.8081”,
”location”: “24.5508,-81.8081”},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
Mapping Visualization Steps
1. Create Visualization
Visualizations -> Create visualization -> New Coordinate Map
-> Select index patterns -> Visualization with default map
2. Configure Graph Settings (to display data correctly)
Select Metrics -> Aggregation (min) -> Field -> data.v -> Buckets -> Geo
coordinates -> Geohash -> Field -> metadata.location
Reuse existing
index pattern
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
Map showing sensor locations and min values over last week
Add your own custom Web Map Service (WMS) layers
URL https://services.nationalmap.gov/arcgis/services/USGSNAIPPlus/MapServer/WMSServer
Layers 1,2,3,5,6,7,9,10,11,13,14,15,17,18,19,21,22,23,25,26,27,29,30,31,32
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
What can go wrong? REST call can return error message, but doesn’t treat it as an
error so it’s sent to Tides Topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
Elastic sink connector tries to read the error message and fails to FAILED state.
Exceptions viewable in the Kafka connect logs topic.
Exceptions viewable in the Kafka connect logs topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
X
FAILED
X
Connect logs topic
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
Current workaround is to monitor and regularly restart failed connectors.
Exceptions viewable in the Kafka connect logs topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
FAILED?
RUNNING
Restart!
X
REST source
connector
Tides Topic
REST call
JSON result
{"error": {"message":"No
data was found. This
product may not be
offered at this station
at the requested
time."}}
Elastic sink connector Tides Index
Better solution - if connectors support KIP-298 “Error Handling in Connect” (not all do)
then configure to ignore input errors.
Errors sent to ”dead letter” topic.
LocationPipe
ingestor
©Instaclustr Pty Limited, 2020
Ignore
Dead letter topic
• Instaclustr consultants, Kafka and Elasticsearch dev teams ,
graphic design and marketing teams
• Zeke, Mussa, Michael, Hendra, Rob, Harvey, Jill, Gina and
more!
• Try us out! Build the same or your own pipeline with our
free trial at Instaclustr.com
Thanks to…
©Instaclustr Pty Limited, 2020
©Instaclustr Pty Limited, 2020
www.instaclustr.com
info@instaclustr.com
@instaclustr
THANK
YOU!

More Related Content

What's hot

GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processingconfluent
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsNeil Avery
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Supply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache KafkaSupply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache KafkaKai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesKai Wähner
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...Kai Wähner
 
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)Kai Wähner
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...confluent
 
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data ArchitectureRediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data Architectureconfluent
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT Worldconfluent
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming VisualisationGuido Schmutz
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®confluent
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafkaconfluent
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceEldert Grootenboer
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKai Wähner
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafkaconfluent
 

What's hot (20)

GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processing
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-events
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Supply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache KafkaSupply Chain Optimization with Apache Kafka
Supply Chain Optimization with Apache Kafka
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
 
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup)
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
 
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data ArchitectureRediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT World
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update Conference
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
 

Similar to Building a real-time data processing pipeline using Apache Kafka, Kafka Connect, Elasticsearch and Kibana

2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...confluent
 
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLIngesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLGuido Schmutz
 
Cloud Platform for IoT
Cloud Platform for IoTCloud Platform for IoT
Cloud Platform for IoTNaoto Umemori
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowSpinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowPaulBrebner2
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowAll Things Open
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE
 
Guillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource APIGuillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource APINathan Van Gheem
 
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptxConfluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptxAhmed791434
 
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan BocutiuMQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiulandoop
 
Web of things introduction
Web of things introductionWeb of things introduction
Web of things introduction承翰 蔡
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaPaul Brebner
 
Essential Capabilities of an IoT Cloud Platform - AWS Online Tech Talks
Essential Capabilities of an IoT Cloud Platform - AWS Online Tech TalksEssential Capabilities of an IoT Cloud Platform - AWS Online Tech Talks
Essential Capabilities of an IoT Cloud Platform - AWS Online Tech TalksAmazon Web Services
 
Episode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesEpisode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesMesosphere Inc.
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Cataloguesdavid-read
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
 

Similar to Building a real-time data processing pipeline using Apache Kafka, Kafka Connect, Elasticsearch and Kibana (20)

2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
 
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQLIngesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL
 
Cloud Platform for IoT
Cloud Platform for IoTCloud Platform for IoT
Cloud Platform for IoT
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowSpinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
FIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart SystemsFIWARE Wednesday Webinars - Short Term History within Smart Systems
FIWARE Wednesday Webinars - Short Term History within Smart Systems
 
Guillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource APIGuillotina: The Asyncio REST Resource API
Guillotina: The Asyncio REST Resource API
 
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptxConfluent-Ably-AWS-ID-2023 - GSlide.pptx
Confluent-Ably-AWS-ID-2023 - GSlide.pptx
 
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan BocutiuMQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu
 
Web of things introduction
Web of things introductionWeb of things introduction
Web of things introduction
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
 
Essential Capabilities of an IoT Cloud Platform - AWS Online Tech Talks
Essential Capabilities of an IoT Cloud Platform - AWS Online Tech TalksEssential Capabilities of an IoT Cloud Platform - AWS Online Tech Talks
Essential Capabilities of an IoT Cloud Platform - AWS Online Tech Talks
 
Episode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data ServicesEpisode 3: Kubernetes and Big Data Services
Episode 3: Kubernetes and Big Data Services
 
Open Data and CKAN Data Catalogues
Open Data and CKAN Data CataloguesOpen Data and CKAN Data Catalogues
Open Data and CKAN Data Catalogues
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
 

More from Paul Brebner

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersPaul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaPaul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialPaul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980'sPaul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...Paul Brebner
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...Paul Brebner
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...Paul Brebner
 

More from Paul Brebner (20)

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...Automatic Performance Modelling from Application Performance Management (APM)...
Automatic Performance Modelling from Application Performance Management (APM)...
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Building a real-time data processing pipeline using Apache Kafka, Kafka Connect, Elasticsearch and Kibana

  • 1. Building a Real-Time Data Processing Pipeline Using Apache Kafka, Kafka Connect, Elasticsearch, and Kibana Paul Brebner Instaclustr—Technology Evangelist Instaclustr Sponsored Booth Presentation 30 September ApacheCon 2020 ©Instaclustr Pty Limited, 2020
  • 2. Blogs (54): www.instaclustr.com/paul-brebner/ Who Am I? What do I do? 1 year ago (ApacheCon Europe 2019)—Look it’s a light! ©Instaclustr Pty Limited, 2020
  • 4. A complete ecosystem to support mission critical applications. Instaclustr Managed Platform ©Instaclustr Pty Limited, 2020
  • 10. Open Source Big Data Technologies on the Instaclustr Managed Platform ©Instaclustr Pty Limited, 2020
  • 11. ©Instaclustr Pty Limited, 2020 Open Source Big Data Technologies on the Instaclustr Managed Platform
  • 12. ©Instaclustr Pty Limited, 2020 Open Source Big Data Technologies on the Instaclustr Managed Platform
  • 13. ©Instaclustr Pty Limited, 2020 Open Source Big Data Technologies on the Instaclustr Managed Platform on multiple cloud providers
  • 14. This talk… Focuses on three recent additions to our managed platform: • Kafka Connect • Elasticsearch • Kibana ©Instaclustr Pty Limited, 2020
  • 15. • Technology Overview • What’s the Story? • Data sources • Provisioning clusters • Configuring Kafka source and sink connectors • Elasticsearch mappings • Kibana Visualizations • Elasticsearch Ingest Pipeline • Kibana Maps • Handling failure Overview ©Instaclustr Pty Limited, 2020
  • 16. In general integration can be—complicated… ©Instaclustr Pty Limited, 2020
  • 17. ● Zero-code integration ● High availability ● Elastic scaling independent of Kafka Source Connectors Sink Connectors Kafka Connect cluster Syslog Kafka Cluster and many more.. and many more.. Sources Sinks Or—Easy, with Kafka Connect What? Why? ● Distributed solution to integrate Kafka with other heterogeneous data sources/stores. ● Connectors (source or sink) handle specifics of particular integrations o Source Kafka o Kafka Sink ©Instaclustr Pty Limited, 2020
  • 18. Elasticsearch—scalable search of indexed documents Kibana—visualization Open Distro for Elasticsearch—100% Apache 2.0 licensed Documents Indices Managed Elasticsearch + Kibana ©Instaclustr Pty Limited, 2020
  • 20. What’s The Story? ©Instaclustr Pty Limited, 2020 Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 21. What’s The Story? ©Instaclustr Pty Limited, 2020 Instaclustr consultants built an integration demo using public climate change data via REST connectors running on docker Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 22. What’s The Story? ©Instaclustr Pty Limited, 2020 Instaclustr consultants built an integration demo using public climate change data via REST connectors running on docker Idea: Use streaming REST public data sources AND deploy on Instaclustr managed platform Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 23. What’s The Story? ©Instaclustr Pty Limited, 2020 Instaclustr consultants built an integration demo using public climate change data via REST connectors running on docker Idea: Use streaming REST public data sources AND deploy on Instaclustr managed platform Look for public streaming REST APIs with easy to use JSON data format, complete data, interesting domain, not political or apocalyptic… Impossible? Kafka Summit—CDC built Kafka COVID-19 pipeline in < 30 days
  • 24. https://oceanservice.noaa.gov/ Success! Tides follow Lunar Day USA Tidal Data National Oceanic and Atmospheric Administration ©Instaclustr Pty Limited, 2020
  • 25. Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/ ©Instaclustr Pty Limited, 2020
  • 26. Bonus, NOAA tidal map https://tidesandcurrents.noaa.gov/map/ What’s here? ©Instaclustr Pty Limited, 2020
  • 30. REST Example Specify station ID, data type and datum (I used water level, mean sea level), latest data point, JSON Call https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8724580& product=water_level&datum=msl&units=metric&time_zone=gmt&application=instaclustr& format=json Returns {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597", "s":"0.005", "f":"1,0,0,0", "q":"p"}]} ©Instaclustr Pty Limited, 2020
  • 31. REST call JSON result Let’s start the pipeline using this REST API for data sources… ©Instaclustr Pty Limited, 2020
  • 32. What Else Do We Need? The Instaclustr Console Provision Kafka and Kafka Connect clusters ©Instaclustr Pty Limited, 2020
  • 33. What Else Do We Need? Select cloud provider, region, instance size and number, security etc. ©Instaclustr Pty Limited, 2020
  • 34. What Else Do We Need? Tell Kafka connect cluster which Kafka cluster to use, then provision ©Instaclustr Pty Limited, 2020 Your IP
  • 35. Now we have a Kafka and Kafka Connect clusters ©Instaclustr Pty Limited, 2020
  • 36. Next, find a REST connector, deploy to S3 bucket, tell connect cluster which bucket, configure connector and run REST source connector Tides Topic REST call JSON result (Automatically created) ©Instaclustr Pty Limited, 2020 BYO connectors instructions https://www.instaclustr.com/support/documentation/kafka- connect/accessing-and-using-kafka-connect/updating-custom- connectors/
  • 37. curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d ' { "name": "source_rest_tide_1", "config": { "key.converter":"org.apache.kafka.connect.storage.StringConverter", "value.converter":"org.apache.kafka.connect.storage.StringConverter", "connector.class": "com.tm.kafka.connect.rest.RestSourceConnector", "tasks.max": "1", "rest.source.poll.interval.ms": "600000", "rest.source.method": "GET", "rest.source.url": "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter?date=latest&station=8454000&product=water_level&datum= msl&units=metric&time_zone=gmt&application=instaclustr&format=json", "rest.source.headers": "Content-Type:application/json,Accept:application/json", "rest.source.topic.selector": "com.tm.kafka.connect.rest.selector.SimpleTopicSelector", "rest.source.destination.topics": "tides-topic" } }' REST source connector configuration including connector name, class, URL, topic ©Instaclustr Pty Limited, 2020
  • 38. Polls every 10 minutes, writes result to Kafka topic, picked 5 sensors to use, so 5 connector instances running. Now have tidal data coming into the tides topic, what next? REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} ©Instaclustr Pty Limited, 2020
  • 39. Next - Provision Elasticsearch+Kibana clusters ©Instaclustr Pty Limited, 2020
  • 40. And configure the included Elasticsearch sink connector to send data to Elasticsearch REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} Elastic sink connector Tides Index ©Instaclustr Pty Limited, 2020
  • 41. curl https://connectorClusterIP:8083/connectors -k -u name:password -X POST -H 'Content-Type: application/json' -d ' { "name" : "elastic-sink-tides", "config" : { "connector.class" : "com.datamountaineer.streamreactor.connect.elastic7.ElasticSinkConnector", "tasks.max" : 3, "topics" : "tides", "connect.elastic.hosts" : ”ip", "connect.elastic.port" : 9201, "connect.elastic.kcql" : "INSERT INTO tides-index SELECT * FROM tides-topic", "connect.elastic.use.http.username" : ”elasticName", "connect.elastic.use.http.password" : ”elasticPassword" } }' Configure sink connector name, class, index and topic. The index is created with default mappings if it doesn’t already exist. ©Instaclustr Pty Limited, 2020
  • 42. REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} Elastic sink connector Tides Index Great! It’s All Working!? Sort Of! Tide data arriving in Tides Index! But, in default index mappings, everything is a String. To graph them as time series by name need a custom mapping. ©Instaclustr Pty Limited, 2020 {"metadata": { "id":”String", "name":”String", "lat":”String”, "lon":”String"}, "data":[{ "t":”String", "v":”String"}]}
  • 43. curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d' { "mappings" : { "properties" : { "data" : { "properties" : { "t" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm" }, "v" : { "type" : "double" }, "f" : { "type" : "text" }, "q" : { "type" : "text" }, "s" : { "type" : "text" } } }, "metadata" : { "properties" : { "id" : { "type" : "text" }, "lat" : { "type" : "text" }, "long" : { "type" : "text" }, "name" : { "type" : ”keyword" } }}}} }' Custom mapping “t” is a date, “v” is a double, and “name” is a keyword. ©Instaclustr Pty Limited, 2020
  • 44. • Every time you • Change an Elasticsearch index mapping, you have to • Delete the index • Index all the data again • But where does the data come from? • Two options: • Using a Kafka sink connector the data is already in the Kafka topic, so just replay it, or, • Use Elasticsearch reindex operation • The hard part is over, now… Reindexing! ©Instaclustr Pty Limited, 2020
  • 45. Start Kibana With A Single Click ©Instaclustr Pty Limited, 2020
  • 47. Visualization Steps 1: Index Pattern (to get data from Elasticsearch) Settings -> Index Patterns -> Create Index Pattern -> Define -> Configure with “t” as timefilter field 2. Create Visualization (to create a graph type) Visualizations -> Create Visualization -> New Visualization -> Line -> Choose Source = pattern from 1 3. Configure Graph Settings (to display data correctly) Select time range, select aggregation for y-axis = average -> data.v -> select Buckets -> Split series metadata.name -> X-axis -> Data Histogram = data.t ©Instaclustr Pty Limited, 2020
  • 48. Time (x axis) vs. average (over 30m) tide level (relative to average level) in meters for the 5 sample stations ©Instaclustr Pty Limited, 2020
  • 49. Showing Lunar Day (24 hours 50 minutes) Lunar Day (24h 50m) ©Instaclustr Pty Limited, 2020
  • 50. Tidalrange Showing Tidal Range (high tide – low tide) ©Instaclustr Pty Limited, 2020
  • 51. By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon: Revealing Hidden Tidal Energy, Public Domain Tide range varies depending on moon, sun, local geography, and weather! ©Instaclustr Pty Limited, 2020
  • 52. By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon: Revealing Hidden Tidal Energy, Public Domain Neah Bay is near here ©Instaclustr Pty Limited, 2020
  • 53. By R. Ray, NASA Goddard Space Flight Center, Jet Propulsion Laboratory, Scientific Visualization Studio - TOPEX/Poseidon: Revealing Hidden Tidal Energy, Public Domain Australia’s Biggest Tide is here ©Instaclustr Pty Limited, 2020
  • 54. Tides of over 11 meters are forced through two narrow passes creating the popular tourist attraction known as the Horizontal Waterfalls in the Kimberley's Talbot Bay. Next, a map to show the sensor locations to understand tidal ranges (Photo by Richard Costin) ©Instaclustr Pty Limited, 2020
  • 55. ©Instaclustr Pty Limited, 2020 But, there are no geo-points in the data!
  • 56. Mapping Steps 1. Add geo-point field to index mapping 2. Create Elasticsearch ingest pipeline to construct new field 3. Add as default ingest pipeline to index Problem: Elasticsearch doesn’t recognize separate lat lon fields as geo-points Solution: Add an Elasticsearch ingest pipeline to pre- process documents before they are indexed (Need to reindex again) ©Instaclustr Pty Limited, 2020
  • 57. curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index" -X PUT -H 'Content-Type: application/json' -d' { "mappings" : { "properties" : { "data" : { "properties" : { "t" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm" }, "v" : { "type" : "double" }, "f" : { "type" : "text" }, "q" : { "type" : "text" }, "s" : { "type" : "text" } } }, "metadata" : { "properties" : { "id" : { "type" : "text" }, "lat" : { "type" : "text" }, "long" : { "type" : "text" }, "location" : { "type" : "geo_point" }, "name" : { "type" : ”keyword" } }}}} }' 1. Add a new “location” field with a geo_point data type to the mapping and index ©Instaclustr Pty Limited, 2020
  • 58. curl -u elasticName:elasticPassword ”elasticURL:9201/ _ingest/pipeline/locationPipe" -X PUT -H 'Content-Type: application/json' -d' { "description" : ”construct geo-point String field", "processors" : [ { "set" : { "field": "metadata.location", "value": "{{metadata.lat}},{{metadata.lon}}" } } ] } ' 2. Create new ingest pipeline to construct new location geo-point String from existing lat lon fields ©Instaclustr Pty Limited, 2020
  • 59. 3. Add locationPipe as default pipeline to the index curl -u elasticName:elasticPassword ”elasticURL:9201/tides-index/_settings?pretty" -X PUT -H 'Content-Type: application/json' -d' { "index" : { "default_pipeline" : ”locationPipe" } } ' ©Instaclustr Pty Limited, 2020
  • 60. REST source connector Tides Topic REST call JSON result {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081"}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} Elastic sink connector Tides Index Now we have a pipeline transforming the raw data and adding geo-point location data in Elasticsearch {"metadata": { "id":"8724580", "name":"Key West", "lat":"24.5508”, "lon":"-81.8081”, ”location”: “24.5508,-81.8081”}, "data":[{ "t":"2020-09-24 04:18", "v":"0.597"}]} LocationPipe ingestor ©Instaclustr Pty Limited, 2020
  • 61. Mapping Visualization Steps 1. Create Visualization Visualizations -> Create visualization -> New Coordinate Map -> Select index patterns -> Visualization with default map 2. Configure Graph Settings (to display data correctly) Select Metrics -> Aggregation (min) -> Field -> data.v -> Buckets -> Geo coordinates -> Geohash -> Field -> metadata.location Reuse existing index pattern ©Instaclustr Pty Limited, 2020
  • 62. ©Instaclustr Pty Limited, 2020 Map showing sensor locations and min values over last week
  • 63. Add your own custom Web Map Service (WMS) layers URL https://services.nationalmap.gov/arcgis/services/USGSNAIPPlus/MapServer/WMSServer Layers 1,2,3,5,6,7,9,10,11,13,14,15,17,18,19,21,22,23,25,26,27,29,30,31,32 ©Instaclustr Pty Limited, 2020
  • 64. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index What can go wrong? REST call can return error message, but doesn’t treat it as an error so it’s sent to Tides Topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020
  • 65. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index Elastic sink connector tries to read the error message and fails to FAILED state. Exceptions viewable in the Kafka connect logs topic. Exceptions viewable in the Kafka connect logs topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020 X FAILED X Connect logs topic
  • 66. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index Current workaround is to monitor and regularly restart failed connectors. Exceptions viewable in the Kafka connect logs topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020 FAILED? RUNNING Restart! X
  • 67. REST source connector Tides Topic REST call JSON result {"error": {"message":"No data was found. This product may not be offered at this station at the requested time."}} Elastic sink connector Tides Index Better solution - if connectors support KIP-298 “Error Handling in Connect” (not all do) then configure to ignore input errors. Errors sent to ”dead letter” topic. LocationPipe ingestor ©Instaclustr Pty Limited, 2020 Ignore Dead letter topic
  • 68. • Instaclustr consultants, Kafka and Elasticsearch dev teams , graphic design and marketing teams • Zeke, Mussa, Michael, Hendra, Rob, Harvey, Jill, Gina and more! • Try us out! Build the same or your own pipeline with our free trial at Instaclustr.com Thanks to… ©Instaclustr Pty Limited, 2020