Dip Into P8s
hello!
I am Zaar Hai
Staff Cloud Architect at DoiT International
linkedin.com/in/zaar
Why this talk?
Cloud Out of the box
✘ VM CPU/Network/Disk stats…
✘ But not memory
➢ requires vendor-specific agents
✘ Gets much better with GKE
➢ Memory usage for pods IF your YAMLs behave
✘ All of the above are metrics… But what about our app?
App Metrics
It’s all about app metrics
Why metrics at all?
When logs are not enough
✘ Too detailed to see the big picture
✘ Hard to see KPIs / trends
Logs to metrics
✘ Log-based metrics in StackDriver
➢ Fragile configuration
➢ Disjoint from the app
➢ Vendor specific
✘ Smart log parsers, e.g. Coralogix
Parsed 100 docs in 0.31 seconds
Parsed 100 docs in 0.38 seconds
Parsed 100 docs in 0.34 seconds
✘ An option when retrofitting monitoring onto an existing app
So what are metrics?
Metrics are just a tuple:
Metric name, timestamp, numeric value
Metric examples
Process A:
2020-07-28T02:32:06Z http_requests_total 1239
2020-07-28T02:32:07Z http_requests_total 1245
Process B:
2020-07-28T02:32:06Z http_requests_total 1185
2020-07-28T02:32:07Z http_requests_total 1185
Now we can aggregate across!
Metric dimensions
Process A:
2020-07-28T02:32:06Z http_requests_total 1239
Process B:
2020-07-28T02:32:06Z http_requests_total 1185
Metric dimensions
Process A:
2020-07-28T02:32:06Z http_requests_total{code=200} 1227
2020-07-28T02:32:06Z http_requests_total{code=404} 12
Process B:
2020-07-28T02:32:06Z http_requests_total{code=200} 1177
2020-07-28T02:32:06Z http_requests_total{code=404} 8
Metric dimensions
2020-07-28T02:32:06Z
http_requests_total{code=200, process=A} 1227
http_requests_total{code=404, process=A} 12
http_requests_total{code=200, process=B} 1177
http_requests_total{code=404, process=B} 8
Metric dimensions
2020-07-28T02:32:06Z
http_requests_total{code=200, process=A, path=/foo} 1107
http_requests_total{code=404, process=A, path=/foo} 12
http_requests_total{code=404, process=A, path=/bar} 120
http_requests_total{code=200, process=B, path=/foo} 1005
http_requests_total{code=404, process=B, path=/foo} 8
http_requests_total{code=200, process=B, path=/bar} 172
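Labels are what make slicing and aggregation possible. A minimal Python sketch using the sample values from this slide; the `sum_by` helper is hypothetical and only mimics what a query engine does when it sums a metric by one label:

```python
from collections import defaultdict

# Samples from the slide: (labels, value) pairs for http_requests_total
SAMPLES = [
    ({"code": "200", "process": "A", "path": "/foo"}, 1107),
    ({"code": "404", "process": "A", "path": "/foo"}, 12),
    ({"code": "404", "process": "A", "path": "/bar"}, 120),
    ({"code": "200", "process": "B", "path": "/foo"}, 1005),
    ({"code": "404", "process": "B", "path": "/foo"}, 8),
    ({"code": "200", "process": "B", "path": "/bar"}, 172),
]


def sum_by(samples, label):
    """Sum sample values grouped by a single label (hypothetical helper)."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


print(sum_by(SAMPLES, "process"))  # per-process totals
print(sum_by(SAMPLES, "code"))     # per-status-code totals
```

Summing by `process` gives back the unlabelled per-process totals from the first "Metric examples" slide (1239 and 1185), so adding dimensions loses nothing.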
Now we can graph that!
How many metrics?
1,000 metrics per microservice
x20 microservices
80,000 samples per minute
3,504,000,000 samples per month
Just for your app
Wait, there is more!
✘ 10-20k metrics per average K8s node
✘ That’s 1,200,000 samples/minute for a 15-node cluster
➢ Assuming a 15s collection interval
✘ Or 52,560,000,000 samples a month!
✘ And that’s just for an average-sized app
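The sample counts on these slides can be reproduced with a few lines of arithmetic. A sketch under two assumptions that are not stated outright on the slides: the app metrics are also collected every 15 seconds, and a month is an average month of 365/12 days.

```python
SCRAPE_INTERVAL_S = 15                     # collection interval from the slide
MINUTES_PER_MONTH = 365 * 24 * 60 // 12    # 43,800: average month (assumption)


def samples_per_minute(metrics):
    """Each metric yields one sample per scrape."""
    return metrics * (60 // SCRAPE_INTERVAL_S)


def samples_per_month(metrics):
    return samples_per_minute(metrics) * MINUTES_PER_MONTH


app_metrics = 1_000 * 20       # 1,000 metrics x 20 microservices
node_metrics = 20_000 * 15     # upper bound of 20k metrics x 15 nodes

print(f"app:    {samples_per_minute(app_metrics):,}/min, "
      f"{samples_per_month(app_metrics):,}/month")
print(f"system: {samples_per_minute(node_metrics):,}/min, "
      f"{samples_per_month(node_metrics):,}/month")
```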
What are my options?
“No matter what you choose, stay within a single pane of glass”
Option 1
Stick with the Vendor
Enhance the existing
✘ Both AWS and GCP give you so much for free already
✘ Just add your app metrics
➢ And they support that!
✘ No need to ship system metrics, e.g. K8s - they are already there
But it’s costly
✘ GCP StackDriver
➢ $84/month per 1k metrics
➢ Price drops after 300k metrics
✘ AWS CloudWatch
➢ $300/month per 1k metrics
➢ $100/month after first 10k, $50 after the first 240k
✘ Logs are expensive too, btw
➢ GCP SD: $0.50/GB
➢ AWS CW: $0.60/GB + charge of $0.0057 per scanned GB for queries
✘ 20 μSvc app with 1k metrics and 1GB logs/day per μSvc:
➢ 1*20*30 = 600GB/month
➢ 20k metrics/month
✘ Will cost you:
➢ GCP: $300 for logs + $1760 for metrics
➢ AWS: $360 for logs + $4000 for metrics
✘ That’s only half the story!
➢ With containers, metrics are short-lived
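The bill above can be sketched in Python from the prices quoted on the previous slide. The function names are hypothetical, and a 30-day month is assumed as on the slide. The CloudWatch tiers reproduce the $4000 figure; a flat $84 per 1k metrics gives $1680 for GCP, a little under the $1760 quoted, so the slide presumably used a slightly different rate.

```python
def aws_metrics_cost(metrics):
    """Tiered CloudWatch price from the slide: $300 per 1k metrics for the
    first 10k, $100 per 1k up to 240k, $50 per 1k beyond that."""
    tiers = [(10_000, 300), (230_000, 100), (float("inf"), 50)]
    cost, remaining = 0.0, metrics
    for tier_size, price_per_1k in tiers:
        used = min(remaining, tier_size)
        cost += used / 1_000 * price_per_1k
        remaining -= used
        if remaining <= 0:
            break
    return cost


def gcp_metrics_cost(metrics):
    """Flat $84 per 1k metrics (the price drop after 300k is ignored here)."""
    return metrics / 1_000 * 84


services = 20
log_gb = services * 1 * 30      # 1 GB/day per service, 30 days
metrics = services * 1_000      # 1k metrics per service

print(f"GCP: ${log_gb * 0.50:,.0f} logs + ${gcp_metrics_cost(metrics):,.0f} metrics")
print(f"AWS: ${log_gb * 0.60:,.0f} logs + ${aws_metrics_cost(metrics):,.0f} metrics")
```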
Further considerations
✘ Vendor specific APIs to ship
➢ Challenging for multi-cloud
➢ Gets better with K8s
✘ Limited to 1 minute resolution
Option 2
Dedicated SaaS
There are many out there
✘ DataDog, Sysdig, NewRelic, Splunk, SumoLogic, Grafana (hosted)
✘ Once you see the pricing, GCP/AWS $-figures make sense :)
✘ Lots of added features though:
➢ AI-assisted anomaly detection, etc.
✘ Multi-cloud!
✘ But now you need to ship all your system metrics
➢ Can become expensive quickly
Option 3
Host your own
Simpler than it may sound
Instrument → Collect & Store → Display & Alert
Grafana to Display (and Alert)
De-facto dashboarding software for DevOps and beyond
Grafana: multiple data sources
[Diagram: one Grafana dashboard fed by StackDriver, CloudWatch, and Prometheus; other boxes in the figure: GCP Pub/Sub, AWS SES, Email Parser]
Hybrid SaaS
As a hybrid SaaS, or “Option 2.5”, you can:
✘ Set up hosted Grafana on Grafana Labs
✘ Connect it to CloudWatch, StackDriver, etc.
✘ Ship your app-only metrics to Grafana Labs
➢ At $16/month per 1k metrics
✘ Still limited to 1-minute resolution for CloudWatch/StackDriver
Let’s Kollekt
Instrument → Collect & Store → Display & Alert
Where to?
✘ We have 3 billion app / 50 billion system metric samples per month
✘ Storage size per sample matters here
✘ MySQL
➢ ~50 bytes per sample (including indexing, etc.)
➢ 2.3 TB for 50 billion samples
✘ ElasticSearch
➢ ~20 bytes per sample
➢ 930 GB for 50 billion samples
General purpose DBs are expensive
✘ MySQL
➢ $230 for 1 month retention
➢ $1380 for 3 month retention
✘ ElasticSearch
➢ $93 for 1 month retention
➢ $550 for 3 month retention
✘ That’s just for storage! For one app!
But metrics data is unique
✘ Immutable (no updates)
✘ Write once
✘ Lots of metrics do not change often
✘ And this is why Time Series Databases were born!
Prometheus
Finally!
Prometheus at a glance
✘ Not the first TSDB, but it became the gold standard
✘ 1-2 bytes per sample
➢ $30-$60 storage cost for 3 month retention as in the previous example
✘ Can process 1 million samples per minute on your laptop
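The storage figures for MySQL, ElasticSearch, and Prometheus can be compared with a short sketch. Two assumptions here are inferred from the slides rather than stated on them: storage costs about $0.10 per GiB per month (from "$230" for "2.3TB"), and the 3-month figure is the total bill while the data set grows month by month (1 + 2 + 3 months' worth of data).

```python
GIB = 2**30
PRICE_PER_GIB_MONTH = 0.10    # assumption inferred from the MySQL slide
SAMPLES_PER_MONTH = 50e9      # ~50 billion system metric samples


def monthly_size_gib(bytes_per_sample):
    """Size of one month of samples, in GiB."""
    return SAMPLES_PER_MONTH * bytes_per_sample / GIB


def retention_bill(bytes_per_sample, months):
    """Total storage bill while the data set grows for `months` months."""
    one_month = monthly_size_gib(bytes_per_sample) * PRICE_PER_GIB_MONTH
    return one_month * sum(range(1, months + 1))


for name, size in [("MySQL", 50), ("ElasticSearch", 20),
                   ("Prometheus (1 B)", 1), ("Prometheus (2 B)", 2)]:
    print(f"{name}: {monthly_size_gib(size):,.0f} GiB/month, "
          f"${retention_bill(size, 1):,.0f} for 1 month, "
          f"${retention_bill(size, 3):,.0f} for 3 months")
```

Under these assumptions the results land within rounding of the slide figures: roughly $230 and $1380 for MySQL, $93 and $550 for ElasticSearch, and $30 to $60 for three months of Prometheus.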
Not just TSDB
✘ Prometheus discovers:
➢ Your GCE VMs
➢ Your GKE pods
✘ Prometheus pulls metrics from targets
✘ Prometheus stores metrics and allows you to query them OR
✘ Federates them further to a central storage
Collection at a glance
[Diagram: GKE cluster (pods + P8S); VMs + P8S; central store (P8S, Thanos, VictoriaMetrics, etc.); Grafana]
Instrumentation
Instrument → Collect & Store → Display & Alert
Python example
from flask import Flask
from prometheus_client import start_http_server, Summary

app = Flask(__name__)

# Summary tracks the count and total duration of the timed calls
REQUEST_TIME = Summary("request_processing_seconds",
                       "Time spent processing request")

@app.route("/")
@REQUEST_TIME.time()
def hello_world():
    return "Hello, World!\n"

if __name__ == "__main__":
    start_http_server(8081)  # Dedicated port for the metrics endpoint!
    app.run(port=8080)
Python example - in action!
$ python app.py &
$ curl localhost:8080
$ curl localhost:8080
$ curl localhost:8081
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 247.0
python_gc_objects_collected_total{generation="1"} 151.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 60.0
python_gc_collections_total{generation="1"} 5.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="8",patchlevel="3",version="3.8.3"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.34852352e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.6411008e+07
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.29000000000000004
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 7.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0
# HELP request_processing_seconds Time spent processing request
# TYPE request_processing_seconds summary
request_processing_seconds_count 2.0
request_processing_seconds_sum 1.3547949492931366e-05
# HELP request_processing_seconds_created Time spent processing request
# TYPE request_processing_seconds_created gauge
request_processing_seconds_created 1.5959190974287152e+09
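The `_sum` and `_count` samples of a summary are enough to derive an average. A simplified, standard-library-only sketch of reading such output; `parse_samples` is a hypothetical helper, not part of `prometheus_client`, and it assumes there are no spaces inside label values and no trailing timestamps:

```python
def parse_samples(text):
    """Parse sample lines of the text exposition format into a dict.

    Simplified: skips comment lines, keeps the label set as part of the key.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


# A few lines copied from the scrape output above
SCRAPE = """\
# HELP request_processing_seconds Time spent processing request
# TYPE request_processing_seconds summary
request_processing_seconds_count 2.0
request_processing_seconds_sum 1.3547949492931366e-05
process_open_fds 7.0
python_gc_collections_total{generation="0"} 60.0
"""

samples = parse_samples(SCRAPE)
average = (samples["request_processing_seconds_sum"]
           / samples["request_processing_seconds_count"])
print(f"average request time: {average * 1e6:.2f} microseconds")
```

In practice Prometheus does this parsing itself when it pulls from the target; the sketch only shows how little structure the format needs.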
Recap
[Diagram: GKE cluster (pods + P8S); VMs + P8S; central store (P8S, Thanos, VictoriaMetrics, etc.); Grafana]
More to come in Part II
thanks!
Any questions?
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

Dip into Prometheus

  • 2. hello! I am Zaar Hai, Staff Cloud Architect at DoiT International. linkedin.com/in/zaar
  • 6. Cloud Out of the box
      ✘ VM CPU/Network/Disk stats…
      ✘ But not memory
        ➢ requires vendor-specific agents
      ✘ Gets much better with GKE
        ➢ Memory usage for pods IF your YAMLs behave
      ✘ All of the above are metrics… But what about our app?
  • 7. App Metrics: it’s all about app metrics
  • 8. Why metrics at all?
  • 9. When logs are not enough
      ✘ Too detailed to see the big picture
      ✘ Hard to see KPIs / trends
  • 10. Logs to metrics
      ✘ Log-based metrics in StackDriver
        ➢ Fragile configuration
        ➢ Disjoint from the app
        ➢ Vendor specific
      ✘ Smart log parsers, e.g. Coralogix
          Parsed 100 docs in 0.31 seconds
          Parsed 100 docs in 0.38 seconds
          Parsed 100 docs in 0.34 seconds
      ✘ An option when retrofitting monitoring onto an existing app
  • 11. So what are metrics?
  • 12. Metrics are just a tuple: metric name, timestamp, numeric value
  • 13. Metric examples
      Process A:
        2020-07-28T02:32:06Z http_requests_total 1239
        2020-07-28T02:32:07Z http_requests_total 1245
      Process B:
        2020-07-28T02:32:06Z http_requests_total 1185
        2020-07-28T02:32:07Z http_requests_total 1185
      Now we can aggregate across!
  • 14. Metric dimensions
      Process A:
        2020-07-28T02:32:06Z http_requests_total 1239
      Process B:
        2020-07-28T02:32:06Z http_requests_total 1185
  • 15. Metric dimensions
      Process A:
        2020-07-28T02:32:06Z http_requests_total{code=200} 1227
        2020-07-28T02:32:06Z http_requests_total{code=404} 12
      Process B:
        2020-07-28T02:32:06Z http_requests_total{code=200} 1177
        2020-07-28T02:32:06Z http_requests_total{code=404} 8
  • 16. Metric dimensions
      2020-07-28T02:32:06Z
        http_requests_total{code=200, process=A} 1227
        http_requests_total{code=404, process=A} 12
        http_requests_total{code=200, process=B} 1177
        http_requests_total{code=404, process=B} 8
  • 17. Metric dimensions
      2020-07-28T02:32:06Z
        http_requests_total{code=200, process=A, path=/foo} 1107
        http_requests_total{code=404, process=A, path=/foo} 12
        http_requests_total{code=404, process=A, path=/bar} 120
        http_requests_total{code=200, process=B, path=/foo} 1005
        http_requests_total{code=404, process=B, path=/foo} 8
        http_requests_total{code=200, process=B, path=/bar} 172
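As a rough illustration of why dimensions matter, the samples from slide 17 can be rolled up along any label. This is only a sketch in plain Python: the `SAMPLES` list and the `sum_by` helper are illustrative names, not part of Prometheus or its client libraries.

```python
from collections import defaultdict

# Samples from slide 17, all taken at 2020-07-28T02:32:06Z.
# Each entry is (metric name, labels, value).
SAMPLES = [
    ("http_requests_total", {"code": "200", "process": "A", "path": "/foo"}, 1107),
    ("http_requests_total", {"code": "404", "process": "A", "path": "/foo"}, 12),
    ("http_requests_total", {"code": "404", "process": "A", "path": "/bar"}, 120),
    ("http_requests_total", {"code": "200", "process": "B", "path": "/foo"}, 1005),
    ("http_requests_total", {"code": "404", "process": "B", "path": "/foo"}, 8),
    ("http_requests_total", {"code": "200", "process": "B", "path": "/bar"}, 172),
]


def sum_by(samples, label):
    """Sum sample values grouped by the value of one label (hypothetical helper)."""
    totals = defaultdict(int)
    for _name, labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


by_code = sum_by(SAMPLES, "code")
by_process = sum_by(SAMPLES, "process")
total = sum(value for _name, _labels, value in SAMPLES)

print(by_code)
print(by_process)
print(total)
```

Grouping by `process` gives back the per-process totals from slide 13 (1239 for A and 1185 for B), so adding labels refines the data without losing the coarser view.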
  • 18. Now we can graph that!
  • 21. 1,000 metrics per microservice, x20: 80,000 samples per minute, 3,504,000,000 samples per month. Just for your app.
  • 22. Wait, there is more!
      ✘ 10-20k metrics per average K8s node
      ✘ That’s 1,200,000/minute for a 15-node cluster
        ➢ Assuming a 15s collection interval
      ✘ Or 52,560,000,000 samples a month!
      ✘ And that’s just for an average-sized app
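The sample counts on the last slides can be reproduced with simple arithmetic. The sketch below assumes a 15s collection interval for the app as well (the slides state it only for the cluster), takes the upper bound of 20k metrics per node, and uses an average month of 365/12 days. All constant names are illustrative.

```python
# Back-of-the-envelope sample volume, under the assumptions stated above.
SCRAPE_INTERVAL_SECONDS = 15
SCRAPES_PER_MINUTE = 60 // SCRAPE_INTERVAL_SECONDS   # 4 scrapes per minute
MINUTES_PER_MONTH = 365 * 24 * 60 // 12              # 43,800 (average month)

# Application: 1,000 metrics per microservice, 20 microservices.
app_metrics = 1_000 * 20
app_samples_per_minute = app_metrics * SCRAPES_PER_MINUTE
app_samples_per_month = app_samples_per_minute * MINUTES_PER_MONTH

# Cluster: upper bound of 20k metrics per node, 15 nodes.
cluster_metrics = 20_000 * 15
cluster_samples_per_minute = cluster_metrics * SCRAPES_PER_MINUTE
cluster_samples_per_month = cluster_samples_per_minute * MINUTES_PER_MONTH

print(f"app:     {app_samples_per_minute:,}/min, {app_samples_per_month:,}/month")
print(f"cluster: {cluster_samples_per_minute:,}/min, {cluster_samples_per_month:,}/month")
```

Under these assumptions the results line up with the slide figures: 80,000 and 1,200,000 samples per minute, 3,504,000,000 and 52,560,000,000 samples per month.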
  • 24. “No matter what you choose, stay within a single pane of glass”
  • 25. Option 1: Stick with the Vendor
  • 26. Enhance the existing
      ✘ Both AWS and GCP give you so much for free already
      ✘ Just add your app metrics
        ➢ And they support that!
      ✘ No need to ship system metrics, e.g. K8s - they are already there
  • 29. But it’s costly
      ✘ GCP StackDriver
        ➢ $84/month per 1k metrics
        ➢ Price drops after 300k metrics
      ✘ AWS CloudWatch
        ➢ $300/month per 1k metrics
        ➢ $100/month after first 10k, $50 after the first 240k
      ✘ Logs are expensive too, btw
        ➢ GCP SD: $0.50/GB
        ➢ AWS CW: $0.60/GB + charge of $0.0057 per scanned GB for queries
  • 30. But it’s costly
      ✘ 20 μSvc app with 1k metrics and 1GB logs/day per μSvc:
        ➢ 1*20*30 = 600GB/month
        ➢ 20k metrics/month
      ✘ Will cost you:
        ➢ GCP: $300 for logs + $1760 for metrics
        ➢ AWS: $360 for logs + $4000 for metrics
      ✘ That’s only half the story!
        ➢ With containers, metrics are short-lived
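The log and AWS metric figures above follow from the listed prices. The sketch below reads "$100/month after first 10k, $50 after the first 240k" as three price tiers per 1k metrics, which is an interpretation of the slide rather than an official price list. AWS query scan charges are ignored, and the GCP metrics figure is left out because the slide does not spell out how it is derived. Names such as `AWS_TIERS` and `aws_metrics_cost` are illustrative.

```python
# Monthly cost sketch for a 20-microservice app, using prices quoted on the slides.
LOG_GB_PER_MONTH = 1 * 20 * 30   # 1 GB/day per microservice, 20 microservices, 30 days
METRICS = 1_000 * 20             # 1k metrics per microservice

GCP_LOG_PRICE_PER_GB = 0.50
AWS_LOG_PRICE_PER_GB = 0.60      # excludes the per-GB query scan charge

# (upper bound of the tier in metrics, $ per 1k metrics per month); assumed tiering.
AWS_TIERS = [
    (10_000, 300.0),
    (240_000, 100.0),
    (float("inf"), 50.0),
]


def aws_metrics_cost(metrics):
    """Tiered monthly cost: each tier's price applies only to metrics inside it."""
    cost, lower = 0.0, 0
    for upper, price_per_1k in AWS_TIERS:
        in_tier = min(metrics, upper) - lower
        if in_tier <= 0:
            break
        cost += in_tier / 1000 * price_per_1k
        lower = upper
    return cost


gcp_logs = LOG_GB_PER_MONTH * GCP_LOG_PRICE_PER_GB
aws_logs = LOG_GB_PER_MONTH * AWS_LOG_PRICE_PER_GB
aws_metrics = aws_metrics_cost(METRICS)

print(f"GCP logs: ${gcp_logs:,.0f}")
print(f"AWS logs: ${aws_logs:,.0f}, AWS metrics: ${aws_metrics:,.0f}")
```

With 20k metrics, the first 10k are billed at $300 per 1k and the next 10k at $100 per 1k, which is where the $4000 comes from.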
  • 31. Further considerations
      ✘ Vendor specific APIs to ship
        ➢ Challenging for multi-cloud
        ➢ Gets better with K8s
      ✘ Limited to 1 minute resolution
  • 34. There are many out there
      ✘ DataDog, Sysdig, NewRelic, Splunk, SumoLogic, Grafana (hosted)
      ✘ Once you see the pricing, GCP/AWS $-figures make sense :)
      ✘ Lots of added features though:
        ➢ AI-assisted anomaly detection, etc.
      ✘ Multi-cloud!
      ✘ But now you need to ship all your system metrics
        ➢ Can become expensive quickly
  • 36. Simpler than it may sound: Instrument, Collect & Store, Display & Alert
  • 37. Grafana to Display (and Alert): de facto dashboarding software for DevOps and beyond
  • 39. Grafana: multiple data sources (diagram labels: GCP Pub/Sub, AWS SES, Email Parser; StackDriver, CloudWatch, Prometheus; one Grafana dashboard)
  • 40. Hybrid SaaS
      As a hybrid SaaS, or “Option 2.5”, you can:
      ✘ Set up hosted Grafana on Grafana Labs
      ✘ Connect it to CloudWatch, StackDriver, etc.
      ✘ Ship your app-only metrics to Grafana Labs
        ➢ At $16/month per 1k metrics
      ✘ Still limited to 1 minute resolution for CloudWatch/StackDriver
  • 43. Where to?
      ✘ We have 3 billion app / 50 billion system metric samples per month
      ✘ Storage size per sample matters here
      ✘ MySQL
        ➢ ~50 bytes per sample (including indexing, etc.)
        ➢ 2.3TB for 50b samples
      ✘ ElasticSearch
        ➢ ~20 bytes per sample
        ➢ 930GB for 50b samples
  • 44. General purpose DBs are expensive
      ✘ MySQL
        ➢ $230 for 1 month retention
        ➢ $1380 for 3 month retention
      ✘ ElasticSearch
        ➢ $93 for 1 month retention
        ➢ $550 for 3 month retention
      ✘ That’s just for storage! For one app!
  • 45. But metrics data is unique
      ✘ Immutable (no updates)
      ✘ Write once
      ✘ Lots of metrics do not change often
      ✘ And this is why Time Series Databases were born!
  • 47. Prometheus at a glance
      ✘ Not the first TSDB, but it became the gold standard
      ✘ 1-2 bytes per sample
        ➢ $30-$60 storage cost for 3 month retention as in the previous example
      ✘ Can process 1 million samples per minute on your laptop
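The storage figures for MySQL, ElasticSearch, and Prometheus can be approximated with one small function. Two assumptions here are inferred from the numbers rather than stated on the slides: a storage price of about $0.10 per GiB-month (2.3TB costing $230), and a "3 month retention" cost that is the cumulative bill while data accumulates (1 + 2 + 3 months' worth of samples). All names are illustrative.

```python
# Storage cost sketch for 50 billion samples per month, under the assumptions above.
GIB = 1024 ** 3
SAMPLES_PER_MONTH = 50_000_000_000
PRICE_PER_GIB_MONTH = 0.10  # assumed, inferred from "2.3TB -> $230"


def monthly_storage_gib(bytes_per_sample):
    """Size in GiB of one month of samples."""
    return SAMPLES_PER_MONTH * bytes_per_sample / GIB


def retention_cost(bytes_per_sample, months):
    """Cumulative bill while retained data grows from 1 to `months` months."""
    one_month = monthly_storage_gib(bytes_per_sample) * PRICE_PER_GIB_MONTH
    return sum(one_month * k for k in range(1, months + 1))


for name, size in [("MySQL", 50), ("ElasticSearch", 20),
                   ("Prometheus (1 B)", 1), ("Prometheus (2 B)", 2)]:
    print(f"{name:18s} {monthly_storage_gib(size):8.0f} GiB/month  "
          f"1 month: ${retention_cost(size, 1):7.0f}  "
          f"3 months: ${retention_cost(size, 3):7.0f}")
```

The output lands close to the rounded slide figures: roughly $233 and $1,397 for MySQL, $93 and $559 for ElasticSearch, and about $28 to $56 for Prometheus over three months.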
  • 48. Not just a TSDB
      ✘ Prometheus discovers:
        ➢ Your GCE VM
        ➢ Your GKE pods
      ✘ Prometheus pulls metrics from targets
      ✘ Prometheus stores metrics and allows you to query them
        OR
      ✘ Federates them further to a central storage
  • 51. Collection at a glance (diagram labels: GKE cluster with PODs and P8S; VMs and P8S; P8S, Thanos, VictoriaMetrics, etc.; Grafana)
  • 53. Python example

        import time

        from flask import Flask
        from prometheus_client import start_http_server, Summary

        app = Flask(__name__)

        # Summary tracks the count and total duration of observed calls.
        REQUEST_TIME = Summary("request_processing_seconds",
                               "Time spent processing request")

        @app.route("/")
        @REQUEST_TIME.time()
        def hello_world():
            return "Hello, World!\n"

        if __name__ == "__main__":
            start_http_server(8081)  # Dedicated port for metrics!
            app.run(port=8080)
  • 54. Python example - in action!

        $ python app.py &
        $ curl localhost:8080
        $ curl localhost:8080
        $ curl localhost:8081
        # HELP python_gc_objects_collected_total Objects collected during gc
        # TYPE python_gc_objects_collected_total counter
        python_gc_objects_collected_total{generation="0"} 247.0
        python_gc_objects_collected_total{generation="1"} 151.0
        python_gc_objects_collected_total{generation="2"} 0.0
        # HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
        # TYPE python_gc_objects_uncollectable_total counter
        python_gc_objects_uncollectable_total{generation="0"} 0.0
        python_gc_objects_uncollectable_total{generation="1"} 0.0
        python_gc_objects_uncollectable_total{generation="2"} 0.0
        # HELP python_gc_collections_total Number of times this generation was collected
        # TYPE python_gc_collections_total counter
        python_gc_collections_total{generation="0"} 60.0
        python_gc_collections_total{generation="1"} 5.0
        python_gc_collections_total{generation="2"} 0.0
        # HELP python_info Python platform information
        # TYPE python_info gauge
        python_info{implementation="CPython",major="3",minor="8",patchlevel="3",version="3.8.3"} 1.0
        # HELP process_virtual_memory_bytes Virtual memory size in bytes.
        # TYPE process_virtual_memory_bytes gauge
        process_virtual_memory_bytes 2.34852352e+08
        # HELP process_resident_memory_bytes Resident memory size in bytes.
        # TYPE process_resident_memory_bytes gauge
        process_resident_memory_bytes 2.6411008e+07
        # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
        # TYPE process_cpu_seconds_total counter
        process_cpu_seconds_total 0.29000000000000004
        # HELP process_open_fds Number of open file descriptors.
        # TYPE process_open_fds gauge
        process_open_fds 7.0
        # HELP process_max_fds Maximum number of open file descriptors.
        # TYPE process_max_fds gauge
        process_max_fds 1024.0
        # HELP request_processing_seconds Time spent processing request
        # TYPE request_processing_seconds summary
        request_processing_seconds_count 2.0
        request_processing_seconds_sum 1.3547949492931366e-05
        # HELP request_processing_seconds_created Time spent processing request
        # TYPE request_processing_seconds_created gauge
        request_processing_seconds_created 1.5959190974287152e+09
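To show what a collector could do with the text served on port 8081, here is a deliberately simplified reader for a few lines of that output. It is a sketch, not the parser Prometheus uses: it skips comment lines, treats the whole series name (including any labels) as a key, and assumes no trailing timestamps. `parse_samples` is a hypothetical helper name.

```python
# A few lines copied from the output above.
EXPOSITION = """\
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 60.0
# HELP request_processing_seconds Time spent processing request
# TYPE request_processing_seconds summary
request_processing_seconds_count 2.0
request_processing_seconds_sum 1.3547949492931366e-05
"""


def parse_samples(text):
    """Map each series (name plus labels, as written) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


samples = parse_samples(EXPOSITION)

# A Summary exposes _count and _sum; their ratio is the mean observed duration.
mean_seconds = (samples["request_processing_seconds_sum"]
                / samples["request_processing_seconds_count"])
print(f"mean request time: {mean_seconds:.3e} s")
```

The two `curl localhost:8080` calls show up as `_count 2.0`, and dividing `_sum` by `_count` gives the average handler time of a few microseconds.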
  • 56. More to come in Part II