MinervaDB Inc., 340 S LEMON AVE #9718 WALNUT 91789 CA, US
Troubleshooting ClickHouse Performance
Shiv Iyer
Intro: About me
● Shiv Iyer
○ Founder and Principal of MinervaDB Inc.
■ MinervaDB Inc. - Consultative Support and Managed Services Provider for MySQL,
MariaDB and PostgreSQL
○ Founder and Principal of ChistaDATA Inc.
■ ChistaDATA Inc. - Consultative Support and Managed Services for ClickHouse
○ Technology Focus
■ Open Source Database Systems: MySQL, MariaDB, PostgreSQL and ClickHouse
■ Full-Stack Performance Troubleshooting and Optimization
■ Capacity Planning and Sizing
○ Follow me on Twitter: @thewebscaledba
○ Email: ceo@minervadb.com / ceo@chistadata.com
Basic understanding of the art and science of
Systems Operations Performance - Before Solving
● How long an operation or process takes to complete - Response Time
● Load on the system, thread handling and queueing:
○ Thread Performance
○ Memory Handling
○ Deadlocks
PLEASE DON'T QUANTIFY HIGH CPU USAGE, EXTENSIVE DISK
OPERATIONS AND LOW NETWORK BANDWIDTH METRICS ON
PERFORMANCE AUDIT PROGRAMS
Components/Pillars of Performance Engineering
● MIPS is the number of millions of
instructions executed per second.
A higher MIPS rating, however, does not
imply optimal performance or an optimal execution plan.
The MIPS rating is only
acceptable
Performance Troubleshooting - How does it begin?
● UNHAPPY BUSINESS
○ Customers spending more time on requests
○ Cost of technology infrastructure is seriously impacting margins and budgets
○ Technology Automation Process Failure
■ Average response time of queries increases significantly, so business
operations are scaled by adding more people
■ Delayed Demand Fulfillment:
● Unhappy customers/suppliers/partners/employees/investors
● Losses / Layoffs / Shutdown
● More pains - Direct/Indirect impact on economy
PERFORMANCE IS A BUSINESS ACCELERATOR AND NOT JUST A
FEATURE. SYSTEMS WHICH MADE AN IMPACT ARE OPTIMAL AND
RELIABLE
UNDERSTANDING CLICKHOUSE AND CHALLENGES
● Open Source Column-oriented Database Management System for Online Analytical
Processing (OLAP) Queries
● Persistent data in ClickHouse is sorted by primary key, which makes OLAP applications
deployed on ClickHouse efficient
● ClickHouse supports Parallel Processing on Multiple Cores
● ClickHouse supports Distributed OLAP Queries
Challenges with large ClickHouse Infrastructure Operations
● OLAP database management systems grow very large over time, and ClickHouse is no
exception (even though ClickHouse provides compelling compression algorithms and
specialized codecs), so troubleshooting ClickHouse query performance is a specialized
skill.
Troubleshooting ClickHouse Performance - Methodology
● Understanding Application Latency - Response Time
○ Time spent for the completion of a process
● Measure the load on ClickHouse infrastructure:
○ Latency of query operations
○ Throughput - Queries Per Minute (QPM)
● Evidence Collection / Performance Forensics Methods:
○ Observability Tools
○ Profiling Techniques
○ Tracing Methods
LATENCY IS A TIME-BASED METRIC IN PERFORMANCE ENGINEERING
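The two load measures above, latency and throughput, can be derived from the same set of query timings. The following is a minimal Python sketch, not a ClickHouse API: the function names, the nearest-rank percentile method, and the sample durations are all assumptions made for illustration.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(durations_ms, window_seconds):
    """Summarize latency (ms) and throughput (QPM) for one observation window."""
    ordered = sorted(durations_ms)
    return {
        "qpm": len(ordered) * 60.0 / window_seconds,  # throughput
        "avg_ms": sum(ordered) / len(ordered),        # mean latency
        "p50_ms": percentile(ordered, 50),            # median latency
        "p95_ms": percentile(ordered, 95),            # tail latency
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Made-up durations for ten queries observed over a 120-second window.
    sample = [12, 15, 11, 14, 13, 250, 16, 12, 900, 14]
    print(summarize(sample, window_seconds=120))
```

With these sample values the average (125.7 ms) is far above the median (14 ms), which is why percentiles are usually reported alongside the mean when quantifying latency.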
Simple and powerful Linux/UNIX tools for
troubleshooting Systems Operations Performance
Tool Name   Description
top         Top processes ranked by CPU usage
procstat    Detailed report on individual process statistics
sar         General purpose system performance monitoring tool
vmstat      Virtual memory statistics collector and systemwide CPU usage aggregator
iostat      Disk I/O performance statistics collector/aggregator
sockstat    Lists open network sockets and the processes that own them
CPU Performance - Cycles Per Instruction
Monitoring Top Processes
By default, top(1) displays the details of the ‘top’ processes on the system and refreshes this
information every 2.0 seconds, using the raw CPU usage percentage to rank the processes in the list. In other words,
the top command tracks detailed throughput-related information about the CPU and processes.
Monitoring CPU usage with bsdsar
System Activity Reporter (SAR) for FreeBSD
systems. Detailed analysis of network, cpu,
memory, swap, and NFS usage.
Memory Available, Used and Free - sar -r 1 3
To calculate free memory from Average value use the below formula:
● kbmemfree + kbbuffers + kbcached = actual free memory on the system
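The formula above can be applied directly to the Average row of sar -r output. This is a minimal Python sketch under stated assumptions: the helper name is hypothetical, the column layout follows the Linux sysstat header, and the numbers in the sample are invented for illustration.

```python
def actual_free_kb(header_line, average_line):
    """Return kbmemfree + kbbuffers + kbcached from a sar -r header/Average pair."""
    # Metric columns start with "kb" or "%"; the leading timestamp tokens differ
    # between the header row and the "Average:" row, so align from the right.
    names = [n for n in header_line.split() if n.startswith(("kb", "%"))]
    values = average_line.split()[-len(names):]
    row = dict(zip(names, values))
    return int(row["kbmemfree"]) + int(row["kbbuffers"]) + int(row["kbcached"])


if __name__ == "__main__":
    # Illustrative values only, not captured from a real system.
    header = ("10:15:01 AM kbmemfree kbavail kbmemused %memused kbbuffers "
              "kbcached kbcommit %commit kbactive kbinact kbdirty")
    average = ("Average: 204800 1536000 1843200 45.00 102400 "
               "1228800 2457600 50.00 1024000 512000 128")
    print(actual_free_kb(header, average), "kB")
```

Reading columns by name rather than by position keeps the calculation valid when the set of columns differs between sysstat versions.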
vmstat - Reports Virtual Memory Statistics
The vmstat utility reports
certain kernel statistics
kept about process, virtual
memory, disk, trap and cpu
activity.
iostat - Reports I/O Statistics
The iostat utility displays kernel I/O
statistics on terminal, device and cpu
operations.
Why is Tracing the Application Infrastructure Important?
● Understand the Execution Plan/Data Access Path of both usual(expected)
and unusual(unexpected) incidents happening on your infrastructure
● Record both successful and unsuccessful events happening on your
infrastructure
● Understanding how system components are consuming available resources
● Threads/Process Handling - Both Latency and Throughput
● Cost efficient Capacity Planning/Sizing
eBPF
DTrace
Using DTrace for Troubleshooting System Performance
If you understand how efficiently the hardware infrastructure's capacity is being
consumed, it becomes much easier to set expectations for both latency and throughput.
This is what we call performance goal setting. We use DTrace for detailed
analysis of full-stack infrastructure operations, which lets us troubleshoot
purely based on evidence:
● CPU usage and distribution - Process handling and thread activity
● RAM/Memory usage
● Disk I/O operations
● Network Infrastructure I/O
● Full-stack Software Infrastructure Operations
Why is DTrace a super cool tracing tool for troubleshooting?
Provider    Description
profile     Profiles/traces CPU resource usage patterns. These probes fire at a fixed
            time interval, and the samples are used to infer system behavior
            (performance forensics).
sysinfo     Provides probes that correspond to the kernel statistics classified under
            the name sys. These statistics are the input for monitoring tools such as mpstat.
lockstat    Provides probes that can be used to discern lock contention statistics,
            or to understand virtually any aspect of locking behavior.
DTrace one-liners for Troubleshooting Latency
### Time spent in read(), in nanoseconds, printed as a histogram
# dtrace -n 'syscall::read:entry { self->ts = timestamp; } syscall::read:return /self->ts/ { @ = quantize(timestamp - self->ts); self->ts = 0; }'
### Sum kernel adaptive lock block time by process name (ns)
# dtrace -n 'lockstat:::adaptive-block { @[execname] = sum(arg1); }'
Tracing ClickHouse Performance
● system.trace_log
○ By default, records stack traces sampled from queries that run longer than 1 sec.
○ query_profiler_real_time_period_ns:
■ Period of the real clock timer of the query profiler. The real clock timer counts wall-clock time.
■ Recommended values:
● 10000000 ns (100 samples per second) or less for single queries.
● 1000000000 ns (one sample per second) for cluster-wide profiling.
○ query_profiler_cpu_time_period_ns:
■ Period of the CPU clock timer of the query profiler. This timer counts only CPU time.
■ Recommended values:
● 10000000 ns (100 samples per second) or more for single queries.
● 1000000000 ns (one sample per second) for cluster-wide profiling.
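Both profiler settings are periods in nanoseconds, so a desired sampling rate has to be converted before it is applied. The following is a minimal Python sketch of that arithmetic; the helper names are hypothetical, and the generated SET statements are only printed, not sent to a server.

```python
NS_PER_SECOND = 1_000_000_000


def period_ns(samples_per_second):
    """Convert a sampling rate (samples per second) into a timer period in ns."""
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    return NS_PER_SECOND // samples_per_second


def profiler_settings_sql(real_hz, cpu_hz):
    """Build session-level SET statements for the two query profiler timers."""
    return [
        f"SET query_profiler_real_time_period_ns = {period_ns(real_hz)}",
        f"SET query_profiler_cpu_time_period_ns = {period_ns(cpu_hz)}",
    ]


if __name__ == "__main__":
    # 100 samples per second, the rate suggested above for profiling a single query.
    for statement in profiler_settings_sql(real_hz=100, cpu_hz=100):
        print(statement)
```

A shorter period gives finer-grained stacks but adds more rows to system.trace_log, which is why the once-per-second value is the one recommended for cluster-wide profiling.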
Default configuration of trace_log variable in
config.xml
<trace_log>
    <database>system</database>
    <table>trace_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>
system.trace_log - Accommodates stack traces collected
by the sampling query profiler.
SELECT * FROM system.trace_log LIMIT 1 \G
Row 1:
──────
event_date: 2022-03-01
event_time: 2022-03-01 06:11:18
event_time_microseconds: 2022-03-01 06:11:18.116138
timestamp_ns: 3712951364193637139
trace_type: ………….
thread_id: ………………………
system.trace_log
● event_date (Date) — Date of sampling moment.
● event_time (DateTime) — Timestamp of the sampling moment.
● event_time_microseconds (DateTime64) — Timestamp of the sampling moment with microseconds precision.
● timestamp_ns (UInt64) — Timestamp of the sampling moment in nanoseconds.
● revision (UInt32) — ClickHouse server build revision.
When connecting to the server by clickhouse-client, you see a string similar to: Connected to ClickHouse server
version 19.18.1 revision 54429. This field contains the revision, but not the version, of the server.
● trace_type (Enum8) — Trace type:
○ Real represents collecting stack traces by wall-clock time.
○ CPU represents collecting stack traces by CPU time.
○ Memory represents collecting allocations and deallocations when memory allocation exceeds the subsequent
watermark.
○ MemorySample represents collecting random allocations and deallocations.
● thread_number (UInt32) — Thread identifier.
● query_id (String) — Query identifier that can be used to get details about a query that was running from the
query_log system table.
● trace (Array(UInt64)) — Stack trace at the moment of sampling. Each element is a virtual memory address inside
ClickHouse server process
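Because the trace column holds raw addresses, a typical first step is to group samples by stack and turn the addresses into symbol names. The following is a minimal Python sketch that only builds the SQL text; the helper name and the query_id validation rule are assumptions, while addressToSymbol, demangle, arrayMap, arrayStringConcat and the allow_introspection_functions setting are ClickHouse features that must be available on the server for the generated statements to run.

```python
import re

# Trace types as listed in the column description above.
TRACE_TYPES = {"Real", "CPU", "Memory", "MemorySample"}


def hottest_stacks_sql(query_id, trace_type="Real", limit=10):
    """Build statements that rank the stacks sampled for one query."""
    # Hypothetical safety rule: accept only UUID-like identifiers.
    if not re.fullmatch(r"[0-9A-Za-z_-]+", query_id):
        raise ValueError("unexpected characters in query_id")
    if trace_type not in TRACE_TYPES:
        raise ValueError(f"unknown trace_type: {trace_type}")
    enable = "SET allow_introspection_functions = 1"
    report = (
        "SELECT count() AS samples,\n"
        "       arrayStringConcat(arrayMap(addr -> demangle(addressToSymbol(addr)), trace), '\\n') AS stack\n"
        "FROM system.trace_log\n"
        f"WHERE query_id = '{query_id}' AND trace_type = '{trace_type}'\n"
        "GROUP BY trace\n"
        "ORDER BY samples DESC\n"
        f"LIMIT {int(limit)}"
    )
    return [enable, report]


if __name__ == "__main__":
    # "a1b2-c3" is a placeholder, not a real query identifier.
    for statement in hottest_stacks_sql("a1b2-c3", trace_type="CPU", limit=5):
        print(statement + ";\n")
```

The query_id passed in would normally come from system.query_log, as the column description notes.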
System Tables on ClickHouse to troubleshoot
more intuitively
System Table              Description
system.processes          Queries currently being executed (the data behind SHOW PROCESSLIST)
system.query_log          Log of all the queries executed - start time, end time, duration, errors
system.query_thread_log   Detailed report on threads and queries executed - thread name, thread start time, duration
system.trace_log          Stack traces collected by the sampling query profiler, used to see where queries spend their time
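Of these tables, system.query_log is usually the starting point for finding slow queries. The following is a minimal Python sketch that builds such a report as SQL text; the helper name and the default thresholds are assumptions, and the columns used (type, event_date, event_time, query_duration_ms, read_rows, memory_usage, query) are standard query_log columns.

```python
def slow_queries_sql(min_duration_ms=1000, days=1, limit=20):
    """Build a query listing the slowest finished queries from system.query_log."""
    return (
        "SELECT event_time, query_duration_ms, read_rows, memory_usage, query\n"
        "FROM system.query_log\n"
        "WHERE type = 'QueryFinish'\n"
        f"  AND event_date >= today() - {int(days)}\n"
        f"  AND query_duration_ms >= {int(min_duration_ms)}\n"
        "ORDER BY query_duration_ms DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Thresholds are illustrative; tune them to the latency goals of the workload.
    print(slow_queries_sql(min_duration_ms=500, days=7, limit=10))
```

Filtering on type = 'QueryFinish' avoids counting each query twice, since query_log also stores a row when a query starts.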
How do you consolidate the effort in ClickHouse
Troubleshooting?
● Diagnostic tools you can use for performance forensics/troubleshooting
● Quantify performance against throughput for proactive capacity
planning/sizing
● What is not performance troubleshooting?
● How you can use the historical performance data to plan for the future
● Choosing tools to access only the relevant data
● Root cause analysis in Performance Audit
● Building systems for performance
WE ARE HIRING
Thank you!

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 

Recently uploaded (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 

Troubleshooting ClickHouse Performance

  • 1. Troubleshooting ClickHouse Performance - Shiv Iyer
  • 2. Intro: About me
    ● Shiv Iyer
      ○ Founder and Principal of MinervaDB Inc.
        ■ MinervaDB Inc. - Consultative Support and Managed Services Provider for MySQL, MariaDB and PostgreSQL
      ○ Founder and Principal of ChistaDATA Inc.
        ■ ChistaDATA Inc. - Consultative Support and Managed Services for ClickHouse
      ○ Technology Focus
        ■ Open Source Database Systems: MySQL, MariaDB, PostgreSQL and ClickHouse
        ■ Full-Stack Performance Troubleshooting and Optimization
        ■ Capacity Planning and Sizing
      ○ Follow me on Twitter: @thewebscaledba
      ○ Email: ceo@minervadb.com / ceo@chistadata.com
  • 3. Basic understanding of the art and science of Systems Operations Performance - Before Solving
    ● Response Time: how long an operation/process takes to complete
    ● Load on the system, thread handling and queueing:
      ○ Thread Performance
      ○ Memory Handling
      ○ Deadlocks
    PLEASE DON'T QUANTIFY HIGH CPU USAGE, EXTENSIVE DISK OPERATIONS AND LOW NETWORK BANDWIDTH METRICS ON PERFORMANCE AUDIT PROGRAMS
  • 4. Components/Pillars of Performance Engineering
    ● MIPS is the number of millions of instructions executed per second. However, a higher MIPS rating does not by itself indicate optimal performance or an optimal execution plan. The MIPS rating is only acceptable
  • 5. Performance Troubleshooting - How does it begin?
    ● UNHAPPY BUSINESS
      ○ Customers spending more time on requests
      ○ Cost of technology infrastructure is seriously impacting margins and budgets
      ○ Technology Automation Process Failure
        ■ Average response time of queries increases significantly, so business operations are scaled by adding more people
        ■ Delayed Demand Fulfillment:
          ● Unhappy customers/suppliers/partners/employees/investors
          ● Losses / Layoffs / Shutdown
          ● More pains - Direct/Indirect impact on economy
    PERFORMANCE IS A BUSINESS ACCELERATOR AND NOT JUST A FEATURE. SYSTEMS WHICH MADE AN IMPACT ARE OPTIMAL AND RELIABLE
  • 6. UNDERSTANDING CLICKHOUSE AND CHALLENGES
    ● Open Source Column-oriented Database Management System for Online Analytical Processing (OLAP) queries
    ● Persistent data in ClickHouse is sorted by primary key, which makes OLAP applications deployed on ClickHouse efficient
    ● ClickHouse supports parallel processing on multiple cores
    ● ClickHouse supports distributed OLAP queries
    Challenges with large ClickHouse Infrastructure Operations
    ● OLAP database management systems grow very large over time, and ClickHouse is no exception (though it provides compelling compression algorithms and specialized codecs), so troubleshooting ClickHouse query performance is a specialized skill.
  • 7. Troubleshooting ClickHouse Performance - Methodology
    ● Understanding Application Latency - Response Time
      ○ Time spent for the completion of a process
    ● Measure the load on ClickHouse infrastructure:
      ○ Latency of query operations
      ○ Throughput - Queries Per Minute (QPM)
    ● Evidence Collection / Performance Forensics Methods:
      ○ Observability Tools
      ○ Profiling Techniques
      ○ Tracing Methods
    LATENCY IS A TIME-BASED METRIC IN PERFORMANCE ENGINEERING
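As a minimal sketch of the two load measures named above, the Python below derives throughput (QPM) and latency percentiles from a list of query durations. The function name, the sample durations and the two-minute window are hypothetical and not taken from the slides.

```python
from statistics import quantiles


def summarize(durations_ms, window_minutes):
    """Return (queries per minute, median latency, 95th percentile latency)."""
    qpm = len(durations_ms) / window_minutes
    # quantiles(..., n=100) returns the 99 cut points between percentiles 1..99.
    cuts = quantiles(durations_ms, n=100, method="inclusive")
    return qpm, cuts[49], cuts[94]


# Hypothetical query durations (milliseconds) observed over a 2-minute window.
sample_ms = [12, 15, 11, 240, 18, 14, 13, 900, 16, 17]
qpm, p50_ms, p95_ms = summarize(sample_ms, window_minutes=2)
print(f"QPM={qpm:.1f} p50={p50_ms:.1f} ms p95={p95_ms:.1f} ms")
```

Reporting a high percentile next to the median keeps the two slow queries in this sample visible, whereas an average would blur them into the rest.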
  • 8. Simple and powerful Linux/UNIX tools for troubleshooting Systems Operations Performance
    Tool Name   Description
    top         Top processes, ranked by CPU usage by default
    procstat    Detailed report on individual process statistics
    sar         General purpose system performance monitoring tool
    vmstat      Virtual memory statistics collector and systemwide CPU usage aggregator
    iostat      Disk I/O performance statistics collector/aggregator
    sockstat    Network performance statistics collector
  • 9. CPU Performance - Cycles Per Instruction
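Cycles per instruction (CPI) is the number of CPU cycles consumed divided by the number of instructions completed, and MIPS follows from it as clock rate / (CPI × 10^6). The sketch below works through that arithmetic; the counter values and the 3 GHz clock are made-up figures for illustration only.

```python
def cycles_per_instruction(cycles, instructions):
    """CPI = CPU cycles consumed / instructions completed."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def mips(clock_hz, cpi):
    """Millions of instructions per second implied by a clock rate and a CPI."""
    return clock_hz / (cpi * 1_000_000)


# Hypothetical counters: 6 billion cycles spent completing 2 billion instructions.
cpi = cycles_per_instruction(cycles=6_000_000_000, instructions=2_000_000_000)
rate = mips(clock_hz=3_000_000_000, cpi=cpi)
print(f"CPI={cpi:.2f} MIPS={rate:.0f}")
```

A rising CPI at a constant clock rate suggests more cycles are being spent stalled per instruction, which is something a MIPS figure alone does not reveal.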
  • 10. Monitoring Top Processes
    By default, top(1) displays details of the top processes on the system and refreshes this information every 2.0 seconds, ranking the processes in the list by raw CPU usage percentage. In effect, the top command tracks detailed throughput-related information about the CPU and processes.
  • 11. Monitoring CPU usage with bsdsar
    System Activity Reporter (SAR) for FreeBSD systems. Detailed analysis of network, CPU, memory, swap, and NFS usage.
  • 12. Memory Available, Used and Free - sar -r 1 3
    To calculate free memory from the Average values, use the formula below:
    ● kbmemfree + kbbuffers + kbcached = actual free memory on the system
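The formula above can be applied to the Average row of the sar output. A small sketch follows; the three input values are invented for illustration and are not real sar output.

```python
def actual_free_kb(kbmemfree, kbbuffers, kbcached):
    """Free memory including reclaimable buffers and page cache, in kilobytes."""
    return kbmemfree + kbbuffers + kbcached


# Hypothetical values read from the Average row of `sar -r 1 3`.
free_kb = actual_free_kb(kbmemfree=512_000, kbbuffers=128_000, kbcached=2_048_000)
print(f"actual free memory: {free_kb} kB ({free_kb / 1024:.0f} MB)")
```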
  • 13. vmstat - Reports Virtual Memory Statistics
    The vmstat utility reports certain kernel statistics kept about process, virtual memory, disk, trap and CPU activity.
  • 14. iostat - Reports I/O Statistics
    The iostat utility displays kernel I/O statistics on terminal, device and CPU operations.
  • 15. Why is Tracing the Application Infrastructure Important?
    ● Understand the Execution Plan/Data Access Path of both usual (expected) and unusual (unexpected) incidents happening on your infrastructure
    ● Record both successful and unsuccessful events happening on your infrastructure
    ● Understand how system components consume the available resources
    ● Threads/Process Handling - Both Latency and Throughput
    ● Cost-efficient Capacity Planning/Sizing
  • 16. eBPF
  • 19. DTrace
  • 20. Using DTrace for Troubleshooting System Performance
    If you understand how efficiently the potential of the hardware infrastructure is being consumed, it becomes much easier to set expectations on both latency and throughput. This is what we call Performance Goal setting. We use DTrace for detailed analysis of full-stack infrastructure operations, which lets us troubleshoot purely on the basis of evidence:
    ● CPU usage and distribution - Process handling and thread activity
    ● RAM/Memory usage
    ● Disk I/O operations
    ● Network Infrastructure I/O
    ● Full-stack Software Infrastructure Operations
  • 21. Why is DTrace a super cool tracing tool for troubleshooting?
    Provider    Description
    profile     Profiling/tracing CPU resource usage patterns. These probes fire at a fixed time interval, so you can use them to sample some aspect of system state and infer system behavior from the samples (performance forensics).
    sysinfo     Probes that correspond to the kernel statistics classified under the name sys. These are the statistics that feed monitoring utilities such as mpstat.
    lockstat    Probes that can be used to discern lock contention statistics, or to understand virtually any aspect of locking behavior.
  • 22. DTrace one-liners for Troubleshooting Latency
    ### The time spent in read(), in nanoseconds, printed as a histogram.
    # dtrace -n 'syscall::read:entry { self->ts = timestamp; } syscall::read:return /self->ts/ { @ = quantize(timestamp - self->ts); self->ts = 0; }'
    ### Sum kernel adaptive lock block time by process name (ns)
    # dtrace -n 'lockstat:::adaptive-block { @[execname] = sum(arg1); }'
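To help read the histogram that the first one-liner prints, the sketch below imitates power-of-two bucketing in Python: each value is counted under the largest power of two that does not exceed it. This is an approximation of what quantize() reports, not DTrace's implementation; it handles non-negative values only, and the sample latencies are invented.

```python
from collections import Counter


def quantize_bucket(value):
    """Lower bound of the power-of-two bucket that holds a non-negative value."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    if value == 0:
        return 0
    # bit_length() - 1 is the exponent of the largest power of two <= value.
    return 1 << (value.bit_length() - 1)


def quantize(values):
    """Count values per bucket, ordered by bucket lower bound."""
    counts = Counter(quantize_bucket(v) for v in values)
    return dict(sorted(counts.items()))


# Hypothetical read() latencies in nanoseconds.
latencies_ns = [900, 1100, 1500, 2047, 2048, 5000, 70000]
histogram = quantize(latencies_ns)
for bucket, count in histogram.items():
    print(f"{bucket:>8} | {'@' * count} {count}")
```

Because the bucket width doubles at each step, a single outlier such as the 70000 ns read stays visible without stretching the rest of the histogram.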
  • 23. Tracing ClickHouse Performance
    ● system.trace_log
      ○ By default, records performance data for queries that run longer than 1 second.
      ○ query_profiler_real_time_period_ns:
        ■ The real clock timer of the query profiler. It counts wall-clock time.
        ■ Recommended values:
          ● 10000000 nanoseconds (100 times a second) and less for single queries.
          ● 1000000000 nanoseconds (once a second) for cluster-wide profiling.
      ○ query_profiler_cpu_time_period_ns:
        ■ The CPU clock timer of the query profiler. It counts only CPU time.
        ■ Recommended values:
          ● 10000000 nanoseconds (100 times a second) and more for single queries.
          ● 1000000000 nanoseconds (once a second) for cluster-wide profiling.
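The recommended periods above are easier to compare once converted to sampling rates. The sketch below is plain arithmetic on the two values from the slide; the helper name is hypothetical, and it rejects non-positive periods because ClickHouse documents a period of 0 as turning the timer off.

```python
NS_PER_SECOND = 1_000_000_000


def samples_per_second(period_ns):
    """Sampling rate implied by a profiler timer period given in nanoseconds."""
    if period_ns <= 0:
        raise ValueError("period must be positive (0 turns the timer off)")
    return NS_PER_SECOND / period_ns


single_query_rate = samples_per_second(10_000_000)     # single-query profiling
cluster_wide_rate = samples_per_second(1_000_000_000)  # cluster-wide profiling
print(single_query_rate, cluster_wide_rate)
```

A shorter period gives finer detail at the cost of more rows in system.trace_log, which is why the once-a-second setting is the one suggested for cluster-wide use.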
  • 24. Default configuration of trace_log variable in config.xml
    <trace_log>
        <database>system</database>
        <table>trace_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </trace_log>
  • 25. system.trace_log - Holds stack traces collected by the sampling query profiler.
    SELECT * FROM system.trace_log LIMIT 1 \G
    Row 1:
    ──────
    event_date: 2022-03-01
    event_time: 2022-03-01 06:11:18
    event_time_microseconds: 2022-03-01 06:11:18.116138
    timestamp_ns: 3712951364193637139
    trace_type: ………….
    thread_id: ………………………
  • 26. system.trace_log
    ● event_date (Date): Date of the sampling moment.
    ● event_time (DateTime): Timestamp of the sampling moment.
    ● event_time_microseconds (DateTime64): Timestamp of the sampling moment with microsecond precision.
    ● timestamp_ns (UInt64): Timestamp of the sampling moment in nanoseconds.
    ● revision (UInt32): ClickHouse server build revision. When connecting to the server with clickhouse-client, you see a string similar to "Connected to ClickHouse server version 19.18.1 revision 54429." This field contains the revision, not the version, of the server.
    ● trace_type (Enum8): Trace type:
      ○ Real represents collecting stack traces by wall-clock time.
      ○ CPU represents collecting stack traces by CPU time.
      ○ Memory represents collecting allocations and deallocations when memory allocation exceeds the subsequent watermark.
      ○ MemorySample represents collecting random allocations and deallocations.
    ● thread_number (UInt32): Thread identifier.
    ● query_id (String): Query identifier that can be used to get details about a query that was running from the query_log system table.
    ● trace (Array(UInt64)): Stack trace at the moment of sampling. Each element is a virtual memory address inside the ClickHouse server process.
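Once rows have been fetched from system.trace_log, one simple way to use the trace_type and trace columns described above is to count how often each identical stack was sampled. The sketch below assumes the rows are already available as Python dictionaries keyed by column name; the rows, addresses and helper name are invented for illustration.

```python
from collections import Counter


def hottest_stacks(rows, trace_type, top_n=3):
    """Most frequently sampled stacks for one trace type, as (stack, count) pairs."""
    counts = Counter(
        tuple(row["trace"]) for row in rows if row["trace_type"] == trace_type
    )
    return counts.most_common(top_n)


# Hypothetical rows shaped like system.trace_log (addresses are made up).
rows = [
    {"trace_type": "CPU", "query_id": "q1", "trace": [0x10, 0x20, 0x30]},
    {"trace_type": "CPU", "query_id": "q1", "trace": [0x10, 0x20, 0x30]},
    {"trace_type": "CPU", "query_id": "q1", "trace": [0x11, 0x20, 0x30]},
    {"trace_type": "Real", "query_id": "q1", "trace": [0x10, 0x20, 0x30]},
]
top_cpu = hottest_stacks(rows, "CPU")
for stack, count in top_cpu:
    print(count, [hex(address) for address in stack])
```

The stacks that appear most often among CPU samples are where the query spent most of its CPU time, and the query_id column links those samples back to the matching entry in system.query_log.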
  • 27. System Tables on ClickHouse to troubleshoot more intuitively
    System Table               Description
    system.processes           Currently running queries (the table behind SHOW PROCESSLIST)
    system.query_log           Log of all the queries executed - start time, end time, duration, errors
    system.query_thread_log    Detailed report on threads and queries executed - thread name, thread start time, duration
    system.trace_log           Stack traces sampled by the query profiler, used to trace ClickHouse operations
  • 28. How do you consolidate the effort in ClickHouse Troubleshooting?
    ● Diagnostic tools you can use for performance forensics/troubleshooting
    ● Quantify performance against throughput for proactive capacity planning/sizing
    ● What is not performance troubleshooting?
    ● How you can use historical performance data to plan for the future
    ● Choosing tools to access only the relevant data
    ● Root cause analysis in Performance Audit
    ● Building systems for performance
  • 29. WE ARE HIRING
  • 30. Thank you!