https://youtu.be/MRuCslYiQE4 (full meetup)
https://go.dok.community/slack
https://dok.community
ABSTRACT OF THE TALK
This talk is about how I use several tools, technologies and processes to troubleshoot ClickHouse performance. I will cover multiple Linux toolkits, trace profilers such as DTrace and BPF, and ClickHouse system tables. The talk also covers best practices, a checklist and a run-book for building "High Performance ClickHouse Infrastructure Operations".
BIO
Shiv is an open source database systems geek working with MySQL, MariaDB, PostgreSQL and ClickHouse, with core expertise in performance, scalability, high availability and database reliability engineering. He is currently the founder and principal of MinervaDB Inc., an enterprise-class 24*7 Consultative Support and Managed Services Provider for MySQL, MariaDB and PostgreSQL.
Shiv is also the Founder and Principal of ChistaDATA Inc., an independent 24*7 Consultative Support and Managed Service Provider for ClickHouse.
In the past, Shiv worked for companies like MySQL AB, SUN Microsystems, AOL, eBay, PayPal, PalominoDB and Percona. He is also a frequent speaker at open source conferences worldwide.
Troubleshooting ClickHouse Performance
1. MinervaDB Inc., 340 S LEMON AVE #9718 WALNUT 91789 CA, US
Troubleshooting ClickHouse
Performance
Shiv Iyer
Intro: About me
● Shiv Iyer
○ Founder and Principal of MinervaDB Inc.
■ MinervaDB Inc. - Consultative Support and Managed Services Provider for MySQL,
MariaDB and PostgreSQL
○ Founder and Principal of ChistaDATA Inc.
■ ChistaDATA Inc. - Consultative Support and Managed Services for ClickHouse
○ Technology Focus
■ Open Source Database Systems: MySQL, MariaDB, PostgreSQL and ClickHouse
■ Full-Stack Performance Troubleshooting and Optimization
■ Capacity Planning and Sizing
○ Follow me on Twitter: @thewebscaledba
○ Email: ceo@minervadb.com / ceo@chistadata.com
Basic understanding of the art and science of
Systems Operations Performance - Before Solving
● How long it takes for an operation/process to complete - Response Time
● Load on the system, thread handling and queueing:
○ Thread Performance
○ Memory Handling
○ Deadlocks
PLEASE DON'T QUANTIFY HIGH CPU USAGE, EXTENSIVE DISK
OPERATIONS AND LOW NETWORK BANDWIDTH METRICS ON
PERFORMANCE AUDIT PROGRAMS
Components/Pillars of Performance Engineering
● MIPS is how many millions of instructions are executed per second. But a higher MIPS figure does not by itself mean optimal performance or an optimal execution plan. The MIPS rating is only acceptable
Performance Troubleshooting - How does it begin?
● UNHAPPY BUSINESS
○ Customers spending more time on requests
○ Cost of technology infrastructure is seriously impacting margins and budgets
○ Technology Automation Process Failure
■ Average Response Time of queries increases significantly, so business operations have to scale by adding more people
■ Delayed Demand Fulfillment:
● Unhappy customers/suppliers/partners/employees/investors
● Losses / Layoffs / Shutdown
● More pains - Direct/Indirect impact on economy
PERFORMANCE IS A BUSINESS ACCELERATOR AND NOT JUST A
FEATURE. SYSTEMS WHICH MADE AN IMPACT ARE OPTIMAL AND
RELIABLE
UNDERSTANDING CLICKHOUSE AND CHALLENGES
● Open Source Column-oriented Database Management System for Online Analytical
Processing (OLAP) Queries
● Persistent data in ClickHouse is sorted by Primary Key. This makes OLAP applications deployed on ClickHouse optimal
● ClickHouse supports Parallel Processing on Multiple Cores
● ClickHouse supports Distributed OLAP Queries
Challenges with large ClickHouse Infrastructure Operations
● OLAP Database Management Systems grow very large over time, and ClickHouse is no exception (even though ClickHouse provides compelling compression algorithms and specialized codecs), so troubleshooting ClickHouse query performance is a specialized skill.
Troubleshooting ClickHouse Performance - Methodology
● Understanding Application Latency - Response Time
○ Time spent for the completion of a process
● Measure the load on ClickHouse infrastructure:
○ Latency of query operations
○ Throughput - Queries Per Minute (QPM)
● Evidence Collection / Performance Forensics Methods:
○ Observability Tools
○ Profiling Techniques
○ Tracing Methods
LATENCY IS A TIME-BASED METRIC IN PERFORMANCE ENGINEERING
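As a rough illustration of the two load metrics named in this methodology, the sketch below computes throughput in Queries Per Minute and two latency percentiles from a list of query durations. The sample latencies, the 30-second window and the helper names are hypothetical; in practice the durations would come from whatever query log you collect.

```python
import math

def throughput_qpm(num_queries: int, window_seconds: float) -> float:
    """Queries Per Minute observed over a measurement window."""
    return num_queries * 60.0 / window_seconds

def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical query latencies (seconds) seen in a 30-second window.
latencies_s = [0.12, 0.08, 0.95, 0.10, 2.40, 0.11, 0.09, 0.13, 0.10, 0.14]
window_s = 30.0

qpm = throughput_qpm(len(latencies_s), window_s)
p50 = percentile(latencies_s, 50)
p95 = percentile(latencies_s, 95)
print(f"throughput: {qpm:.0f} QPM, p50: {p50:.2f}s, p95: {p95:.2f}s")
```

Reporting a high percentile next to the median is a deliberate choice here: a couple of slow queries barely move the median but dominate what users experience.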
Simple and powerful Linux/UNIX tools for
troubleshooting Systems Operations Performance
Tool Name    Description
top          Top processes by latency and throughput
procstat     Detailed report on individual performance statistics
sar          General purpose system performance monitoring tool
vmstat       Virtual memory statistics collector and systemwide CPU usage aggregator
iostat       Disk I/O performance statistics collector/aggregator
sockstat     Network performance statistics collector
CPU Performance - Cycles Per Instruction
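Only the title of this slide survives, so the following is a minimal sketch of the usual definitions rather than the speaker's own material: Cycles Per Instruction (CPI) is CPU cycles divided by retired instructions, and MIPS is instructions divided by elapsed time in microseconds. The counter values are hypothetical; on Linux, similar counters can be read with a tool such as `perf stat`.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """CPI: average number of CPU cycles spent per retired instruction."""
    return cycles / instructions

def mips(instructions: int, elapsed_seconds: float) -> float:
    """Millions of instructions executed per second."""
    return instructions / (elapsed_seconds * 1_000_000)

# Hypothetical hardware counter readings for one workload run.
cycles = 6_000_000_000
instructions = 2_000_000_000
elapsed_s = 2.0

cpi = cycles_per_instruction(cycles, instructions)
rate = mips(instructions, elapsed_s)
print(f"CPI: {cpi:.2f}, MIPS: {rate:.0f}")
```

A high CPI usually means the CPU is stalling (for example on memory access) rather than doing useful work, which is why a raw MIPS or CPU utilization figure says little on its own.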
Monitoring Top Processes
By default, top(1) displays details of the 'top' processes on the system and refreshes this information every 2.0 seconds, ranking the processes in the list by raw CPU usage percentage. Technically, the top command tracks detailed throughput-related information about the CPU and processes.
Monitoring CPU usage with bsdsar
System Activity Reporter (SAR) for FreeBSD
systems. Detailed analysis of network, cpu,
memory, swap, and NFS usage.
Memory Available, Used and Free - sar -r 1 3
To calculate the actual free memory from the Average row, use the formula below:
● kbmemfree + kbbuffers + kbcached = actual free memory on the system
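The formula above can be sketched as a small helper. The numbers are hypothetical stand-ins for the Average row of `sar -r 1 3`; the column names follow the slide, and the exact set of columns printed depends on the sar version installed.

```python
def actual_free_kb(kbmemfree: int, kbbuffers: int, kbcached: int) -> int:
    """Free memory plus buffers and page cache, which the kernel can reclaim."""
    return kbmemfree + kbbuffers + kbcached

# Hypothetical values copied from the "Average:" row of sar -r output (in kB).
average_row = {"kbmemfree": 1_048_576, "kbbuffers": 262_144, "kbcached": 2_097_152}

free_kb = actual_free_kb(**average_row)
print(f"actual free memory: {free_kb} kB (~{free_kb / 1024 / 1024:.2f} GiB)")
```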
vmstat - Reports Virtual Memory Statistics
The vmstat utility reports
certain kernel statistics
kept about process, virtual
memory, disk, trap and cpu
activity.
iostat - Reports I/O Statistics
The iostat utility displays kernel I/O
statistics on terminal, device and cpu
operations.
Why is Tracing the Application Infrastructure Important?
● Understand the Execution Plan/Data Access Path of both usual (expected) and unusual (unexpected) incidents happening on your infrastructure
● Record both successful and unsuccessful events happening on your
infrastructure
● Understanding how system components are consuming available resources
● Threads/Process Handling - Both Latency and Throughput
● Cost efficient Capacity Planning/Sizing
Using DTrace for Troubleshooting System Performance
If you understand how efficiently the potential of the hardware infrastructure is consumed, it becomes much easier to set expectations for both latency and throughput. This is what we call performance goal setting. We use DTrace for detailed analysis of full-stack infrastructure operations, which lets us troubleshoot purely on evidence:
● CPU usage and distribution - Process handling and thread activity
● RAM/Memory usage
● Disk I/O operations
● Network Infrastructure I/O
● Full-stack Software Infrastructure Operations
Why is DTrace a super cool tracing tool for troubleshooting?
Provider     Description
profile      Profiling/tracing CPU resource usage patterns. These probes fire at a fixed time interval, so you can sample some aspect of system state every unit of time and use the samples to infer system behavior (performance forensics).
sysinfo      The sysinfo provider includes probes that correspond to kernel statistics classified by the name sys. These are the statistics that tools such as mpstat report.
lockstat     The lockstat provider provides probes that can be used to discern lock contention statistics, or to understand virtually any aspect of locking behavior.
DTrace one-liners for Troubleshooting Latency
### The time spent in read(), in nanoseconds, printed as a histogram.
# dtrace -n 'syscall::read:entry { self->ts = timestamp; } syscall::read:return /self->ts/ { @ = quantize(timestamp - self->ts); self->ts = 0; }'
### Sum kernel adaptive lock block time by process name (ns)
# dtrace -n 'lockstat:::adaptive-block { @[execname] = sum(arg1); }'
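To help read the histogram that the first one-liner prints, here is a small sketch of power-of-two bucketing, the scheme that `quantize()` uses for its output rows. It is a simplified stand-in that handles only non-negative values, and the sample latencies are hypothetical.

```python
from collections import Counter

def quantize_bucket(value_ns: int) -> int:
    """Lower bound of the power-of-two bucket that holds a non-negative value."""
    if value_ns <= 0:
        return 0
    return 1 << (value_ns.bit_length() - 1)

# Hypothetical read() latencies in nanoseconds.
samples_ns = [900, 1_100, 1_500, 2_047, 2_048, 70_000]

histogram = Counter(quantize_bucket(v) for v in samples_ns)
for lower_bound in sorted(histogram):
    print(f"{lower_bound:>8} | {'@' * histogram[lower_bound]} {histogram[lower_bound]}")
```

Each row covers the range from its label up to, but not including, the next power of two, so a long tail shows up as a few counts in rows far below the main cluster.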
Tracing ClickHouse Performance
● system.trace_log
○ By default, records performance samples for queries that run longer than 1 second.
○ query_profiler_real_time_period_ns:
■ The clock timer of the query profiler. Real clock timer counts wall-clock time.
■ Recommended values:
● 10000000 nanoseconds (100 times a second) or less for single queries.
● 1000000000 nanoseconds (once a second) for cluster-wide profiling.
○ query_profiler_cpu_time_period_ns:
■ CPU clock timer of the query profiler. This timer counts only CPU time.
■ Recommended values:
● 10000000 nanoseconds (100 times a second) or more for single queries.
● 1000000000 nanoseconds (once a second) for cluster-wide profiling.
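The relationship between a profiler period in nanoseconds and its sampling rate can be sketched as below. The helper names are hypothetical, and the sample count is only a rough per-thread estimate; the sketch also treats a period of 0 as "profiler off", which is how I understand the setting to behave.

```python
NS_PER_SECOND = 1_000_000_000

def samples_per_second(period_ns: int) -> float:
    """Sampling frequency implied by a profiler period in nanoseconds."""
    if period_ns == 0:
        return 0.0  # assumption: a period of 0 turns the timer off
    return NS_PER_SECOND / period_ns

def expected_samples(query_seconds: float, period_ns: int) -> int:
    """Rough number of samples one thread yields for a query of this length."""
    if period_ns == 0:
        return 0
    return int(query_seconds * NS_PER_SECOND // period_ns)

print(samples_per_second(10_000_000))          # single-query profiling
print(samples_per_second(1_000_000_000))       # cluster-wide profiling
print(expected_samples(0.5, 1_000_000_000))    # sub-second query, 1 s period
print(expected_samples(3.0, 10_000_000))       # 3 s query, 10 ms period
```

This also shows why a one-second period leaves nothing in the trace log for sub-second queries: the timer never fires before the query finishes.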
Default configuration of the trace_log section in config.xml
<trace_log>
    <database>system</database>
    <table>trace_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>
system.trace_log - Contains stack traces collected by the sampling query profiler.
SELECT * FROM system.trace_log LIMIT 1 \G
Row 1:
──────
event_date: 2022-03-01
event_time: 2022-03-01 06:11:18
event_time_microseconds: 2022-03-01 06:11:18.116138
timestamp_ns: 3712951364193637139
trace_type: ………….
thread_id: ………………………
system.trace_log
● event_date (Date) — Date of sampling moment.
● event_time (DateTime) — Timestamp of the sampling moment.
● event_time_microseconds (DateTime64) — Timestamp of the sampling moment with microseconds precision.
● timestamp_ns (UInt64) — Timestamp of the sampling moment in nanoseconds.
● revision (UInt32) — ClickHouse server build revision.
When connecting to the server with clickhouse-client, you see a string similar to "Connected to ClickHouse server version 19.18.1 revision 54429." This field contains the revision, but not the version, of the server.
● trace_type (Enum8) — Trace type:
○ Real represents collecting stack traces by wall-clock time.
○ CPU represents collecting stack traces by CPU time.
○ Memory represents collecting allocations and deallocations when memory allocation exceeds the subsequent
watermark.
○ MemorySample represents collecting random allocations and deallocations.
● thread_number (UInt32) — Thread identifier.
● query_id (String) — Query identifier that can be used to get details about a query that was running from the
query_log system table.
● trace (Array(UInt64)) — Stack trace at the moment of sampling. Each element is a virtual memory address inside
ClickHouse server process
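One common way to use these columns is to count how often each stack trace was sampled for a given query, since the most frequent stacks are where the time went. The sketch below does that grouping in memory over hypothetical rows shaped like (trace_type, query_id, trace); the addresses, query IDs and helper name are made up for illustration, and resolving addresses to function names is left out.

```python
from collections import Counter

# Hypothetical rows, as if exported from system.trace_log:
# (trace_type, query_id, trace as a tuple of virtual memory addresses)
rows = [
    ("CPU", "q-1", (0x1000, 0x2000, 0x3000)),
    ("CPU", "q-1", (0x1000, 0x2000, 0x3000)),
    ("CPU", "q-1", (0x1000, 0x2500)),
    ("Real", "q-1", (0x1000, 0x2000, 0x3000)),
    ("CPU", "q-2", (0x4000,)),
]

def hottest_stacks(rows, query_id, trace_type="CPU", top_n=3):
    """Most frequently sampled stacks for one query and one trace type."""
    counts = Counter(
        trace
        for ttype, qid, trace in rows
        if qid == query_id and ttype == trace_type
    )
    return counts.most_common(top_n)

top = hottest_stacks(rows, "q-1")
for trace, hits in top:
    print(hits, [hex(addr) for addr in trace])
```

Filtering by trace_type matters because Real and CPU samples answer different questions: wall-clock samples include time spent waiting, while CPU samples count only time spent executing.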
System Tables on ClickHouse to troubleshoot
more intuitively
System Table              Description
system.processes          Currently running queries (the data behind SHOW PROCESSLIST) - elapsed time, memory usage, rows read
system.query_log          Log of all the queries executed - start time, end time, duration, errors
system.query_thread_log   Detailed report on threads and queries executed - thread name, thread start time, duration
system.trace_log          Tracing ClickHouse operations to build Data Access Path/Execution Plan
How do you consolidate the effort in ClickHouse
Troubleshooting?
● Diagnostic tools you can use for performance forensics/troubleshooting
● Quantify performance against throughput for proactive capacity
planning/sizing
● What is not performance troubleshooting?
● How you can use the historical performance data to plan for the future
● Choosing tools to access only the relevant data
● Root cause analysis in Performance Audit
● Building systems for performance