VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to set up and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides an easy-to-use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
11. ELK stack
● Strong points:
○ Fast full-text search via ElasticSearch
○ Widespread usage of Logstash and Fluentbit for log ingestion
○ Advanced log analysis via Kibana
● Weak points:
○ Slow data ingestion
○ High CPU and RAM usage during data ingestion
○ Bad on-disk compression for stored logs
○ Missing log stream concept
○ No support for querying advanced stats from access logs
○ No CLI integration (e.g. $ elk "some query" | grep … | sort … | tail)
20. Grafana Loki
● Strong points:
○ Lower CPU and RAM usage during data ingestion
○ Good on-disk compression for stored logs
○ Log stream concept - https://grafana.com/docs/loki/latest/fundamentals/overview/
○ Ability to query advanced stats from access logs
○ CLI integration - https://grafana.com/docs/loki/latest/tools/logcli/
● Weak points:
○ LogQL is non-trivial to use for typical queries - https://grafana.com/docs/loki/latest/logql/
○ Full-text search queries may be slow
○ Grafana UI for logs is weaker than Kibana
○ No way to set individual labels per log line (ip, user_id, trace_id, etc.) during data ingestion - https://grafana.com/docs/loki/latest/fundamentals/labels/ . This frequently leads to high-cardinality issues.
29. What is VictoriaLogs?
● Open source log management system from VictoriaMetrics
● Easy to set up and operate
● Scales vertically and horizontally
● Optimized for low resource usage (CPU, RAM, disk space)
● Accepts data from Logstash and Fluentbit in Elasticsearch format
● Accepts data from Promtail in Loki format
● Supports the stream concept from Loki
● Provides an easy-to-use yet powerful query language - LogsQL
● It is currently in active development
36. LogsQL examples: search by time
● _time:-5m.. - search for logs over the last 5 minutes
● _time:-1h..-5m - search for logs on the time range [now()-1h … now()-5m]
● _time:2023-03-01..2023-03-31 - search for logs in March 2023
● _time:2023-03 - the same as above
● _time:2023-03-20T22 - search for logs on March 20, 2023, during hour 22 UTC
● It is recommended to specify a _time filter in order to narrow down the search scope and speed up the query
42. LogsQL examples: full-text search
● error - case-insensitive search for log messages with the "error" word. Messages with "Error", "ERROR", "eRRoR", etc. will also be found
● fail* - case-insensitive search for log messages with words starting with "fail", such as "fail", "failure", "FAILED", etc.
● "Error" - case-sensitive search for log messages with the "Error" word
● "failed to open" - search for logs containing the "failed to open" phrase
● exact("foo bar") - search for logs with the exact "foo bar" message
● re("https?://[^\s]+") - search for logs matching the given regexp, e.g. logs with http or https URLs
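As a rough mental model, these word filters behave like the grep invocations below. This is only an approximation (LogsQL matches whole words against an index, while grep scans raw substrings), and the sample log lines are made up for illustration:

```shell
# Made-up sample log messages to search.
cat > /tmp/msgs.log <<'EOF'
ERROR: disk full
Error: timeout
warning: retrying
Failure while connecting
all good
EOF

grep -i 'error' /tmp/msgs.log   # roughly `error`: case-insensitive search
grep -i 'fail'  /tmp/msgs.log   # roughly `fail*`: grep matches the substring, LogsQL the word prefix
grep 'Error'    /tmp/msgs.log   # roughly `"Error"`: case-sensitive search
```

The first command finds both "ERROR: disk full" and "Error: timeout", while the case-sensitive variant finds only the latter.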
46. LogsQL examples: combining search queries
● error OR warning - search for log messages with either the "error" or "warning" word
● err* AND fail* - search for logs containing both a word starting with "err" and a word starting with "fail". For example, "ERROR: the file /foo/bar failed to open"
● error AND NOT "/foo/bar" - search for logs containing the "error" word, but without the "/foo/bar" string
● _time:-1h.. AND (error OR warning) AND NOT debug - search for logs over the last hour with either the "error" or "warning" word, but without the "debug" word
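These combinators map naturally onto grep pipelines: OR becomes an alternation, AND becomes chained greps, and NOT becomes grep -v. A sketch over made-up sample data:

```shell
# Made-up sample log lines.
cat > /tmp/combo.log <<'EOF'
ERROR: the file /foo/bar failed to open
error: timeout talking to db
warning: low disk space
debug: noise
EOF

grep -iE 'error|warning' /tmp/combo.log              # ~ error OR warning
grep -i 'err' /tmp/combo.log | grep -i 'fail'        # ~ err* AND fail*
grep -i 'error' /tmp/combo.log | grep -v '/foo/bar'  # ~ error AND NOT "/foo/bar"
```

On this data the OR query matches three lines, the AND query matches only the "failed to open" line, and the NOT query drops it, leaving only the timeout error.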
49. LogsQL examples: searching arbitrary labels
● By default the search is performed over the log message
● Every log entry can contain additional labels. For example, level, ip, user_id, trace_id, etc.
● LogsQL allows searching in any label via the label_name:query syntax
51. LogsQL examples: searching arbitrary labels
● level:(error OR warning) - case-insensitive search for logs with the level label containing "error" or "warning"
● trace_id:"012345-6789ab-cdef" AND error - search for logs with the given trace_id label, which contain the "error" word
54. What is a log stream?
● A log stream is the set of logs generated by a single instance of some application:
○ A Linux process (Unix daemon)
○ A Docker container
○ A Kubernetes container running in a pod
● Logs belonging to a single stream are traditionally written to a single file and investigated with cat, grep, sort, cut, uniq, tail, etc. commands.
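That traditional workflow looks something like the pipeline below; the file name and log contents are made up for illustration:

```shell
# Made-up log file standing in for the output of a single stream.
cat > /tmp/app.log <<'EOF'
2023-03-20T22:01:03 INFO starting server
2023-03-20T22:01:04 ERROR failed to open /etc/app/conf
2023-03-20T22:01:05 ERROR failed to open /etc/app/conf
2023-03-20T22:01:06 INFO request served
EOF

# Classic per-stream investigation: filter errors, strip the timestamp,
# then count duplicate messages.
grep ERROR /tmp/app.log | cut -d' ' -f2- | sort | uniq -c
```

Here the two identical errors collapse into a single counted line, which is exactly the kind of per-stream analysis that gets harder once logs from many instances are mixed together.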
57. What is a log stream?
● A stream in a distributed system can be uniquely identified by the instance location such as its TCP address (aka the instance label in the Prometheus ecosystem).
● Multiple instances of a single application (aka shards or replicas) can be identified by the application name (aka the job label in the Prometheus ecosystem).
● Additional labels can be attached to log streams, so they can be used during log analysis. For example, environment, datacenter, zone, namespace, etc.
60. What is a log stream?
● ELK lacks the concept of log streams, so it may be non-trivial to perform stream-based log analysis there.
● Grafana Loki has supported the log stream concept from the beginning.
● VictoriaLogs provides support for log streams.
64. LogsQL examples: querying log streams
● VictoriaLogs allows querying streams via the _stream label with Prometheus label filters
● _stream:{job="nginx"} - search for logs from nginx streams
● _stream:{env=~"qa|staging",zone!="us-east"} - search for log streams from the qa or staging environments in all zones except us-east
● _time:-1h.. AND _stream:{job="nginx"} AND level:error - search for logs over the last hour from nginx streams with the level label containing the "error" word
68. Stream labels vs log labels
● Stream labels remain static, i.e. they do not change across logs belonging to the same stream.
● The recommended stream labels are instance and job. These labels simplify correlation between Prometheus metrics and logs.
● Stream labels allow grouping logs by individual streams during querying, which can simplify log analysis.
● Stream labels can be used for narrowing down the amount of logs to search and optimizing the query speed.
71. Stream labels vs log labels
● Log labels can change inside the same stream. For example, level, trace_id, ip, user_id, response_duration, etc.
● Log labels are also known as log fields in structured logging.
● Searching via log labels simplifies narrowing down the search results.
77. LogsQL: stats over access logs
● It is quite common to collect access logs (e.g. nginx or apache logs)
● It is quite common to analyze access logs with grep, cut, sort, uniq, tail, etc. commands
● Examples:
○ Get the top 10 paths with the biggest number of 404 HTTP errors
○ Calculate the per-domain p99 response duration and the number of requests
○ Get the number of unique IPs that requested a given URL
● ELK and Grafana Loki do not provide functionality to perform these tasks efficiently :(
● VictoriaLogs comes to the rescue!
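For comparison, the first of these tasks is doable with classic Unix tools, though it quickly gets unwieldy. A sketch over a made-up, simplified access-log format:

```shell
# Made-up, simplified access log: <ip> - "<method> <path> <proto>" <status>
cat > /tmp/access.log <<'EOF'
1.2.3.4 - "GET /missing HTTP/1.1" 404
1.2.3.5 - "GET /missing HTTP/1.1" 404
1.2.3.4 - "GET /gone HTTP/1.1" 404
1.2.3.4 - "GET / HTTP/1.1" 200
EOF

# For every path with a 404 status: count requests and unique client IPs,
# then sort by request count and keep the top 10.
awk '$NF == 404 {
       path = $4
       reqs[path]++
       if (!((path SUBSEP $1) in seen)) { seen[path SUBSEP $1] = 1; uips[path]++ }
     }
     END { for (p in reqs) print reqs[p], uips[p], p }' /tmp/access.log |
  sort -rn | head -n 10
```

This prints "2 2 /missing" followed by "1 1 /gone". It works for one file on one machine; a log management system has to answer the same question across all streams, which is what the LogsQL pipeline on the next slides does.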
84. LogsQL: stats over access logs: example
Task: find the top 10 paths from nginx streams at prod for the last hour with the biggest number of requests that led to a 404 error. Additionally, calculate the number of unique IP addresses seen per path.
_time:-1h.. AND _stream:{job="nginx",env="prod"} |
extract '<ip> <*> "<*> <path> <*>" <status>' |
filter status:404 |
stats by (path) (
count() as requests,
uniq(ip) as uniq_ips,
) |
sort by requests desc |
limit 10
Step by step:
● The _time and _stream filters select logs from nginx at prod for the last hour
● extract pulls the ip, path and HTTP status code out of the nginx log message
● filter status:404 keeps only the logs with HTTP status code 404
● stats by (path) counts the number of requests and unique IP addresses per path
● sort by requests desc orders the paths by request count in descending order
● limit 10 leaves the first 10 entries
87. VictoriaLogs: CLI integration
VictoriaLogs comes with the vlogs-cli tool, which can be combined with traditional CLI
commands during log investigation.
Example: obtain logs from nginx streams for the last hour and then feed the results to
standard CLI tools - jq, grep and tail - for further processing:
vlogs-cli -q '_time:-1h.. AND _stream:{job="nginx"}' | jq ._msg | grep 404 | tail
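vlogs-cli itself needs a running VictoriaLogs instance, but the downstream part of such a pipeline can be tried on canned output. In this dependency-free sketch the JSON lines are made up and sed stands in for the `jq ._msg` step:

```shell
# Canned stand-in for vlogs-cli output: one JSON log entry per line (made-up data).
cat > /tmp/results.jsonl <<'EOF'
{"_msg":"GET /a HTTP/1.1 200"}
{"_msg":"GET /b HTTP/1.1 404"}
{"_msg":"GET /c HTTP/1.1 404"}
EOF

# Extract the _msg field (sed stands in for `jq ._msg`), keep the 404s,
# and show only the last match.
sed -n 's/.*"_msg":"\([^"]*\)".*/\1/p' /tmp/results.jsonl | grep 404 | tail -n 1
```

This prints "GET /c HTTP/1.1 404" and illustrates the point of the slide: once query results reach stdout, all the usual Unix tooling applies.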
88. VictoriaLogs: recap
● Open source solution for log management
● Easy to set up and operate
● Optimized for low resource usage (CPU, RAM, disk space)
● Easy yet powerful query language - LogsQL
● Scales both vertically and horizontally
● Supports data ingestion from Logstash, Fluentbit and Promtail
92. How about enterprise features?
Sure! GDPR, security, auth, rate limiting and anomaly detection will be available in VictoriaLogs enterprise!
93. How does VictoriaLogs compare to ClickHouse for logs?
VictoriaLogs uses core optimizations similar to ClickHouse's
VictoriaLogs is easier to set up and operate than ClickHouse
97. Will data partitioning be supported?
Yes, VictoriaLogs partitions data by week
Partitions are self-contained and can be removed / moved / archived independently