Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.
"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
10. The Market is Dominated by Open Source Solutions
Over the past 3 years, the market shifted attention
from proprietary to open source
ELK Stack: 400,000+ companies
Splunk, Sumo Logic, Loggly: 20,000 companies
Graphite: > 1M companies using it
*based on Logz.io research
12. Intro to ELK
Logstash
• Streaming data digestion
• Time normalization
• Field extraction
Elasticsearch
• Schema-less search DB
• Highly scalable
Kibana
• Visualization
13. Open source ELK +/-
Simple and beautiful
It’s simple to get started and play with ELK
and the UI is just beautiful
Open Source
The largest user base with a vibrant open
source community that supports and
improves the product
Fast. Very fast.
Built on the Elasticsearch search engine, ELK
provides blazing quick responses even when
searching through millions of documents
Hard to Scale
Data piles up and organizations experience
usage bursts. Building elastic ELK deployments
that can scale up and down is super-complex
Poor Security
Logs include sensitive data and open source
ELK offers no real security solution, from
authentication to role based access
Not Production Ready
Building a production-ready ELK deployment is a great
challenge that organizations face. With hundreds of different
configurations and a large support matrix, making sure it’s
always up is difficult
14. Up and running in minutes
Sign up and get insights into your
data in minutes
Logz.io Enterprise ELK Cloud
Service
Production ready
Predefined and community-designed
dashboards, visualizations and alerts
are all bundled and ready to provide
insights
Infinitely scalable
Ship as much data as you want
whenever you want
Alerts
A unique, proprietary alerts system built on
top of open source ELK transforms ELK
into a proactive system
Highly Available
The data and the entire data ingestion
pipeline can sustain the loss of a full
datacenter without losing data or
service
Advanced Security
360-degree security with role-based
access and multi-layer security
16. Prototype
• Installing ELK stack on a single server – 1hr
• Shipping one type of log – 1hr
• Log parsing – 2 hr
• Building Kibana Dashboard – 2hr
• 6 hours to get a simple Prototype
18. OS Level Optimization
Elasticsearch requires a lot of OS-level
optimization in order to run properly.
Elasticsearch
Shard Allocation
Optimizing insert and query times
can be tricky and requires a lot of
attention.
Index Management
Because deletion is an expensive
operation, index management is
required for log analytics solutions
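One common way to avoid expensive per-document deletes is to write logs into one index per day and drop whole indices once they age out, which is the kind of cleanup the deck assigns to Curator. The sketch below only illustrates the selection logic; it assumes Logstash-style daily names such as `logstash-2016.05.05`, and the helper names `index_name` and `expired_indices` are hypothetical.

```python
from datetime import date, datetime, timedelta


def index_name(prefix: str, day: date) -> str:
    """Build a daily index name, e.g. logstash-2016.05.05."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(names, prefix, today, keep_days):
    """Return the daily indices older than the retention window.

    Names that do not match "<prefix>-YYYY.MM.DD" are ignored, so
    system indices such as .kibana are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    existing = ["logstash-2016.05.01", "logstash-2016.05.04",
                "logstash-2016.05.05", ".kibana"]
    print(expired_indices(existing, "logstash", date(2016, 5, 5), keep_days=2))
```

Each name returned would then be removed with a single index-level delete, which is far cheaper than deleting documents one by one.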
Zone awareness
This is specific to AWS and is required to
achieve high availability
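Elasticsearch can enforce this itself through shard allocation awareness (the `cluster.routing.allocation.awareness.attributes` setting). To illustrate what the setting protects against, here is a minimal sketch that flags shards whose copies all sit in one availability zone. The function name, the node names, and the input shapes are assumptions made for the example, not an Elasticsearch API.

```python
def single_zone_shards(shard_copies, node_zone):
    """Return ids of shards whose primary and replicas share one zone.

    shard_copies: {shard_id: [node, ...]}  nodes holding a copy
    node_zone:    {node: availability_zone}
    """
    at_risk = []
    for shard_id, nodes in sorted(shard_copies.items()):
        zones = {node_zone[node] for node in nodes}
        if len(zones) < 2:
            at_risk.append(shard_id)  # one zone outage loses every copy
    return at_risk


if __name__ == "__main__":
    node_zone = {"n1": "us-east-1a", "n2": "us-east-1a", "n3": "us-east-1b"}
    shard_copies = {"logs-0": ["n1", "n3"], "logs-1": ["n1", "n2"]}
    print(single_zone_shards(shard_copies, node_zone))
```

Any shard reported here would become unavailable if its single zone went down, which is exactly the failure that zone awareness is meant to rule out.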
Cluster Topology
Elasticsearch clusters require 3
Master nodes, Data nodes and Client
nodes.
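The three-master recommendation follows from majority-based master election. As a small worked example, assuming an Elasticsearch version where `discovery.zen.minimum_master_nodes` is configured by hand (it was removed in 7.x), the quorum can be computed as below; the helper name is hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        quorum = minimum_master_nodes(n)
        print(f"{n} master-eligible nodes: quorum {quorum}, "
              f"tolerates {n - quorum} failure(s)")
```

With two masters the quorum is still two, so losing either one stops elections; three is the smallest count that tolerates a single failure without risking split brain.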
Bulk inserts Optimization
Optimizing insert time and latency
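Insert throughput usually comes from batching documents through the Elasticsearch Bulk API, which takes newline-delimited JSON: one action line followed by one source line per document. The sketch below only builds the request body and splits documents into batches. The batch size, the index and type names, and the helper names are assumptions; the `_type` field reflects the 2016-era API and is not used in current versions.

```python
import json


def chunked(docs, size):
    """Yield lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index, doc_type, docs):
    """Build a Bulk API body: action line, then source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


if __name__ == "__main__":
    docs = [{"msg": "a"}, {"msg": "b"}, {"msg": "c"}]
    for batch in chunked(docs, 2):
        print(bulk_body("logs-2016.05.05", "log", batch), end="")
```

The right batch size depends on document size and cluster capacity, so it is normally found by measurement rather than fixed up front.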
19. Capacity provisioning
Need to account for log bursts and be
able to provision enough capacity.
Elasticsearch (2)
Archive (DR)
Snapshot the data to a different
repository for disaster recovery
Mapping management
Mapping conflicts and sync issues
need to be detected and addressed
Monitoring
Marvel does a good job but requires
constant DevOps attention
Curator
Remove or optimize old indices
Alias management
For better cluster control you need to
define and use aliases
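As an illustration of alias-based control, the sketch below builds the request body that moves an alias from one index to another in a single call to the `_aliases` endpoint, so readers never see a moment with no index behind the alias. The alias and index names and the helper name are assumptions for the example.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that repoints `alias` in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = swap_alias_actions("logs-current", "logs-2016.05.04", "logs-2016.05.05")
    print(json.dumps(body, indent=2))
```

Clients that query `logs-current` keep working across the switch, which is what makes reindexing and rollover manageable.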
20. Data parsing
Extracting values from text messages
and enriching them with geo, user
agent, etc.
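In Logstash this work is done with filters; the Python sketch below only illustrates the same two ideas, field extraction and time normalization, on a web-server access line. The regular expression, the output field names, and the sample line are assumptions for the example, and geo and user-agent enrichment are left out.

```python
import re
from datetime import datetime, timezone

# Common Log Format style: client, identity, user, [timestamp], "request", status, bytes
LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Extract fields from one access-log line, or return None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Normalize the local timestamp to UTC in ISO 8601 form.
    ts = datetime.strptime(event.pop("ts"), "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = ts.astimezone(timezone.utc).isoformat()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [05/May/2016:14:03:21 -0500] '
              '"GET /index.html HTTP/1.1" 200 5120')
    print(parse_line(sample))
```

Converting `status` and `bytes` to numbers at parse time matters later, because a field that arrives as text in some events and as a number in others is a typical source of mapping conflicts.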
Logstash
High Availability
Running Logstash in a cluster is not
trivial.
Scalability
Dealing with increased load on the
Logstash servers
Burst Protection
Logs tend to be bursty. A dedicated buffer
such as Redis or Kafka is required in front of
Logstash
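To show why a buffer helps, here is a small simulation: events arrive in bursts, the indexing side drains at a fixed rate, and the queue absorbs the difference. An in-process `deque` stands in for Redis or Kafka, and the arrival and drain numbers are invented purely for illustration.

```python
from collections import deque


def simulate(arrivals_per_tick, drain_per_tick):
    """Return the backlog left in the buffer after each tick.

    arrivals_per_tick: number of events arriving in each tick
    drain_per_tick:    fixed number of events the consumer can index per tick
    """
    queue = deque()
    backlog = []
    for tick, arrivals in enumerate(arrivals_per_tick):
        queue.extend((tick, i) for i in range(arrivals))
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()  # consumer indexes the oldest event first
        backlog.append(len(queue))
    return backlog


if __name__ == "__main__":
    print(simulate([2, 10, 2, 0, 0], drain_per_tick=4))
```

The peak backlog indicates how much the buffer must be able to hold; without it, the burst would either be dropped or push back on the applications producing the logs.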
Rejection from Elasticsearch
Elasticsearch rejects about 1% of
messages due to mapping issues.
This needs to be addressed
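A mapping rejection typically happens when a field that was first indexed as one type later arrives as another. The sketch below scans a batch of parsed events for fields seen with more than one JSON type, which is one simple way to spot likely conflicts before shipping. The type buckets are a simplification (Elasticsearch distinguishes more types and maps arrays by their elements), and the function names are hypothetical.

```python
def json_type(value):
    """Coarse JSON type of a value; bool is checked before int on purpose."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"


def mapping_conflicts(events):
    """Return {field: [types]} for fields seen with more than one type."""
    seen = {}
    for event in events:
        for field, value in event.items():
            if value is None:
                continue  # nulls do not pin a field to a type
            seen.setdefault(field, set()).add(json_type(value))
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}


if __name__ == "__main__":
    events = [
        {"status": 200, "user": "bob"},
        {"status": "OK", "user": "amy"},
        {"user": {"name": "eve"}},
    ]
    print(mapping_conflicts(events))
```

Fields reported here can be renamed or coerced in the parsing stage so that the events are accepted rather than rejected.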
Configuration management
Special infrastructure needs to be in
place to allow config changes with no
data loss
21. Security
Kibana by default has no protection.
User authentication must be
implemented
Kibana
High Availability
Running Kibana in a cluster for
upgrades and high availability.
Role based access
If you want to restrict access to
certain information this capability
needs to be developed
Alerts
Alerts are not part of open source ELK.
Anomaly Detection
Basic anomaly detection is missing
from Kibana
Pre Canned Dashboards
Building dashboards and visualizations
in Kibana is tricky and requires special
knowledge
23. Upgrades
Challenging to upgrade – need to be
aware of backward compatibility.
Maintenance
Overall cluster health
Monitor the health of the
environment
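As a starting point for health monitoring, the sketch below turns a response from the `_cluster/health` endpoint into a list of alert messages. The `status`, `number_of_nodes`, and `unassigned_shards` fields are part of that response; the expected node count, the message wording, and the function name are assumptions for the example.

```python
def health_alerts(health, expected_nodes):
    """Translate a _cluster/health response (as a dict) into alert messages."""
    alerts = []
    status = health.get("status")
    if status == "red":
        alerts.append("red: at least one primary shard is unassigned")
    elif status == "yellow":
        alerts.append("yellow: at least one replica shard is unassigned")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        alerts.append(f"nodes: {nodes} of {expected_nodes} are in the cluster")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        alerts.append(f"shards: {unassigned} unassigned")
    return alerts


if __name__ == "__main__":
    sample = {"status": "yellow", "number_of_nodes": 5, "unassigned_shards": 12}
    for message in health_alerts(sample, expected_nodes=6):
        print(message)
```

A check like this would normally run on a schedule and feed whatever paging or alerting tool the team already uses.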
AWS Issues
Dealing with AWS stability issues
Mapping conflicts
Deal with arising mapping conflicts
Personnel redundancy
Need to have multiple people with
deep knowledge of the stack
Capacity increase
Provision additional capacity and
grow the cluster.