- Litmus is a Kubernetes-native chaos engineering tool that injects faults into cloud-native applications running on Kubernetes to test their resilience.
- Chaos engineering extends failure testing beyond CI pipelines to pre-production and production environments by intentionally breaking systems to identify weaknesses.
- Litmus provides chaos libraries, an operator, and charts to make it easy to run chaos experiments on Kubernetes applications. Experiments are shared in a central ChaosHub repository.
- The presenters are seeking help from the SIG in areas like coaching developers to contribute new chaos charts and generating more awareness of Litmus.
CNCF App-Delivery SIG Presentation - Litmus Chaos Engineering
1. Introduction to Litmus
Kubernetes Native Chaos Engineering
@uma_mukkara & @ksatchit
MayaData Inc.
22nd October 2019
Presenting to CNCF Sig-App-Delivery
2. Presenting to CNCF Sig-App-Delivery litmuschaos.i
Sig-App-Delivery & Chaos Engineering
Developers (Development)
CI Admins & Developers (CI pipelines)
SREs (Staging and Production)
Chaos Engineering
SIG-App-Delivery
3. Reliability
● Reliability is too important to neglect. Service outages cost $$$
4. Finding weaknesses is key
● Failure testing in CI pipelines is not good enough
"Failure testing breaks a system in some preconceived way, but doesn’t explore the wide open field of weird, unpredictable things that could happen." - Ali Basiri, Chaos Engineering Expert
● Break things on purpose - In production
○ Find weaknesses
○ Fix them
○ Repeat the process
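The find / fix / repeat cycle above can be sketched as a small loop. This is only an illustration of the idea, not Litmus code: the replica-count model and the names `replicas_after_pod_kill`, `steady_state_ok`, and `chaos_loop` are hypothetical and were introduced for this example.

```python
def replicas_after_pod_kill(replicas):
    """Simulated fault: one replica is killed (hypothetical toy model)."""
    return max(replicas - 1, 0)


def steady_state_ok(replicas_up):
    """Steady-state hypothesis: at least one replica still serves traffic."""
    return replicas_up >= 1


def chaos_loop(replicas, max_rounds=5):
    """Find weaknesses, fix them, repeat until an experiment passes.

    Returns a list of (round number, replicas configured, survived) tuples.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        # Find weaknesses: inject the fault and check the hypothesis.
        survived = steady_state_ok(replicas_after_pod_kill(replicas))
        history.append((round_no, replicas, survived))
        if survived:
            break
        # Fix them: in this toy model the fix is adding one replica.
        replicas += 1
    return history


if __name__ == "__main__":
    for round_no, configured, survived in chaos_loop(replicas=1):
        print(round_no, configured, "survived" if survived else "weakness found")
```

Starting from a single replica, the first round exposes the weakness, the fix adds a replica, and the second round passes.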
5. Failure testing vs Chaos testing
● Failure testing in CI pipelines is not good enough
● Break things on purpose - In production
○ Find weaknesses
○ Fix them
○ Repeat the process
Failure testing stops at CI pipelines
Chaos testing extends to Pre-Prod and Production environments
6. Chaos Engineering Loop
* Images and content authored by: Mark McBride, Turbine Labs
7. Chaos Engineering
● Practice chaos engineering to increase resiliency
Resiliency
Achieved by CI pipelines: Functional Tests + Failure Tests
Achieved by Staging / Production: Good CI + Random Chaos
8. Cloud-Native environment
● My code is 1%. The rest is not controlled by me.
● Linux is the least dynamic part of the stack
● The rest is all microservices-based and highly dynamic
Then how do we achieve resilience?
9. Cloud-Native Chaos Engineering
Cloud-native application
For development, cloud-native APIs: Pod, Deployment, PVC, StatefulSet, SVC, CRDs
For chaos testing, cloud-native APIs: ?
10. Cloud-Native Chaos Engineering
Cloud-native application
For development, cloud-native APIs: Pod, Deployment, PVC, StatefulSet, SVC, CRDs
For chaos testing, cloud-native APIs (new CRDs): ChaosEngine, ChaosExperiment, ChaosResult
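To make the idea of chaos as a Kubernetes custom resource concrete, here is a minimal sketch that builds a ChaosEngine-style manifest as a plain Python dict and prints it as JSON. The `apiVersion`, the `spec` field names (`appinfo`, `appns`, `applabel`, `experiments`), and the example values are assumptions for illustration; they are not taken from the slides and should be checked against the CRD schema of the Litmus release in use. The helper name `chaos_engine` is hypothetical.

```python
import json


def chaos_engine(name, namespace, app_label, experiment):
    """Build a ChaosEngine-style manifest as a plain dict.

    ASSUMPTION: apiVersion and spec field names are illustrative and may
    differ from the actual CRD schema of a given Litmus release.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",  # assumed group/version
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Which application the chaos is aimed at (assumed field names).
            "appinfo": {"appns": namespace, "applabel": app_label},
            # Which ChaosExperiment resources to run against it.
            "experiments": [{"name": experiment}],
        },
    }


if __name__ == "__main__":
    # Example values only: an nginx app and a pod-delete style experiment.
    manifest = chaos_engine("demo-chaos", "default", "app=nginx", "pod-delete")
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON as well as YAML manifests, output of this shape could be saved to a file and applied like any other resource, with the outcome then reported through a ChaosResult resource.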
17. Chaos charts life cycle
Development and CI pipelines
Convert failure tests into Chaos Experiments
ChaosHub
Use Chaos Experiments in Staging
Use Chaos Experiments in Production
18. Community and releases
● GitHub Stars - 406 (https://github.com/litmuschaos/litmus/stargazers)
● Contributors - 63 (https://github.com/litmuschaos/litmus/graphs/contributors)
● Slack - #litmus channel on the Kubernetes Slack community (60 members)
○ https://kubernetes.slack.com/messages/CNXNB0ZTN
● Release cadence - 15th of every month
● Current release - 0.7
● Community meetup - twice a month
○ https://docs.google.com/spreadsheets/d/15svGB99bDcSTkwAYttH1QzP5WJSb-dFKbPzl-9WqmXM/edit#gid=1935377096
19. Contributing to Chaos Charts
● A simple tool (generate_chart.py) has been developed for developer onboarding
● Convert the business logic in the failure test case into Ansible and bootstrap it into a Litmus chaos experiment
● https://docs.litmuschaos.io/docs/next/devguide/
● https://github.com/litmuschaos/litmus/tree/master/contribute/developer_guide
20. Who is using Litmus now?
https://openebs.ci/
21. Who is using Litmus now?
22. Some of the upcoming charts
https://github.com/litmuschaos/litmus/issues/822
https://github.com/litmuschaos/litmus/issues/859
23. Areas of help/feedback/coaching from SIG
● How to approach cloud-native developers and SREs to contribute new chaos charts? Coaching needed
● Litmus could be used in cncf.ci (??); we will start working with the CI SIG (cncf-ci)
● How to generate more awareness?
○ A blog post on blog.cncf.io is scheduled for 6th November
○ Join the #litmus Slack channel: https://kubernetes.slack.com/messages/CNXNB0ZTN