Presentation: Using Chaos To Build Resilient Systems
This presentation is now available to view on InfoQ.com
What You’ll Learn
- Learn how chaos engineering can help an organization build more resilient systems.
- Understand strategies for getting a chaos engineering program started and which first steps are appropriate.
- Hear first-hand from a senior principal SRE how chaos engineering has affected her systems.
Abstract
There are those of us who are motivated to build resilient systems, improve uptime, move fast, and keep systems reliable. Then there are those of us who feel overwhelmed by our to-do lists and the features or projects we feel we need to get out the door.
The world needs more resilient systems because the world needs engineers in this for the long haul. We can create a better future for ourselves, those who come after us, our customers and our wider teams by focusing on building resilient systems. How do we make it easier for everyone to build resilient systems?
It is not easy to build resilient systems, but that doesn’t mean we shouldn’t try. Engineers love a technical challenge. In this talk I will explain how focusing on the detection, mitigation, resolution and prevention of incidents is a great place to start. I will share my experiences using chaos engineering to build resilient systems... even when you can’t build your systems from scratch.
What do you want someone to leave your talk with?
Everyone who comes along to this talk will leave with an understanding of how they can start seeing massive benefits from practicing Chaos Engineering within 3 months. To me, Chaos Engineering is the fastest, most efficient way to take a giant leap forward in the resilience of your systems and team.
Can you give me an example of a time Chaos Engineering really saved you?
Through practicing Chaos Engineering I have personally achieved a 10x reduction in incidents and the complete elimination of high-severity (SEV 0) incidents for 12+ months. This giant leap was achieved within a 3-month window. That means less downtime and less pager pain for everyone.
What is the level of experience someone attending this talk should have?
To get the most value from this talk, you should ideally have been on call and felt the pain of keeping the lights on.