Presentation: Have You Tried Turning It Off and On Again?
This presentation is now available to view on InfoQ.com
Abstract
Would you jump on this train of thought for a moment and see if you agree? Let’s say you have some number of computers. It could be three, it could be kerjillions; the number probably doesn’t matter too much for this thought experiment. Now let’s say you have a number of people, probably closer to three than kerjillions, but find a number that works for you. And these people are tasked with making those computers function together in a resilient fashion in the real world. Can we agree that how the people operate the computers in production can have a significant impact on the resilience of the system? Almost obvious, no?
But much less obvious are the deeper questions: what are the characteristics of an operations practice that actively influence a system towards greater resiliency? Which practices (let’s call them “operations theatre”) pretend to assist us in this goal but really work against us? In this talk, not only will we uncover the answers, but we’ll use concrete examples from the breadth of the Site Reliability Engineering discipline to illustrate just how they work.