Keynote: Incident Response: Trade-Offs Under Pressure
Abstract
The increasing complexity of software applications and architectures in Internet services challenge the reasoning of operators tasked with diagnosing and resolving outages and degradations as they arise. This talk will give a glimpse into how other fields handle incident response and paint a picture of what active steps we can take to support engineers in those uncertain and ambiguous scenarios. Examples include fields such as military, surgical trauma units, space transportation, aviation and air traffic control, and wildland firefighting.
Despite higher focus on how failures can be prevented through more robust and fault-tolerant design of these systems, a dearth of research explores the cognitive challenges engineers face when those preventative designs fail and they are left to think and react to scenarios that hadn’t been imagined, much like in fields outside of technology.