Advanced, Defensive, DevSecOps, Security Architecture

Security Chaos Engineering

Sustaining Resilience in Software and Systems

5 / 5

Kelly Shortridge and Aaron Rinehart on treating security as a property of complex adaptive systems: instead of preventing failure, you continuously simulate it, and design the organization to learn from each result.

Published: 2023
Publisher: O'Reilly Media
Pages: 423
Language: English

Read this if

Security architects, SREs, and platform engineers ready to abandon the prevention-first frame. Particularly strong for organizations that already practice chaos engineering for reliability and want to extend the discipline to security; the book is the bridge.

Skip this if

Practitioners working in heavily regulated environments where intentionally injecting faults into production is not legally permitted, or smaller organizations without the operational maturity to run game days safely. Also a poor first security book: it assumes you already know what threat models, blast radius, and feedback loops are.

Key takeaways

  • Security and reliability share the same root engineering problem: how to keep complex systems within tolerable bounds when the failure surface is unbounded.
  • Decision trees and effort-vs-impact analysis are operationalizable artifacts, not just blog material; the book teaches you to actually use them.
  • Continuous experimentation is more honest than tabletop exercises: production tells you what is true, runbooks tell you what someone wished were true.

Notes

Pair with Building Secure and Reliable Systems (Adkins et al.) for the Google take on the same problem and with Site Reliability Engineering for the chaos-engineering ancestry. Shortridge's writing at kellyshortridge.com and her talks on the D.I.E. (Distributed, Immutable, Ephemeral) model are the ongoing companions. Aaron Rinehart's earlier work at UnitedHealth Group is the operational origin story for the discipline.