Chaos Engineering (CE): Injecting Chaos into DevOps

by Doug McCord
June 18, 2024
Chaos Engineering Tools and Implementation

Our systems aren’t getting any less complex anytime soon.

With increased distribution across the cloud, AI integrations demanding focus, and rampant cybercrime, it’s a wonder CTOs get any sleep at night. AI alone, that mystery box we’re plugging into everything, relies on networks of other systems to run, and run well.

Got a good map? Hopefully yes. But even then, with everything clicking and humming like it’s supposed to and locked down (so far as you know), the results are unlikely to be all that predictable.

Chaos, somewhere on that map, is always lurking.  

Enter chaos engineering (CE, à la chaotic evil, also known as chaos testing in DevOps), the act of intentionally wreaking havoc on your own systems.

In production.  

This extreme form of testing is arguably as old as computing itself (“huh, production’s pretty unique; I wonder what will happen if we…”), though the terminology and current practices first rose to prominence in the 2010s, led by streaming giant Netflix.

With our increasing dependence on widely distributed, interconnected systems and the surge in cybercrime, 2024 is an ideal time to examine the benefits of chaos engineering in DevOps: improving operational readiness, testing the capability of your failover, backups, and restoration, exposing security weak points, and sharpening crisis management.

So without further delay, let’s turn the monkey loose (machine gun is optional)! 

A Chaos Engineering Case Study: Netflix’s Chaos Monkey 

Breaking things in production (on purpose) was first elevated to an art form by Netflix, following a series of nightmare experiences of their own.

The first came in 2008, with their movement from data center to the clou