Popular cloud-based graphic design software firm Canva has embraced chaos engineering, seeking the first member of a four-person team charged with breaking the digital design darling’s web site, again and again and again.
Chaos engineering is most often practised by web-scale companies, which deliberately unleash random breakages on their own infrastructure to learn how it fails and so build resilience at scale.
The concept was popularised by Netflix, which in 2011 revealed it had created a tool named Chaos Monkey “that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact.”
Now Canva reckons it’s reached a scale at which it also needs to understand what happens if it breaks. Thus, the company has decided it’s time to swing the virtual wrecking ball around inside its data centres.
“As we have grown, the amount of product, code and services we run have increased, and that makes reliability a challenge,” Canva back-end engineer Jim Tyrrell told iTnews.
“We are used by fifteen million monthly active users and they expect us to be reliable,” Tyrrell said. “We want to be at the reliability of Google Docs. To do that we need to be able to predict failure.”
Tyrrell said Canva plans to hire two chaos engineers for starters (the roles have already been advertised) and add another couple over a year or so.
The new hires will first be asked to manually disable parts of Canva’s development and staging infrastructure, to learn about the impact on production systems.
Over time, Tyrrell said he hopes to run tests on the company’s live services, removing small pieces like personalisation engines to understand how Canva can keep its core services running even if some modules aren’t working.
The hoped-for outcome is to understand how to provide acceptable-but-degraded services, without disrupting users.
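As a rough illustration of that idea, the Python sketch below disables an optional module and checks that the core page still renders with generic content. It is not Canva's code: the names `PersonalisationEngine`, `render_home_page`, `run_experiment` and `DEFAULT_TEMPLATES` are hypothetical, invented here purely for the example.

```python
class ServiceUnavailable(Exception):
    """Raised when a dependency has been deliberately disabled."""


class PersonalisationEngine:
    """Hypothetical optional module (an assumption, not a real Canva service)."""

    def __init__(self):
        self.enabled = True

    def recommend(self, user_id):
        if not self.enabled:
            raise ServiceUnavailable("personalisation engine is down")
        return [f"template-for-{user_id}"]


# Generic content served when personalisation is unavailable (hypothetical values).
DEFAULT_TEMPLATES = ["blank-poster", "blank-presentation"]


def render_home_page(user_id, engine):
    """Core path: always returns a page, personalised when possible."""
    try:
        templates = engine.recommend(user_id)
        degraded = False
    except ServiceUnavailable:
        # Fall back to generic content instead of failing the whole request.
        templates = DEFAULT_TEMPLATES
        degraded = True
    return {"user": user_id, "templates": templates, "degraded": degraded}


def run_experiment(engine, user_id="user-1"):
    """Disable the engine, render the page, then always restore the engine."""
    engine.enabled = False
    try:
        return render_home_page(user_id, engine)
    finally:
        engine.enabled = True
```

Calling `run_experiment(PersonalisationEngine())` in this sketch returns a page flagged as degraded and filled with the default templates, which is the acceptable-but-degraded behaviour such a test is meant to confirm. Restoring the engine in a `finally` block reflects a common precaution: an experiment should undo its own damage even if the check itself fails.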
Canva is aware of tools like Netflix’s Chaos Monkey (which has been open-sourced), but will start with manual disruptions and then build its own chaos-inflicting tooling as it moves to automated self-vandalism.