Online accounting darling Xero’s DevOps push has gone so well the company now feels confident it can update its SaaS service twice daily.
Four years ago the company used the Waterfall software development methodology, updated its platform about every three weeks and, according to general manager for platform and reliability Ben Salt, “felt we were not doing badly with 17 change windows a year.”
But Salt said he and his colleagues were ambitious – Xero has famously come from nowhere to make life hard for the likes of MYOB and Reckon in an industry thought to have very sticky incumbencies.
Xero adopted DevOps after deciding to migrate to AWS, a move that Salt said saw it go from running 500 physical servers to 5,000 virtual servers in the cloud. The new servers were smaller and better-suited to the task at hand, but the scale of the expanded fleet quickly proved unmanageable without automation.
While the migration went well, Salt said it hit what AWS calls “The Great Stall”, a phase in adoption of agile methodologies where not much happens because the team isn’t configured for the job at hand.
In Xero’s case, that meant a team of 26 simply proved too complex to coordinate.
“The communications barriers in that team were just phenomenal,” Salt said. Xero responded by splitting it into five teams.
In the early days of this regime the company placed a member of its operations team in each product development team, so that the developers could understand the impact of their work on servers and hopefully not code things that would make a mess.
A pilot of this arrangement in one team spread to three teams and then became standard practice, even as Xero shed its dedicated operations team.
Salt said the company now runs “two-pizza teams” – named because there are only enough members to eat a pair of pizzas (albeit big, New York-style pies) – and staffs them with “T-shaped people” who have a broad skill set (the top of the T) and a specialty (the stem of the T).
A platform-as-a-service group remains dedicated to making sure Xero’s AWS implementation gives the developers the services they need to keep the products evolving.
The product teams, meanwhile, have evolved their continuous development practices to the point at which daily updates are unremarkable and more frequent updates are on the agenda.
Salt said most daily updates are small bug fixes or changes to elements of the user interface that customers point out could be more elegant. But the company works to a 90-day roadmap for new features and now has a “developer velocity” metric that aims for new code to be deployed a week after the first commit is made.
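Salt did not detail how the metric is calculated, but one plausible reading is a simple lead-time check: measure the gap between a change’s first commit and its deployment, then compare that with the one-week goal. The Python sketch below illustrates that reading only; the names `TARGET`, `lead_time` and `meets_target` are hypothetical and are not drawn from Xero’s tooling.

```python
from datetime import datetime, timedelta

# Assumption: the one-week goal described in the article, expressed as a timedelta.
TARGET = timedelta(days=7)


def lead_time(first_commit: datetime, deployed: datetime) -> timedelta:
    """Return the elapsed time between a change's first commit and its deployment."""
    if deployed < first_commit:
        raise ValueError("deployment cannot precede the first commit")
    return deployed - first_commit


def meets_target(first_commit: datetime, deployed: datetime,
                 target: timedelta = TARGET) -> bool:
    """Return True if the change was deployed within the target window."""
    return lead_time(first_commit, deployed) <= target
```

Under this reading, a change first committed on a Monday and deployed that Friday would meet the goal, while one that took nine days would not.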
Salt said Xero is now contemplating how it can adopt containers and serverless computing, which he feels will further improve developer productivity by shrinking the size of the projects on which they work. He also feels they’ll shrink Xero’s AWS bill, by reducing reliance on conventional servers.
“The ability to be charged per execution will be handy,” Salt said. “Cost is always something we can improve. And lower costs means more resources for development.” And more development means more responsiveness, happier customers and more chance of doing things like entering new markets.
Xero’s made these changes with help from New Relic’s real-time application monitoring tools, and also by adopting Google’s Site Reliability Engineering methodology. Salt said the latter has proven conceptually valuable, but needed localization and also some adaptation because it was designed for hyperscale operators. Xero’s not in that class – yet!