Microsoft has deployed a new rights management system that changes how the company grants access rights to on-call engineers dealing with cloud outages.
Development of Lockbox began in 2010 with the aim of automating rights management for engineers, granting them temporary access to higher-tier privileges so they could fix outages more quickly without being exposed to customer data.
It was the brainchild of Raj Rajagopalan, working with a couple of developers. Rajagopalan was then a greenhorn Office 365 engineer who, like many newcomers, was handed the on-call graveyard shift for the cloud software-as-a-service suite during his first week in early 2010.
The idea was born after his alarm blared at around 3am. A customer’s Office 365 installation had a bug, and Rajagopalan needed to reboot the systems.
But he lacked the authorisation for the disruptive fix, so he phoned on-call operations asking for a reboot.
They too lacked the privileges to do so without the approval of the incident manager.
Operations were eventually granted the right to reboot the systems, and services were quickly restored.
But the sluggish incident response process meant that mean time to recovery (MTTR), the benchmark by which incident response is measured, had blown out.
Project Lockbox, built by Rajagopalan and core developers Andrey Lukyanov and Shane Brady over weekends in Microsoft’s Garage prototype lab, slashed MTTR within Microsoft’s Office team.
It was showcased in September last year as a prototype at one of the lab’s eight annual science fairs, winning approval for development and staff resources.
Lockbox went live internally across Microsoft’s Office engineering team in January this year.
“MTTR of issues is much faster now because we empower the engineers with Lockbox based recovery actions without elevating their permissions,” Rajagopalan said.
Department staff were stripped of their existing access rights and given base-level access, with temporary elevated privileges granted on demand through Lockbox.
Requests for elevated access that Lockbox’s automated checks deem abnormal are flagged and sent to a manager for manual approval.
“It could be said we democratised the permissions model,” Rajagopalan said.
Because privileges are kept to a minimum, engineers are exposed to customer data only when they request access through Lockbox, and that access is logged.
A mobile phone app was also built for speedy Lockbox approvals.
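Microsoft has not published how Lockbox works internally. The following is a minimal Python sketch of the general just-in-time elevation pattern described above, assuming a simple rule-based check: engineers hold base-level access by default, routine recovery actions are granted automatically for a limited window, abnormal requests wait for a manager, and every step is written to an audit log. The class names, the list of "routine" actions, the four-hour window and the abnormality rule are all hypothetical illustrations, not Lockbox's actual design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical policy (not Lockbox's real rules): recovery actions that may be
# auto-approved, and the longest window an automatic grant may last.
ROUTINE_ACTIONS = {"reboot_server", "restart_service", "failover_database"}
MAX_AUTO_DURATION = timedelta(hours=4)


@dataclass
class AccessRequest:
    engineer: str
    action: str
    duration: timedelta
    touches_customer_data: bool = False


@dataclass
class Grant:
    engineer: str
    action: str
    expires_at: datetime


@dataclass
class Lockbox:
    audit_log: list = field(default_factory=list)
    grants: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def is_abnormal(self, req: AccessRequest) -> bool:
        # Hypothetical rule: anything off the routine list, longer than the
        # auto-approval window, or touching customer data needs a manager.
        return (
            req.action not in ROUTINE_ACTIONS
            or req.duration > MAX_AUTO_DURATION
            or req.touches_customer_data
        )

    def request(self, req: AccessRequest, now: datetime) -> str:
        self.audit_log.append((now, "requested", req.engineer, req.action))
        if self.is_abnormal(req):
            self.pending.append(req)
            self.audit_log.append((now, "flagged", req.engineer, req.action))
            return "pending_manager_approval"
        self._grant(req, now)
        return "granted"

    def approve(self, req: AccessRequest, manager: str, now: datetime) -> None:
        # Manual approval path for flagged requests.
        self.pending.remove(req)
        self.audit_log.append((now, f"approved_by:{manager}", req.engineer, req.action))
        self._grant(req, now)

    def _grant(self, req: AccessRequest, now: datetime) -> None:
        # Grants are time-boxed rather than permanent.
        self.grants.append(Grant(req.engineer, req.action, now + req.duration))
        self.audit_log.append((now, "granted", req.engineer, req.action))

    def is_allowed(self, engineer: str, action: str, now: datetime) -> bool:
        # Elevated rights exist only while an unexpired grant covers them;
        # otherwise the engineer is back at base-level access.
        return any(
            g.engineer == engineer and g.action == action and now < g.expires_at
            for g in self.grants
        )


# Walk-through: a routine 3am reboot is auto-approved and expires on its own,
# while a request touching customer data waits for a manager.
now = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
box = Lockbox()

reboot = AccessRequest("oncall-engineer", "reboot_server", timedelta(hours=1))
print(box.request(reboot, now))  # granted
print(box.is_allowed("oncall-engineer", "reboot_server", now + timedelta(minutes=30)))  # True
print(box.is_allowed("oncall-engineer", "reboot_server", now + timedelta(hours=2)))  # False

mailbox = AccessRequest("oncall-engineer", "read_mailbox", timedelta(hours=1),
                        touches_customer_data=True)
print(box.request(mailbox, now))  # pending_manager_approval
box.approve(mailbox, "incident-manager", now)
print(box.is_allowed("oncall-engineer", "read_mailbox", now))  # True
```

The design choice worth noting is that nothing is ever permanently elevated: access is checked against expiring grants, so an engineer drops back to base-level rights automatically once the window closes, and the audit log records who asked for what and who approved it.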
Prototyping the future
Many more projects have been born and have died in Microsoft’s Garage, but its manager Quinn Hawkins doesn’t shed any tears.
“Ideas are a dime a dozen. It’s the execution that’s hard,” he said.
Engineers use the Garage both during and outside work hours, and pitch their projects to department heads on science fair days.
Good projects, Hawkins said, came from prototypes, not brainstorming maps or voting polls.
“You get whims, not innovation; for instance you could have a great idea for the Windows kernel, but only a few people get what that is, so it doesn’t get the votes. Idea sites are where ideas go to die.”
About 50 tools are produced in the Garage every month, collectively used by approximately 40,000 Microsoft employees.