Flight Centre is rearchitecting its online booking engine, and the teams and processes used to maintain it, so it can deploy new features faster and stay competitive.
The ASX-listed travel agency has made a series of changes to its online booking engine, Soar, and other systems, and to the structures around them, as it chases a goal to “get our deployments soaring”, according to e-commerce technology manager Kiel Frost.
Frost told the recent New Relic FutureStack 19 conference that, having powered customers’ searches for domestic and international flights for the past “12 to 15 years”, Soar is expanding to also cover hotels and packages.
Soar is also used by Universal Traveller Australia - better known by its former name, Student Flights - and now by Flight Centre’s international operations in New Zealand and Canada.
With its use and importance growing, the company saw a need to move faster in getting new features and changes to Soar to market.
“What problems did we have? Just the usual ones ... monolithic applications, databases and build times,” Frost said.
“Our deployment cycle was anywhere between six to eight weeks. That's quite generous, [because] it could actually go up to three-to-six months back in the day.”
Frost’s team has invested time and energy in unpicking which parts of the process - at least the bits under his control - were taking the most time, and finding ways to work faster without sacrificing code quality.
“Why did we want to do this? There’s multiple reasons, but one of them is just to remain competitive,” he said.
“We've got a lot of competitors out there who are continuously adding new features on-the-fly, so we wanted to be able to do that as well.
“There’s also ‘Brightness of future’ - we call it BOF at Flight Centre - where we want to make sure that our people have the right skills and are consistently learning, otherwise they'll become stale and they'll hate working here.”
Frost also found external evidence - in the form of DevOps book ‘Accelerate: Building and Scaling High Performing Technology Organisations’ - to support the internal drive to move more quickly.
“We read about the four different types of metrics to measure high-performing teams, and one of them is around frequent deployments, so we decided to have a look at that,” he said.
“There was a lot of assumptions that came out of that that we needed to validate.
“One of them was around reducing the change fail rate, and we took that definition as every time you do a deployment, if you have to do a hotfix after it, that's a change fail because you've broken something.
“The other assumption is [that moving faster] will reduce our mean time to recovery (MTTR) because right now it was up to eight hours on average for us to actually fix a problem because of the way the architecture works. We wanted to remove some wasted cost.”
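As a rough illustration of the two measures Frost describes, the sketch below computes a change fail rate using the team's own definition (a deployment followed by a hotfix counts as a failure) and a mean time to recovery. The `Deployment` and `Incident` records and both function names are hypothetical, chosen for this example rather than taken from Flight Centre's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Deployment:
    name: str
    needed_hotfix: bool  # True if a hotfix had to follow this deployment


@dataclass
class Incident:
    started: datetime
    resolved: datetime


def change_fail_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that needed a hotfix afterwards."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.needed_hotfix)
    return failed / len(deployments)


def mean_time_to_recovery(incidents: List[Incident]) -> timedelta:
    """Average time between an incident starting and being resolved."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved - i.started for i in incidents), timedelta(0))
    return total / len(incidents)
```

Under this definition, four deployments with one hotfix give a change fail rate of 0.25, and tracking the figure release by release would show whether faster deployments are actually breaking less.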
A lesser motivator - but one in the back of Frost’s mind - was Flight Centre’s own internal culture.
“We've got this funny thing at Flight Centre, where we have … tribes, and if you're not doing well, there can be an internal takeover or someone else will just spin up a team and do it better because they can.
“So that was another little impact thing that we had to worry about.”
Flight Centre re-established the purposes and principles of its e-commerce team, then set new guardrails - in the form of architecture principles - for the team to develop within.
“One of the guardrails for moving forward is around loosely-coupled systems, so make sure any type of service that you have is independently deployable and independently testable,” Frost said.
“[Another is to be] component-driven, [which] is really just a fancy way of saying just share things where you can - don't reinvent the wheel. If you've got one component doing one thing, work with UX to see how you can reuse it.”
A third architectural principle is to “evolve and learn”, which is essentially Flight Centre’s take on the ‘thin slice’ architectural model found in Agile, dividing up a design into slices that can be incrementally evolved.
“What that means is if one service is about to get extremely complicated then we'll change the software architecture, the design to decouple that even further,” Frost said.
A fourth principle is “globally prepared”; Frost said it is essentially about building out Soar so that it can integrate with other best-practice systems the business sees value in connecting to.
With the groundwork laid, Flight Centre brought together all teams involved with Soar and jointly agreed to a program of work to “move faster”.
That included writing some initial objectives and key results (OKRs) - a goal-setting framework favoured by large tech companies.
“We did a pretty bad job of writing an OKR but we had to start somewhere,” Frost recalled.
“It was basically, we wanted to increase deployments by 50 percent by x date and these are the things that we were going to measure it on.
“We're going to increase this product Prodigy, which is one of our frontend applications, by 50 percent, [and] our monolith [booking system] by 50 percent … and we're going to reduce hotfixes by 20 percent.”
Frost’s team also produced a value stream map to show the path that code took from idea through to production, and the time each step took.
“One of the big areas ... where most of the time we spend is around the idea and business requirements, but I couldn't attack that - apparently they were doing really well in that area - so we just attacked at what we could, which is around the cycle time,” Frost said.
That put two sets of fortnight-long sprints covering testing and scheduling of deployments, as well as a six-hour window for release, directly in line for improvement.
Flight Centre cut time out of the process by introducing DevOps and cloud.
Existing processes, such as testing, also had substantial time built into them that was not needed.
“The biggest, easiest win we've had for reducing time was when I asked why there was two weeks of testing and all the testers just went, 'it's always been like that',” Frost said.
“So I said, ‘what do you do in that two weeks of testing?’ And they just said, ‘we run regression tests and then we schedule the business to come and do acceptance testing’.”
Upon further examination, it was found the regression tests took a day and the user acceptance testing could be wrapped up in around two hours.
As a result, the company cut testing from two weeks to three days.
“Then we shifted some of the testing to the left, which was actually quite hard to do, particularly to get buy-in from the developers,” Frost said.
“What we've actually done is we want the developers to actually write more of the tests. We don't want the QAs to write the tests for the developers, we actually want unit tests, functional and integration tests done by developers, and we want the QAs to do more of the exploratory testing.
“It's working quite well now. When we scale our development teams, we don't necessarily have to scale our QA teams because most of the testing is now done by the developers.”
Strangling the monolith
These changes are now starting to flow through to the Soar engine itself, which is being reshaped from a monolith into a platform whose functionality is increasingly supported by a collection of microservices.
Flight Centre is following a strangler pattern to make this happen. The pattern is often used to gradually replace pieces of functionality in a monolithic application with microservices.
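In general terms, a strangler pattern puts a routing layer in front of the monolith and redirects one capability at a time to a new service. The sketch below shows that idea in miniature; the `StranglerFacade` class and the capability names are assumptions made for illustration, not a description of how Soar is actually wired.

```python
from typing import Callable, Dict

# A handler takes a request dict and returns a response dict.
Handler = Callable[[dict], dict]


class StranglerFacade:
    """Routes each capability to an extracted service if one exists,
    otherwise falls back to the monolith."""

    def __init__(self, monolith: Handler) -> None:
        self._monolith = monolith
        self._extracted: Dict[str, Handler] = {}

    def extract(self, capability: str, service: Handler) -> None:
        # Register a new service that takes over one capability.
        self._extracted[capability] = service

    def handle(self, capability: str, request: dict) -> dict:
        # Anything not yet extracted is still served by the monolith.
        handler = self._extracted.get(capability, self._monolith)
        return handler(request)
```

Because callers only ever talk to the facade, each capability can be moved, or moved back, without changing the rest of the system.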
The company started by “pulling a small service out” of Soar.
“In this case, it was a fare rounding service and all this service did was when a customer did a search, it would push all the search results to it, apply a fare rounding margin, and then push the results back,” Frost said.
“Usually, we would build that into the monolith. But this time, we just had a small service out to the side, it sat in the cloud and talked to on-prem, and we learned a hell of a lot about cloud containerisation and deployments, and that's how we got started.”
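Frost did not detail the rounding rule itself, so the following is only a sketch of what such a service's core might look like. It assumes the rule is to round each fare up to the next multiple of a configurable increment; the function names, the `fare` field and the `increment` parameter are all hypothetical.

```python
from decimal import Decimal, ROUND_CEILING
from typing import Dict, List


def round_fare(fare: Decimal, increment: Decimal = Decimal("1")) -> Decimal:
    """Round a fare up to the next multiple of `increment` (assumed rule)."""
    steps = (fare / increment).to_integral_value(rounding=ROUND_CEILING)
    return steps * increment


def round_search_results(
    results: List[Dict], increment: Decimal = Decimal("1")
) -> List[Dict]:
    """Return copies of the search results with each fare rounded.

    The input list is left unmodified, mirroring a service that receives
    results, adjusts them and pushes them back.
    """
    return [{**r, "fare": round_fare(r["fare"], increment)} for r in results]
```

Using `Decimal` rather than floats avoids binary rounding surprises on currency values, which matters for a service whose only job is to adjust prices.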
Once the first microservice was working, other projects “started to come flying through”, and more functionality was split out.
“As we changed the architecture, we also started to split out our teams to align to that architecture,” Frost said.
With Soar growing in complexity, monitoring all the different services that contributed pieces of functionality became “more crucial than ever,” he said. “New Relic is helping us out with that.”
Frost said that Flight Centre is now lining up new features in its booking engine that can be turned on when the business is ready to hit go.
“We went too fast for the business,” he said. “They just said ‘stop going to production because we haven't [done our] marketing’.
“I had to explain to them about feature toggles - we can always keep deploying into production, and then when we want to actually turn a whole feature on, we just turn the toggle on.
“That reduces a whole bunch of big change release stress.”
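The mechanism Frost describes can be sketched in a few lines: code for a new feature ships to production switched off, and a flag decides at runtime whether customers see it. The `FeatureToggles` class, the `hotel_search` flag name and the `search` function below are assumptions for illustration only; a production setup would typically read flags from configuration or a toggle service rather than from memory.

```python
from typing import Dict, Optional


class FeatureToggles:
    """Minimal in-memory feature toggle store (illustrative only)."""

    def __init__(self, initial: Optional[Dict[str, bool]] = None) -> None:
        self._flags: Dict[str, bool] = dict(initial or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so new code stays dark by default.
        return self._flags.get(name, False)

    def enable(self, name: str) -> None:
        self._flags[name] = True

    def disable(self, name: str) -> None:
        self._flags[name] = False


def search(toggles: FeatureToggles, destination: str) -> Dict[str, str]:
    """Return search results, including hotels only when toggled on."""
    results = {"flights": f"flights to {destination}"}
    if toggles.is_enabled("hotel_search"):  # hypothetical flag name
        results["hotels"] = f"hotels in {destination}"
    return results
```

Deployment and release become separate decisions: the team keeps shipping, and the business calls `enable` once its marketing is ready.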
Flight Centre is also now perfecting its processes for on-demand deployments.
“We can do on-demand deployments at the moment,” Frost said.
“It's not perfect, but we're getting there.”