Macquarie Bank is starting to explore multi-cloud options to underpin its retail bank environment, with experimental work currently focused on Google’s cloud platform.
The company’s retail digital banking platforms presently run on a platform-as-a-service that consists of Red Hat’s OpenShift container platform hosted on AWS infrastructure.
Appearing at the Red Hat Summit overnight, head of container platforms in Macquarie Bank’s banking & financial services division, Jason O’Connell, said the bank is now exploring options to run parts of this environment across more than one public cloud.
“It’s worth noting even two years ago when we selected OpenShift, it was with the idea that we could go multi-cloud,” O’Connell said.
“We’re not locked into any provider.
“At the moment we’re just exploring Google cloud and ... what it would look like - even we don’t know yet.”
O’Connell said one use case being explored is cloud arbitrage - a brokerage-based model for cloud service consumption.
“Can we get certain workloads on a cloud that’s cheaper? Can we use spot instances? Can we spin things up and down on Google [if] it’s cheaper?” he said.
“That also raises questions around whether we need federation.
“Even on AWS now we have three production [OpenShift] clusters. If we went multi-cloud, how are we going to manage that?”
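The arbitrage questions O’Connell raises can be illustrated with a minimal sketch. It is not Macquarie’s approach; the `Offer` type, the `cheapest_offer` function and every price below are hypothetical, chosen only to show how a workload that tolerates interruption might land on cheaper spot capacity while one that does not stays on on-demand capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str        # hypothetical label, e.g. "aws" or "gcp"
    hourly_price: float  # made-up price per instance-hour, for illustration only
    spot: bool           # True if the provider can reclaim the capacity


def cheapest_offer(offers, tolerate_interruption):
    """Return the lowest-priced offer that the workload can accept."""
    # Workloads that cannot be interrupted are limited to non-spot capacity.
    eligible = [o for o in offers if tolerate_interruption or not o.spot]
    if not eligible:
        raise ValueError("no eligible offers")
    return min(eligible, key=lambda o: o.hourly_price)


# Illustrative figures only; these are not real cloud prices.
offers = [
    Offer("aws", 0.10, spot=False),
    Offer("aws", 0.03, spot=True),
    Offer("gcp", 0.09, spot=False),
    Offer("gcp", 0.04, spot=True),
]

batch_choice = cheapest_offer(offers, tolerate_interruption=True)
steady_choice = cheapest_offer(offers, tolerate_interruption=False)
```

With these invented numbers the interruptible workload goes to the cheapest spot offer and the steady workload to the cheapest on-demand offer, which happen to sit on different clouds.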
O’Connell suggested that some of the bank’s development teams may also have “differentiating or unique” workloads that were better suited to running in one cloud service than another - or perhaps even split between two cloud environments.
“Can we go across [clouds] with services so it’s really not just cloud AWS, cloud Google - but it’s actually criss-crossing [workloads between them?],” O’Connell said.
“That’s another thing we’re exploring.”
Macquarie Bank began building and augmenting its digital banking platform on an OpenShift PaaS back in 2016, before revealing the project at an OpenShift Commons gathering in Boston in May last year.
The project was accompanied by a shift to DevOps.
One of the reasons to go down the PaaS and DevOps path was to increase the speed at which new code could be shipped and incorporated into the production environment.
O’Connell said last year that the goal was to ship new code “in minutes”.
“When we set up that goal as part of moving to OpenShift, we asked ourselves this question, which is: how long does it take for us now to release a single line code change to production?
“If it was an emergency change, it would probably take four hours. The actual change - we’ve got automation - is quick, but all of the approvals you need to go through is always a big drama.
“Even if you look at a minor fix for something - something that should be easy and takes a developer a day to write - why does it take a month to release to production?”
The approvals and long lead times were seen as necessary because, pre-OpenShift, the retail bank ran in “a single production environment”, O’Connell said.
“That’s very normal but we thought we’d rethink this because when you’ve got a single production environment you’ve got a single place which is very fragile,” he said.
“People were very worried about breaking that environment.”
O’Connell said overnight that problems introduced by a code change could have a big impact.
“If you think why people were scared of delivering something into production, why a small change could be a scary change, a big part of it is the blast radius if something went wrong,” he said.
“Connecting through [the platform] we’ve got our own channels, we’ve got mobile apps, we’ve got a website, we’ve got a lot of partners, and there are other companies connecting through as well, so even if you did a small change, if it caused an issue, everyone was affected at once.”
The end result was that new features and bug fixes were bundled into fortnightly releases.
“But that’s slow, and if you missed that release or needed to change anything, you needed to wait another two weeks,” O’Connell said.
“Other teams [at Macquarie] were doing monthly releases. This is a very slow approach to change. What we wanted was to do things quicker.”
He continued: “If we look at what we do with OpenShift now, our objective was to reduce that friction at getting into production and to accelerate our innovation because we could move quickly, test and learn and experiment.”
Under the PaaS model, a single production OpenShift cluster contains a series of environments where developers can safely troubleshoot or test releases before taking them to full production.
“For example, now if a customer has a problem we want to investigate, we can route that customer’s traffic and only that customer’s traffic to a prodfix environment,” O’Connell said.
“In here, our ops fix team has the opportunity to turn on debug logging, and to turn on any diagnostics that they want, and even change configuration in a safe place to investigate that so they can immediately respond to the issue.”
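The per-customer routing described here could look something like the sketch below. The article does not say how the bank implements it, so the function name, environment labels and customer IDs are all assumptions made for illustration.

```python
def choose_environment(customer_id, prodfix_customers):
    """Send flagged customers to prodfix; everyone else stays on production.

    Hypothetical helper: the real routing layer is not described in the article.
    """
    return "prodfix" if customer_id in prodfix_customers else "production"


# Hypothetical customer IDs; only the flagged customer is diverted.
under_investigation = {"cust-1001"}

flagged_route = choose_environment("cust-1001", under_investigation)
normal_route = choose_environment("cust-2002", under_investigation)
```

Because the decision depends only on membership of a small set, removing the customer from that set returns their traffic to the normal production path.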
If that investigation uncovers a wider issue - perhaps requiring a minor code change to fix - the developer can first push it into an alpha environment and test how it works with just their own bank account.
“We’ve got a continuous deployment environment so now rather than having a lot of approvals and processes to get into production, we just do it automatically - but we don’t impact the production traffic,” O’Connell said last year.
“It’s still in production - but our developers and testers all have accounts at Macquarie so we actually use our own accounts and we can do testing in that continuous deployment environment so we can get very fast feedback [about a change].”
Once it passes alpha testing, the change can be shifted into either a staff or public beta environment.
“Any staff in Macquarie can get access to the beta, and then they’ll post feedback on our Facebook for Workplace about the application,” O’Connell said.
“They’re very good at doing the testing for us. We learn from their feedback if we want to release features into full production.
“There’s also a public beta, where customers are encouraged and rewarded for reporting bugs and for testing things out.”
Once the team is confident that the change is ready for full production, they build a new production environment and route a percentage of production traffic away from the existing full production environment to it.
“We monitor that, see if we’re getting more errors or our latency is getting slower, before we flick over all the traffic,” O’Connell said.
“We can roll back immediately as well. This means we can do releases now rather than doing them at night or [them] being a big deal.
“I could target a release and a change just to you. We could target it to a percentage of customers, monitor and rollback quickly if there’s a problem or dial it up if it’s looking good. We can target any channel.”
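The percentage-based release O’Connell describes resembles what is commonly called a canary rollout. The sketch below shows one way such a scheme might work, under stated assumptions: customers are hashed into stable buckets, and the traffic share is dialled up or rolled back by comparing error rate and latency against the existing environment. The function names, the 25-point step and the 1.2 tolerance are hypothetical and are not taken from the bank’s platform.

```python
import hashlib


def bucket(customer_id):
    """Map a customer to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(customer_id, canary_percent):
    """Route roughly canary_percent of customers to the new environment."""
    return "new" if bucket(customer_id) < canary_percent else "existing"


def next_canary_percent(current, canary_error_rate, baseline_error_rate,
                        canary_latency_ms, baseline_latency_ms,
                        step=25, tolerance=1.2):
    """Roll back to 0 if the new environment looks worse, else dial up.

    step and tolerance are assumed values chosen for illustration.
    """
    worse = (canary_error_rate > baseline_error_rate * tolerance
             or canary_latency_ms > baseline_latency_ms * tolerance)
    if worse:
        return 0  # immediate rollback: all traffic returns to the existing environment
    return min(100, current + step)
```

Hashing the customer ID keeps each customer on the same side of the split between requests, so a customer does not flip between environments while the percentage is held steady.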
Waiting for Istio
O’Connell said overnight that the bank’s major challenges with OpenShift were largely due to it being “somewhat of an early adopter” of the technology.
“I think the newness of it is probably our biggest challenge,” he said.
“Going back two years ago, there were some very basic components that weren’t there at the time but that we knew were coming, and even now there are pieces of work that we just don’t tackle.
“We do a very quick fix because we know it’s coming later. It’s just moving and evolving so quickly.”
The bank has been keenly awaiting Istio, an open source service mesh that promises easier creation and management of microservices.
“We’re waiting a lot for Istio which is coming in the future so we’re holding back on investing in certain areas because of that,” O’Connell added.
Outside of the retail bank, the broader Macquarie Group is on a cloud migration of its own, shifting into AWS under a project codenamed Arturo.