NAB has finally revealed the nature of a fraud-related machine learning use case: an algorithm being used to detect people who file fake documentation in support of loan applications.
Chief data officer Glenda Crisp first disclosed the fraud-related use case at AWS Summit in Sydney back in May 2019, but largely declined to discuss it.
“We've been able to do some interesting things around fraud and I don't like to talk about this too much,” she said at the time.
Speaking at AWS re:Invent 2019 in Las Vegas last month, Crisp provided the first indication of where the bank is targeting instances of fraud with its machine learning efforts.
“In terms of machine learning models, we've been working on building this capability for a little while now,” she said.
“We're starting in the areas you would expect a bank to start in: anti-money laundering, cyber and fraud.
“Let me just talk about fraud quickly. This may be a shock to some of you in the room, but there are people who actually lie on their loan applications. And not only do they lie, they send us fake documentation like fake payslips.
“So we are building a machine learning model - it's actually fairly advanced through its training right now - to try to identify fake and falsified documentation, and we're seeing some really good improvements over the traditional models that we've used.”
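NAB has not disclosed how its model works. As an illustration only, fake-document detection of this kind is often framed as supervised binary classification over features extracted from each submitted document; the sketch below uses entirely hypothetical, synthetic features (e.g. metadata-consistency or font-uniformity scores) and should not be read as the bank's actual approach.

```python
# Illustrative sketch only: NAB has not described its model.
# Frames fake-payslip detection as binary classification over
# hypothetical per-document anomaly features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic training data: 3 made-up features per document.
n = 1000
X = rng.normal(size=(n, 3))
# Label a document "fake" when the (made-up) anomaly scores are high.
y = (X[:, 0] + X[:, 1] > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

In practice the hard part is the feature extraction from scanned or PDF documents, not the classifier itself; the "improvements over traditional models" Crisp mentions would show up in metrics like precision and recall on held-out fraud cases.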
Crisp also revealed a second machine learning model that performs “topic and theme analytics around customer complaints.”
“We get customer complaints, and we actually would like to know what's driving those complaints,” Crisp said.
“So we have a machine learning model that's been running for a few months now on those [complaints].
“What I, as a CDO, am most interested in and I track most closely is the percentage of customer complaints related to data quality and so that's a number that I want to drive down because I don't think that should happen.
“I think our customers should be able to trust us for good quality data.”
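The bank has not said which technique powers its complaint analytics. A common way to surface "what's driving" free-text complaints is topic modelling; the minimal sketch below uses latent Dirichlet allocation over a bag-of-words matrix, with toy stand-in complaint text, purely to illustrate the idea.

```python
# Illustrative sketch only: NAB has not described its technique.
# Topic modelling (LDA) over a small set of toy complaint texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

complaints = [  # toy stand-in data, not real complaints
    "wrong balance shown on my account statement",
    "statement balance does not match my transactions",
    "card declined at the store despite available funds",
    "card payment declined twice this week",
]

counts = CountVectorizer(stop_words="english").fit_transform(complaints)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
topic_mix = lda.transform(counts)  # per-complaint topic weights; rows sum to 1
```

A metric like the one Crisp tracks, the share of complaints attributable to data quality, would then be the proportion of complaints whose dominant topic maps to a data-quality theme.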
NAB is using Amazon SageMaker to allow around 400 data scientists and analytics team members to build, train and deploy machine learning models.
This forms part of a toolkit that NAB calls its Discovery Cloud, or NDC.
“This is where we are building machine learning models and predictive analytics,” Crisp said.
Crisp said that data scientists were able to spin up temporary “data labs” to experiment with building models for up to 90 days - sometimes longer if required.
She said there were currently 55 data labs running within NAB, “each of them with usually one, maybe two use cases going at any point in time.”
The creation and destruction of data labs happened automatically using AWS services, and they contained appropriate guardrails to ensure safe experimentation.
The automatic time limitation was intended to ensure that models were not run permanently in the labs, but were instead migrated into regular production environments if they were deemed ready.
“[Labs are] not meant to be running models all the time,” Crisp said.
“This is meant as a development or an experimentation area, so if you do build a model, and you do want it to go to production, we have a path to production and we have a production-run environment that it'll then live in, and we have all the right operational controls around that.”
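The lab lifecycle Crisp describes - created on demand, expired after roughly 90 days unless extended, then torn down automatically - amounts to time-to-live bookkeeping. NAB has not published its implementation; the field names and defaults below are hypothetical.

```python
# Hypothetical sketch of the 90-day data-lab lifecycle described above;
# NAB has not published how its AWS automation actually works.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class DataLab:
    name: str
    created: date
    ttl_days: int = 90  # default experimentation window

    @property
    def expires(self) -> date:
        return self.created + timedelta(days=self.ttl_days)

    def extend(self, extra_days: int) -> None:
        # "sometimes longer if required"
        self.ttl_days += extra_days

    def is_expired(self, today: date) -> bool:
        # an expired lab would be destroyed, not left running models
        return today >= self.expires

lab = DataLab("fraud-docs-experiment", created=date(2019, 9, 1))
expired = lab.is_expired(date(2019, 12, 15))  # past the 90-day window
```

On AWS this kind of expiry is typically enforced with tagged resources and scheduled cleanup jobs, which matches Crisp's point that labs are for experimentation while production models move to a separately controlled run environment.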
Crisp noted that NDC is the portion of NAB’s data environment that is most likely to change rapidly.
“This is probably the part of the architecture that shifts and is trying out new things the most,” she said.
“We're constantly looking at new ways to do things.
“We've got a PoC going with H2O.ai, and Databricks is something else that we're exploring as well. So you'll see us trying things in here.
“This part of the architecture especially will change pretty rapidly over the next six to 12 months.”