NBN Co is using IBM cognitive technology including Watson as part of a project to automate workflows in its network operations centre.
Technical operations manager David Nestic told IBM’s Think 2018 conference in Sydney that the company is 18 months into what it is now calling its “cognitive network operations” project.
The project forms part of a broader effort to equip the company for its changing role, from network builder to network operator.
“The NBN was going to grow twofold, fourfold, eightfold in as many years, and we needed to be able to scale our operations to be able to handle that,” Nestic said.
“We weren’t allowed to grow our headcount. We had to do more with what we have today.
“We needed to automate, which is pretty easy to say but how to automate is what we discovered over the last 18 months.”
The company looked internationally for inspiration and found it at Spanish carrier Telefonica.
However, Telefonica had taken a full decade to achieve its own vision, and that was time NBN Co did not have.
“We had a target to [automate] by 2020,” Nestic said.
Nestic’s team started the project by trying to understand the different use cases for automation within the network operations domain.
“We worked with IT partners, our CTO [office] colleagues and our engineering colleagues, and we started to put together what we thought the architecture might look like,” Nestic said.
Just putting together the use case definitions proved to be a slow process.
“Once we started going down to that use case level, even just to develop one use case took us three weeks and [involved] a whole army of business analysts, process analysts [and] technical SMEs,” Nestic said.
“We thought we were never going to get this thing done.”
Nestic said the team remained committed to defining all possible use cases, but in the interests of time it “started to look at ... key use cases that covered 80 percent of the work” that the network operations team performed.
The team settled on network assurance as the first use case to be tested for greater automation.
It broke assurance up into “three main core functions” - root cause analysis and correlation, resolution and remediation, and ticket and process lifecycle management.
The company decided to tackle root cause analysis and correlation first.
“One of the heaviest, resource-intensive activities with not a lot of gain is having operators sit in front of an alarm system and react to an alarm, so we thought we’re going to spend a lot of time in that space to work out how we can minimise and reduce that,” Nestic said.
“One thing we noticed with other telcos around the world was that they were able to automate the alarm management function, but they were just shifting the problem: every alarm was [triggering] an incident and then the network person would then just react to the incident.
“We didn’t want to go down that path. We thought about how we could identify that root cause, correlate a lot of events and then use that correlation to initiate the next step of automated actions.”
NBN Co called on “a number” of vendors and internal staff to participate in what Nestic called a “pretty intensive proof-of-concept”.
NBN Co decided early on that it wanted to use artificial intelligence or machine learning as part of its network automation strategy.
“This was a very foreign concept to the business,” Nestic conceded.
“Everyone was pretty sceptical that we could use this in any way, shape or form.
“The traditional way of thinking was you build rules, you automate your workflows, and the job’s done.”
Given the AI/ML focus, the company was able to use its relatively nascent insightLab as the forum in which to run some experiments.
iTnews revealed the existence of insightLab last month, but until now NBN Co had declined to reveal much about its use or its technical underpinnings.
Nestic said that building the insightLab was “one of the best things NBN Co ever did”.
“The concept of an insightLab was an environment in AWS that pretty much allowed us to replicate what our network does today,” he said.
“[It is also] able to handle the massive loads of information and data that we were able to provide to each of the vendors to be able to prove this concept [of root cause analysis and correlation] for us.”
NBN Co saw little success in the first three weeks of the proof-of-concept.
“By that third week, the sceptics that had previously said that machine learning can’t do correlation were probably thinking [they were right and] we were never going to get there,” Nestic said.
But the team kept going, and it was during one of these trials - of IBM’s Netcool Operations Insight platform - that things came together.
Netcool is a network management suite that has recently been updated to offer machine learning capabilities.
“We were in the IBM building down in Melbourne,” Nestic recalled, describing how the team watched the platform spit out correlations based on a sample of three months of alarm data.
“One of our technical SMEs looked at the correlation and said ‘that’s right’,” Nestic said.
“Right there - visually - we saw proof that you can use machine learning to be able to identify root cause.
“Everyone sat there in silence for three minutes.”
The breakthrough came when IBM recommended feeding Netcool with a far smaller set of alarm data.
“We gave them 12 months of alarm data and they cut and sliced that data and we got learnings from it that we didn’t know existed,” Nestic said.
“But then they came back and recommended we only look at the last three months of data.
“So we started to narrow down the dataset and filter out some of the noise, and all of a sudden we were getting insights and correlations and patterns that were up to 98 percent accurate.”
Convinced that machine learning could handle root cause analysis and correlation, NBN Co began automating more parts of the assurance process.
This work includes the creation of a “cognitive troubleshooting” function powered by IBM’s Watson technology, as well as automated processes using tools like robotic process automation (RPA) to act on alarms that identify an actual, actionable problem in the network.
Nestic sees opportunities to apply cognitive technologies to a range of other network operations functions.
“It’s going to unlock a whole lot of capability for us that will allow us to better efficiently manage the network,” he said.
“Where we would like to go is being able to use that machine learning capability to start to look at end customer incidents.
“One of the challenges NBN Co has is we are one step removed from the end user. We don’t have visibility of what’s going on, and the latency between an end user engaging with their RSP and then that RSP engaging with us [around a fault] can be 20 minutes or two hours, so it’s very difficult to correlate those effects and then map it back to a network issue [with that amount of time elapsed].
“One day, we may be able to bring in social media feeds to allow us to pattern match [problems], where we can see there’s a problem in Coffs Harbour because it’s all over social media, and then be able to look at what’s happening in our own network and whether the two sets of events correlate.
“That’s the kind of thinking we want to get to.”
Ry Crozier attended IBM Think 2018 in Sydney as a guest of IBM.