Telstra-owned Belong is hoping to improve its end-to-end visibility of customer connections for troubleshooting purposes, expanding a two-year-old monitoring program of work.
System operations lead Hamdam Bishop told iTnews that the telco is now exploring how it might be able to expand its use of Splunk, and draw upon other data sources, to “build a full picture” of what customers are experiencing on the networks it sells services over.
Belong sells services on both Telstra and NBN.
“We’re really trying to build an end-to-end picture of Belong technology and network so we can understand the full customer experience of our systems, and a big part of that is obviously the network,” Bishop said.
“Basic things like is it up or down for a specific customer in their area.
“We obviously have existing ways of knowing what’s happening from a network perspective but we are looking at the possibility of whether Splunk can help to build that end-to-end picture as well.”
Bishop noted the complexity of delivering telecommunications services to customers.
“Having visibility of all the different connecting parts helps you to understand where a problem might be and drill down into that really quickly,” she said.
Exactly how this will look architecturally is still being worked through.
“That’s what we’re exploring,” Bishop said.
“What are all the different places that we could bring data in from to help us build that full picture - knowing that there are lots of different parts that we’d need to pull data from - and then [which] parts [can] we instrument ourselves?”
Belong introduced Splunk back in mid-2018 as a way to improve visibility into its production systems.
The tool also underpinned the telco’s adoption of site reliability engineering (SRE) principles and practices within operations by internal development teams.
“Rewinding back two years, often our customers were our alarms,” Bishop said.
“We would find out that something is going wrong - the website or network or wherever the issue may have been - and then teams would pull information from lots of different places to try to work out what the root cause of the issue was.
“Once we implemented Splunk and other tools to improve our monitoring ecosystem, we’ve drastically reduced the amount of time it takes to troubleshoot and discover what the root cause of a problem is because we have the information at hand.
“We’ve continued to develop dashboards, metrics and dynamic information we get about what is actually happening in our production systems.
“It’s also meant that our customers don’t have to be our alarms anymore. We know when something is going wrong and we can proactively get on top of that and resolve the issue before it has broader impacts to customers.”
Bishop said that Belong’s operations team had elements of ‘AIOps’ (AI for IT operations) in its sights.
“Implementing strong monitoring and analytics across our environment has put us in a much better position to service our customers and make sure that issues are resolved very quickly if they’re able to manifest at all,” she said.
“The idea is that we’ve moved from being very reactive to preventative - and predictive is the future; being able to predict before something goes wrong and prevent an issue from even happening in the first place so that there is no impact for our customers.”
Belong has about 150 internal users who consume data from Splunk.
It is used to monitor production applications and systems from Belong’s website and app to its APIs and other telco systems such as CRM and billing.
Aside from helping teams detect issues and troubleshoot, the monitoring software has also led to improvements in underlying infrastructure and applications.
“With the monitoring in place, we’re able to notice things like errors that were happening on the backend in our website and reduce or eliminate those,” Bishop said.
“Understanding that they exist and addressing the root cause obviously improves the performance, stability and experience on our website, for example.
“Through having that monitoring in place, we’re able to now really see and understand what’s going on within our production systems and take actions to design things differently for better stability, reliability and performance, or address issues that are there and prevent them from happening in future.”
The Splunk program of work is part of a broader technology insourcing and transformation effort at Belong.