The concept of dark data has emerged as a persistent theme at the Gartner Business Intelligence Summit, held in Sydney this week.
Dark data is the information most companies already hold, such as email and contracts, but simply archive rather than analyse.
“Most companies already have big data,” noted Gartner analyst Donald Feinberg. “They just don’t realise they have it.”
Dark data makes up a lot of what could be characterised as big data, he added, and represents a significant resource for companies that remains largely untapped.
Tapping the deep wells of dark data can lead to a significant competitive advantage, he said. “Use dark data as a source for decision making and it will change your ability to outperform competitors.”
Many companies are looking to tools such as Hadoop and various flavours of MapReduce, both big data analysis tools with open source implementations, to understand and make use of the large amounts of data they have to hand, including dark data.
Feinberg recommended that companies looking to use these tools first establish a business case and a use case for them, and then implement a test environment.
“It’s not good enough to simply want to play with the technology,” he commented.
He also recommended going with an established vendor when implementing these tools – even if your company doesn’t initially want to take up a support contract.
“These environments are very tricky and expensive to implement,” he said. “It’s vital to go with a vendor so you know the support and training is there if and when you need it.”