Too much data can be problematic: Gartner

By Athina Mallis

Nov 7 2022 1:40PM

Sometimes, less is more.

When using data to understand a customer, make a decision or simply answer a question, the more data someone uses the more problematic the outcome may be.

During the Gartner Data and Analytics Summit in Sydney, Sally Parker, senior director analyst at Gartner, and Peter Krensky, director analyst at Gartner, broke down how more data can actually hinder, not help, organisations.

Parker said, “But giving me all the data is not a strategy. We shouldn't just be data plumbers, laying data pipelines from source to storage. We should also act as our organisation's data concierge, guiding them to the source of data that provides the right insights.”

For a very long time, business leaders have seen data only as an asset, but they should also see it as a liability, Parker added.

“How can we ensure that the data that we have is an asset and not a liability? How can we ensure the data that we have is being used to drive business results? The truth is, we don't necessarily want lots of data,” she said.

There are three ways to obtain data that will make organisations smarter: knowing what data a business has, not underestimating small data, and thinking about the data a business creates.

Understanding the data a business owns

An organisation might not have a shortage of data, but what it will have is a scarcity of accurate metadata, according to Krensky.

He said, “Without metadata, we do not have meaningful data. Without metadata, we don't know what we have, we don't know what it means or where it came from. Traditionally, it requires a lot of effort to maintain accurate and usable metadata. But by applying machine learning techniques to our metadata, we can transform it into active metadata.”

Active metadata continually detects and adjusts to the patterns in our data, Krensky explained.

“This is going to enable self-organising and self-optimising design concepts such as the data fabric. This is not only a more efficient data management environment, but it also drives usage,” he added.

Using small data

Krensky likens small data to an ingredient substitute in cooking: it may not be the correct component, but it could enhance the meal in different ways.

He said, “In our case, as we will look at data we are collecting and storing. Is there other data that would be better, more accurate, safer, cheaper, more accessible? Can we use that data instead of the data we collect?”

Never underestimate the power of small data to be more insightful than big data, Krensky said.

“Just like a minimum viable product is enough to get the job done, we should aspire to minimum viable datasets,” he added.

Smarter data

Increasingly, some of the most valuable data will be the data that organisations create and not the data that they collect.

According to Parker, synthetic data is artificially created data that has similar attributes to the data that it mimics. She said it's become really important for two reasons.

“First, we can use synthetic data to reduce that data liability. We estimate that by 2025, synthetic data will enable organisations to avoid 70 percent of privacy violations. If you have sensitive customer or patient data that you would like to use but you can't, you could replace it with synthetic data without losing any of the insights it can deliver,” Parker explained.

Secondly, synthetic data enables businesses to move faster and fill the gaps in their actual data.

She said, “Gartner estimates that by 2030 the majority of the data that we use to build data models will be synthetic data. This is crucial in a world of artificial intelligence and model building, where our data often lacks rare target variables.

“Start investigating synthetic data because today's use cases of reduced risk will enable tomorrow's use cases of better predictive models,” she ended.