Data Overload: Why Marketers Need to Focus on Data Quality, Not Quantity

The unquenchable thirst for customer data is driven by a fallacy


The epiphany of the 2010s technological era was that people are streams of data. Likes, dislikes, families, friends, hobbies, jobs—everything that makes up a full life is just data awaiting capture, now amenable to all kinds of sophisticated techniques to maximize the likelihood of a desired action. In 2024, it’s rare that a corporation isn’t applying significant effort to dip its spoon steadily deeper into the river of consumer data. 

This unquenchable thirst for customer data is driven by the fundamental belief that more data leads to better data models, which can drive efficiency and more revenue. This, however, is false. Not only does more data not always lead to better models, but it can actually degrade the model’s power and explainability. The advertising industry is suffering from data overload—making us less effective and causing us to lose the trust of the customers we market to. 

The data snapshot 

Even if all external restrictions were lifted and we could gather data from every source we wanted, a smart marketer recognizes that we should restrain ourselves for a more fundamental reason: much of our data is highly correlated, which makes it nearly useless.

To understand this, imagine you are a photographer standing an arm’s length from a skyscraper. You can’t step back to get the whole building in one picture; instead, you take many pictures from different positions and angles around the building to stitch them together and make a composite photograph of the whole building.

In this example, each picture is a new source of data we’re adding to our model, the reconstruction of the full building. As long as each individual snapshot is of a different part of the building, it’s easy to fit them together to get a full view. However, with highly correlated data, our pictures overlap, depicting the same part of the building multiple times. It’s much, much harder to be accurate in this case. 

No matter how many pictures you take, if the information content of each new one is low, your model cannot improve.
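
For readers who like to see the effect in miniature, here is a hedged sketch using synthetic numbers and scikit-learn rather than any real campaign data: adding a second source that is almost perfectly correlated with the first barely changes what the model can explain.

```python
# A minimal sketch with synthetic, illustrative numbers: a second data source that
# mirrors the first adds almost no new information to the model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1_000

ad_spend = rng.normal(100, 20, n)                 # one genuine signal
sales = 3.0 * ad_spend + rng.normal(0, 10, n)     # outcome driven by that signal

# A "new" data source that is really the same picture of the building:
# impressions track spend almost perfectly.
impressions = 50 * ad_spend + rng.normal(0, 100, n)

X_one = ad_spend.reshape(-1, 1)
X_two = np.column_stack([ad_spend, impressions])

r2_one = LinearRegression().fit(X_one, sales).score(X_one, sales)
r2_two = LinearRegression().fit(X_two, sales).score(X_two, sales)

print(f"R^2 with one source:   {r2_one:.4f}")
print(f"R^2 with both sources: {r2_two:.4f}")     # virtually identical
```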

Think smaller, build smarter 

So, if we cannot simply keep deepening our dataset, and if gathering all available data can weaken our results, how then do we build accurate, explainable, and ethical models to better advertise to our customers? 

The answer is to think smaller. Avoid the temptation to build “One Big Model” and instead build several smaller purpose-built ones that work together. 

As AI becomes a larger part of the marketing tech stack, and terms like “training data” and “fine-tuning” become part of the lingua franca, one that should become similarly familiar is “feature selection.” Feature selection lives in that all-important but often overlooked space between gathering all the data and starting to train a model with it. It is the name for a collection of tools, techniques, and heuristic principles that are used to better understand the data and its value to the model before training even begins. 

Conversion attribution might be the fundamental problem of advertising. Last-click attribution’s flaws are well known, yet building a good multi-touch attribution model is still an art form that takes time, knowledge, and care. AI can help uncover the full impact of media on sales or other downstream metrics, but factors beyond advertising spend need to be taken into account to properly quantify the return on investment. Overall economic health, brand awareness, local household income, and population density are just some of the data an AI can draw on to answer that question. In fact, the savvy marketer will want to drill down even further and look at an individual consumer’s credit card history, their interests as revealed by internet activity, their age, gender, race, etc. There’s no shortage of possible factors that might influence a particular group of consumers to convert. 
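
As a rough illustration of that idea, and nothing more, a simple regression can estimate media ROI while controlling for those outside factors. The file and column names below are hypothetical placeholders, not a real dataset or product.

```python
# A hedged sketch: estimate media ROI while controlling for non-advertising factors.
# File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("weekly_market_data.csv")        # assumed: one row per market-week
factors = ["media_spend", "consumer_confidence", "brand_awareness",
           "median_household_income", "population_density"]

model = LinearRegression().fit(df[factors], df["sales"])

for name, coef in zip(factors, model.coef_):
    # media_spend's coefficient approximates incremental sales per dollar of spend
    print(f"{name}: {coef:.3f}")
```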

Feature selection helps us sort through this overabundant data, picking out just what matters most for the task at hand. Well-understood techniques like principal component analysis and variable importance analysis quantify how much our data can explain the observed sales and rank the contribution of each source. Thus, instead of demanding all of this data from consumers, which can be hard to acquire and comes with overhead costs, we build a model that is just as powerful using only the most impactful data sources identified during our feature selection process.
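
A minimal sketch of that selection step, assuming a table of candidate sources and scikit-learn (the file and column names are again illustrative), might look like this:

```python
# Feature selection before model training: score candidate data sources, keep the
# impactful ones, drop the rest. File and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("candidate_features.csv")        # assumed: one row per market-week
X, y = df.drop(columns=["sales"]), df["sales"]

# Principal component analysis: how many independent "pictures" do we really have?
pca = PCA().fit(StandardScaler().fit_transform(X))
print(np.cumsum(pca.explained_variance_ratio_))   # a few components often explain most of the variance

# Variable importance: rank each source's contribution to explaining sales.
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
ranking = pd.Series(forest.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking.head(10))                           # train the final model on the top contributors only
```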

Marketers must take better advantage of readily available feature selection tools to use AI in an ethical and stable way. Consumers are becoming savvier about the value of their data and demand care and transparency in how it is used. Thankfully, research into feature selection over the last decade or so has produced many sophisticated tools beyond covariance matrix inspection and principal component analysis, yielding slimmer, sleeker, better-performing models built on only the most relevant data. Just as post-model interpretation tools can provide transparency to consumers, feature selection demonstrates the care taken to make responsible use of the data collected. 
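
One example of those newer tools, sketched here with scikit-learn’s mutual-information scorer and the same hypothetical data as above, scores relevance even when the relationship to sales is non-linear:

```python
# Mutual information captures non-linear relevance that a covariance matrix misses.
# File and column names are hypothetical, as in the earlier sketch.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_regression

df = pd.read_csv("candidate_features.csv")
X, y = df.drop(columns=["sales"]), df["sales"]

selector = SelectKBest(mutual_info_regression, k=5).fit(X, y)
print(X.columns[selector.get_support()])          # the handful of sources worth keeping
```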

Data overload may only exist in its current form for a relatively short period, and the crest of the wave of data availability may already be receding. Ad blocker usage reached its highest rate ever in 2024, data privacy laws are being adopted state by state in the U.S., and consumers increasingly doubt that marketers will use their data responsibly: a staggering 60% of consumers believe companies are misusing their data. It is thus more important than ever to resist the pull toward data overload and cleverly employ feature selection techniques to build smart, responsible, and effective models for our clients.