Data Science Has Become About Lending False Credibility To Decisions We've Already Made

This article is more than 5 years old.

One of the greatest failures of data science has been the way in which it has devolved from the genuine search for answers into just another tool to lend credibility to the answers we want. It no longer matters what our data actually says or whether the data we are using is in any way relevant to the questions we ask of it. All that matters is that we can justify our preordained decisions with the certainty of “data.” As we rapidly undermine the promise of data science, will our trust in data fade with it?

The misuse of data and statistics to support preordained decisions has become such a cultural touchpoint that even Scott Adams’ Dilbert cartoon has lampooned the practice, with the boss helpfully asking, “Does it matter [if my spreadsheet is wrong], as long as it gives me the answer I want?”

It is truly remarkable that our era of searching data for answers has devolved into searching data until we find support for the answer we've already decided upon.

Today’s data science is less and less about the genuine search for answers. We no longer embark upon an analysis with a hypothesis in hand, open to whatever answer our data ultimately yields. Instead, like doctor shopping, we “data shop” until we find a dataset and methodology that give us the answer we want.

We live in a world in which our preeminent scientific institutions convene the nation’s most respected researchers to advise our government on misinformation, and the resulting report centers on Twitter, not because those researchers believe it plays the most important role in the spread of misinformation or yields the most accurate results, but because it was the easiest data for them to get their hands on. It seems even academia has been led astray by the siren song of data hype.

Indeed, many areas of data science like “social media analytics” are not actually based on methodologically or statistically rigorous data analysis at all.

Social media analysts focus nearly exclusively on Twitter because it is the easiest dataset for them to get their hands on, not because it is the most relevant or accurate dataset for the phenomena they hope to measure.

Nearly the entire historical output of social media assessments over the last decade and a half has reported absolute counts rather than normalized trends, calling into question or even completely invalidating a large fraction of the research drawn from social media.
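To see why absolute counts can mislead, consider a minimal sketch in Python. The numbers below are invented purely for illustration and the function name is hypothetical; the only point is that a topic's raw post count can climb while its share of the overall conversation shrinks, simply because the platform itself is growing.

```python
def normalized_share(matching_counts, total_counts):
    """Return each period's matching posts as a fraction of all posts."""
    return [m / t for m, t in zip(matching_counts, total_counts)]


# Made-up figures for three consecutive periods (illustration only).
matching_posts = [1_000, 1_500, 2_000]      # posts mentioning the topic
total_posts = [100_000, 200_000, 400_000]   # all posts on the platform

shares = normalized_share(matching_posts, total_posts)

for period, (count, share) in enumerate(zip(matching_posts, shares), start=1):
    # Raw counts rise each period while the normalized share falls.
    print(f"period {period}: {count} posts, {share:.2%} of all posts")
```

Reported as absolute counts, this hypothetical topic appears to have doubled. Normalized against total volume, attention to it has been cut in half.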

Researchers blindly report trends from datasets they have no understanding of, running basic searches and reporting results without any idea of how their datasets are changing out from under their analyses.

Yet, none of this matters because we no longer see data as yielding answers, but rather as a veneer of credibility to wrap around the answers we want.

Data scientists no longer turn to statistics, rigorous methodologies and the scientific method to interrogate large datasets they understand deeply, yielding findings that have been carefully normalized, scrutinized and verified.

Instead, data science has become two things: hyperbole and lending false credibility to decisions that have already been made.

Hype has become synonymous with how the research community increasingly views data science. Researchers sprinkle data science buzzwords over their proposals, publications and grant submissions like some sort of magical fairy dust, confident in the unfortunate truth that the mere presence of phrases like “big data,” “social media analytics” or “deep learning” will massively improve their odds of success, regardless of the actual question being asked or the accuracy of their results.

Yet, beyond the hype, the analyses that are actually performed have unfortunately become about searching for tenuous or even entirely false findings that can lend some air of credibility to decisions that have already been made. Any decision, no matter how incorrect, can find seemingly conclusive data-driven support merely by searching until some method applied to some dataset is sufficiently adjusted to yield a supportive finding.
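A rough sketch of the arithmetic behind this kind of “data shopping” follows. It rests on simplifying assumptions that are mine rather than anything measured: each attempt is treated as an independent test of a claim that is actually false, judged at the conventional 5% significance threshold. The function name is hypothetical.

```python
def chance_of_spurious_support(num_attempts, alpha=0.05):
    """Probability that at least one of several independent tests of a
    false claim comes back "significant" purely by chance.

    Assumes independent attempts, each with false-positive rate alpha.
    """
    return 1.0 - (1.0 - alpha) ** num_attempts


for attempts in (1, 5, 20, 50):
    probability = chance_of_spurious_support(attempts)
    print(f"{attempts:>2} dataset/method combinations tried: "
          f"{probability:.0%} chance of at least one supportive result")
```

Under these assumptions, a single honest test of a false claim yields apparent support 5% of the time, while trying twenty dataset and method combinations yields at least one supportive result roughly 64% of the time. Real attempts are rarely independent, so the exact figures will differ, but the direction of the effect is the same.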

Putting this all together, data science is no longer about analyzing data or giving our data the opportunity to speak to us.

Instead, data science has become about hype-fueled fairy dust that can boost the prospects of a resume or report with its trendy buzzwords.

Most dangerously, it has become about the misuse of statistics, data, research methodologies and the scientific method to lend false credibility to decisions that have already been made.

We no longer devise a hypothesis and test it using data. We start with the conclusion we want and find the data and methods to support it.

As data science becomes about false hype and conscripting data in the service of preordained conclusions, we risk undermining the public’s trust in data and halting the data revolution just as it has begun.