The limits of knowledge in a data-driven society

—Sun-ha Hong

What counts as knowledge in the age of big data and smart machines? Technologies of datafication renew the long modern ambition of turning bodies into facts. They try to take human intentions, emotions, and behavior and turn these messy realities into discrete and stable truths. In the process, however, technology is reshaping what counts as knowledge in its own image. The push for algorithmic certainty sets loose an expansive array of incomplete archives, speculative judgments and simulated futures.

All too often, data generates speculation as much as it does information. 

The breakneck expansion of data-driven surveillance is justified through the promise that algorithms and code provide inherently objective systems, and in turn, a bedrock of factual certainty for better judgment. But what happens when the data isn’t enough and the technology isn’t sufficient, as is so often the case? The limits of data-driven knowledge lie not at the bleeding edge of technoscience but among partially deployed systems, the unintended consequences of algorithms, and the human discretion and labor that grease the wheels of even the smartest machine.

Technologies of datafication tend to perform best within tightly prescribed parameters, where they are given very specific tasks to solve. Yet their proponents have made astoundingly lucrative businesses out of the promise that algorithms and code can be injected into any social problem with the same positive results. Indeed, Silicon Valley startups have turned it into a fundraising staple called ‘X for Y’: a TikTok for the elderly, an Uber for bikes, a Fitbit for your brain. The result is a landscape of often unproven systems that run on arbitrary classifications, messy data, and other concealed uncertainties. A facial recognition system can boast impressive accuracy scores in staged demonstrations, but fall apart when researchers ask it to distinguish women from men or to detect black faces. The recidivism prediction system COMPAS boasts that it uses over 100 factors to calculate an individual’s likelihood of reoffending. But research shows that its accuracy is roughly similar to that of a simpler, open-source system using just two factors. Exercise trackers combine advancements in miniaturized sensors with unproven estimates to produce their recommendations, such as ten thousand steps per day, a figure originally invented by mid-twentieth-century Japanese marketers to sell pedometers.

These yawning gaps between the promise of objectivity and its shortcomings are filled with a new array of speculative practices. Incomplete and uncertain data are often cobbled together to fabricate a sense of reliable ‘predictions’ and objective ‘insights’. The problem is that such speculations are themselves neither neutral nor coincidental.

In my new book, Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society, I investigate the many ways in which emerging forms of data-driven knowledge affect different kinds of bodies in selective, asymmetric, and obscure ways. Sex, friendship, and happiness are datafied not in terms of what is most meaningful or just, but in terms of which aspects of them can be rationalized at the lowest cost and, increasingly, recombined and sold on for maximum profit.

What kinds of data are prioritised for collection and visualisation over others? How are their uncertainties acknowledged, or papered over in the name of technological novelty? Who interprets the data on behalf of whom? It is in these very human choices that enduring prejudices and power asymmetries re-enter supposedly neutral data.

Technologies of Speculation traces this twisted symbiosis of knowledge and uncertainty across two sites of emerging data-driven surveillance.

The first site is the Snowden affair and the public controversy around the American government’s electronic “dragnet” surveillance technologies, built to quietly collect phone, email, and other electronic communications data at an unprecedented scale. Edward Snowden’s groundbreaking leaks did not bring the mythological arrival of ‘sunlight as disinfectant’; instead, they turned government surveillance itself into a staging ground for widespread paranoia and speculation.

At the same time, uncertainties around data’s ability to actually predict terrorist attacks mean that counter-terrorist operations must also negotiate gaps in their knowledge and judgment. FBI undercover agents provide money, weapons and training to a Muslim American, until sufficient data has been produced to justify classifying him as a terrorist threat. The NSA’s internal documents use racially charged placeholders like “Mohammed Badguy”, drawing on enduring prejudices to guide the technology and its implementations.   

The second site is the rise of miniature, automated tracking devices for the monitoring of everyday life, from exercise levels to emotional status, and the subsequent analysis of that data for personalized insights. The central promise here is what I call “data’s intimacy”: that smart machines will know you better than you know yourself, and that the data will provide a truer, more objective sense of who you are and what is good for you.

This promise of individual empowerment, however, is increasingly being co-opted for a market-driven ‘control creep’ into new forms of surveillance. A fitness-tracking wristband, originally developed for individuals to assemble new clues about their bodily activity toward personal improvement, now supplies insurance companies with new ways to calculate our risk values. The novel sources of intimate personal data first crafted for personal experimentation are being appropriated by employers, insurers and courts of law, such that more and more of ‘my’ personal truth is colonised in ways I cannot easily understand or contest. What counts as a productive or healthy subject is thus being reframed in terms of the behaviours that are most predictable, or rather, machine-readable.

Surveillance by the state, surveillance by oneself: both reflect the expanding reach of big data’s rationality across established boundaries of the public and the private, the political and the personal. In my book, I trace how the expansion of data-driven surveillance is transmuting what we call “knowledge” from a human virtue, rooted in experience and context, into a mass of disconnected, raw material that can be manipulated at scale for predictive control.

In all this, ‘technology’ or ‘data’ is cited as a way to cut the Gordian knot around thorny social problems. The idea that one merely pursues objective truth, or just builds machines that work better, serves as a refuge from the messiness of those structural inequities. Yet the question of what counts as knowledge leads directly to exactly such questions:

  • What counts as intent, as prosecutable behavior, as evidence to surveil and to incarcerate?
  • What kind of testimony is made to count over my own words and memories and experiences, to the point where my own smart machine might contest my alibi in a court of law?
  • What constellation of smart machines, Silicon Valley developers, third-party advertisers, and other actors should determine the metrics that exhort the subject to be fitter, happier, more productive? 

We’ve witnessed a rising chorus of scholarship that challenges the normalisation of technological rationality as a default—including the widespread conceit that everything that can be datafied must be datafied, and that everything we want to datafy can surely be datafied. We would do well to remember the words of Joseph Weizenbaum, a pioneer of early AI research, who cautioned nearly fifty years ago that not everything that can be done with technology ought to be done. Technological innovations, no matter how impressive and groundbreaking, cannot ‘solve’ any social problem without an even more fundamental change in the people and institutions around them.

The many injustices and asymmetries of datafication arise not necessarily because we need more information but because we have too much information that we utilize in undisciplined and inappropriate ways. For some, such a surfeit of data can seem an empowering thing, an opportunity to stride boldly towards a posthuman future. For others, to appear correctly in databases can be the unhappy obligation on which their lives depend.

The meaningful critique of such technologies requires building new answers to the most basic assumptions:

  • What kind of knowledge—and knowledge by whom on whose behalf—do our technological systems privilege?
  • What other conditions, beyond often narrowly defined metrics of accuracy and efficiency, are necessary to ensure that knowledge empowers the exercise of human reason?
  • How can those conditions be protected as the process of knowing is increasingly overtaken by opaque systems of datafication?

Asking these questions requires disrespecting the stories that data tells about itself, refusing its rationalization of what looks like objectivity or progress, and holding technology accountable to standards that are external to its own conditions of optimization. The question of what counts as knowledge is thus a question of what counts as being human, of how life becomes fitted into calculable categories of the normal, the monstrous, the dangerous, the optimal.

Read a free excerpt below from Sun-ha Hong’s new book Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society, an inquiry into what we can know in an age of surveillance and algorithms.


Sun-ha Hong is Assistant Professor of Communication at Simon Fraser University and received his Ph.D. from the University of Pennsylvania. Hong analyzes the fantasies, values, and sentimentalities surrounding big data and AI.
