A Q&A with John Cheney-Lippold, author of We Are Data: Algorithms and the Making of Our Digital Selves.
What is posthumanism and how does it shift emphasis from the contained material body to information?
Cheney-Lippold: Posthumanism is a theory that what makes us up extends beyond our human bodies to include information about us. This is seen in how one’s credit score, which is algorithmically produced, determines whether or not you can buy a house. Ostensibly, this decision has no connection to your body, yet it demonstrates how one’s self reaches outside one’s body and into the confines of databases and archives. As digital surveillance accumulates more and more data about us, this shift toward information becomes ever more consequential.
You suggest that our data doesn’t reveal a Truth about ourselves, but instead our “truth” is told to us by an algorithm’s output. How do the terms of information become the terms of our emergent digital subjectivity? How is this reconceptualization of ourselves used to categorize and control us?
CL: When we know who we are—and when we know how who we are regulates how we are seen—we know a great deal about the terms of our life: what we can do, cannot do, and need to do. When who we are is determined by an algorithm, these terms are unknown to us. In this way, whatever an algorithm says we are, we are. There is no truth, but rather a constantly moving “truth” that changes according to our data and how it gets algorithmically valued.
Data transforms an individual into what you call a “dividual.” What does that mean?
CL: We are first dividuals online, not individuals. This is because users of the Internet have no suitable index that connects them to their individuality (two people can easily use the same computer, email account, or Facebook account). Accordingly, identity online comes from connecting different dividual fragments of our data together in order to create an idea of the user. Three dividual fragments of us, then, might be the location of our IP address, the search term we just entered, and the time of day we searched for it. These fragments, which mean little by themselves, could be combined into an idea of who you are that then classifies you as a US “citizen” or a “foreigner”. In this case, while our individualities have histories (and passports) that we use to directly intercede in defining who we are, our dividualities, when algorithmically assembled, produce an idea of ourselves that is likely alien, and unintelligible, to us.
How does who we are, algorithmically speaking, change minute by minute, byte by byte according to the interpreter’s proprietary formula and agenda?
CL: Much like in a conversation, where every additional word ideally helps you understand more about the person you are speaking with, every new piece of data you produce adds to the potential meaning of who you are. But, of course, data can be wrong, much like people can lie, cheat, or steal. It’s up to algorithmic pattern-matching to determine what is a useful piece of data (one that might change your algorithmic “gender” from “man” to “woman”) and what is a useless, meaningless piece of data (one that might have no effect on your algorithmic “gender” at all). But just as one’s identity can change—based on a new website visit or product purchase—the categories of “man” and “woman” themselves change, as users characterized as “man” or “woman” do new things, en masse, to collectively redefine the category of “gender”.
Are we unconsciously facilitating our constant contact with an invisible regulatory regime that effectively recalibrates the nuances of who we are in order to define and control us?
CL: I think the only reason Internet users are willing to accept such a wide-reaching regulatory regime is that we do not see it, and thus unconsciously facilitate contact with it. The purpose of writing my book was to demonstrate how profiling, surveillance, and machine learning are formatively changing not just what the Internet looks like, but literally who we are. Many people learn about digital surveillance and attribute it to “just” targeted advertising. I aim to push readers to think about how digital surveillance actively reconfigures who you are, and who you can be, according to the latent trends found in your latest data.
Algorithms depend on the dividual bits of data we produce rather than knowledge of an integrated individual. Should we be less concerned then about our need for online privacy? How do privacy needs for our body differ from privacy needs for our information?
CL: One kind of privacy is not necessarily more important, or more deserving of concern, than the other. While bodily privacy is incredibly important, I want to focus instead on how we can begin to understand what dividual privacy is and can be.
To focus on informational or dividual privacy is to foreground how privacy is more than just preventing somebody from looking at you in the bathroom. Privacy in the U.S. historical context has been what we call the right “to be let alone.” As such, while we might be alone in our room on our computers, the information about who we are is rarely, if ever, “let alone”. Data about us might make us a “citizen” who has the right to privacy or a “foreigner” who doesn’t; data about our screen size or operating system might make us pay extra for a plane ticket because we are seen as “wealthy”. Importantly, these pools of data necessarily change with every new website visit, search query, or GPS location. We are forced to live in a digital world that we can never really know—and we thus lose the ability to be “let alone”.
Policing and security agencies can create “predictive policing” algorithmic categories to label certain people as “at risk” for committing crime. How are these algorithms dependent on racial and economic stereotyping, and how do they run afoul of profiling laws?
CL: Police agencies that use predictive policing, like the Chicago Police Department, claim individuals are “at risk” because their data is connected to other “at risk” individuals. The central problem that spoils this idea is that the CPD’s arrest data does not cover the entire Chicago population equally.
For a host of reasons, poorer people are more likely than wealthier people to be detained and arrested by police. And, for a host of reasons, people of color are more likely than white people to be detained and arrested by police. And lastly, for a host of reasons, the neighborhoods where poorer people of color live are patrolled more vigilantly than wealthier white areas. If the data that the CPD uses to create “at risk” comes from a biased data set, then the algorithmic outputs from that data will necessarily be burdened by economic and racial prejudice.
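To make the "biased data in, biased output out" point concrete, here is a minimal hypothetical sketch in Python. Every number is invented for illustration and is not CPD data. Two neighborhoods have exactly the same number of offenses, but police observe and record a larger share of them in the more heavily patrolled one. A naive per-capita "risk" score computed from recorded arrests then ranks that neighborhood as riskier, although the underlying behavior is identical.

```python
# Hypothetical illustration: identical underlying behavior, unequal policing.
POPULATION = 10_000  # residents per neighborhood (assumed)
OFFENSES = 500       # offenses actually committed, the SAME in both neighborhoods (assumed)

# Percentage of offenses that police observe and record, driven by patrol intensity (assumed).
DETECTION_PERCENT = {
    "heavily_patrolled": 60,
    "lightly_patrolled": 15,
}


def recorded_arrests(detection_percent: int) -> int:
    """Arrests that end up in the data set: only the offenses police were present to see."""
    return OFFENSES * detection_percent // 100


def naive_risk_score(arrests: int) -> float:
    """A naive 'risk' score: recorded arrests per 1,000 residents."""
    return arrests * 1000 / POPULATION


scores = {
    name: naive_risk_score(recorded_arrests(pct))
    for name, pct in DETECTION_PERCENT.items()
}

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: {score} recorded arrests per 1,000 residents")
```

With these assumed figures the heavily patrolled neighborhood scores 30.0 and the lightly patrolled one 7.5. The fourfold gap comes entirely from where the data was collected, not from any difference in what residents did.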
How does algorithmic identity reinforce stereotypes or, worse, profiling and racism?
CL: A funny, but extraordinarily problematic, example of stereotypes/racism is found in the book’s introduction through the video “HP Computers are Racist”. In this video, two individuals are in front of an HP computer with facial recognition technology. The white individual, Wanda, is identified by the camera; the black individual, Desi, is not. As it turns out, the data of a black man’s face didn’t fit the algorithmic identity of a “face”, and thus the algorithmic pattern defining a “human”. This explicit example of racism was, by many, technologically white-washed away: defenders of HP claimed the error in recognizing Desi wasn’t about race, but about resolvable lighting problems.
But this answer ignores some very real, and important, structural facts about the world, particularly that algorithms and data are not neutral. Both come from lived histories and conflicts, including racist ones, that unintentionally (or intentionally) seep their way into the digital realm. The fact that Desi wasn’t recognized could be due to the low numbers of black engineers in Silicon Valley companies, and/or to HP not thinking about racial differences while developing its facial recognition product, and/or to HP using beta testers who were lighter skinned. Whatever the cause, the power relationships that order and define us in the offline world are always, and will always be, connected to the power relationships that define us online.
What is algorithmic citizenship? How does data determine our temporal, informationalized citizenship and how does it affect our privacy rights against government surveillance?
CL: Algorithmic citizenship is a mode of identification that governments use to determine users’ citizenship status when no documentation is available. This audacious algorithmic identification attempts to recreate the idea of the “ideal citizen”, and thus the “ideal foreigner”, through an algorithmic analysis of available data: from where one’s IP address is, to whom one talks, and even to what language one speaks. In terms of privacy, NSA documents have shown that a user can be legally surveilled if their traffic is deemed to be “51% confidence foreign”. Because one’s “citizenship” is based on data—and only data (and not some permanent identity card or birth certificate)—it changes with every person you talk to, every time you travel abroad or cross borders, and even every time you change which language you speak.
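How a confidence threshold turns shifting data fragments into a shifting "citizenship" can be sketched in a few lines of Python. This is a hypothetical toy, not the NSA's actual method, which is not public. The three signals and their weights are invented assumptions. The only figure taken from the interview is the 51% threshold. The point is that the same person can cross the threshold just by changing one fragment, such as the language they write in.

```python
from dataclasses import dataclass

# Hypothetical weights: how strongly each data fragment is treated as evidence of "foreignness".
WEIGHTS = {
    "ip_outside_us": 0.40,
    "contacts_mostly_foreign": 0.35,
    "non_english_language": 0.25,
}
THRESHOLD = 0.51  # the "51% confidence foreign" figure cited in the interview


@dataclass
class Fragments:
    """Dividual data fragments observed about one user (hypothetical fields)."""
    ip_outside_us: bool
    contacts_mostly_foreign: bool
    non_english_language: bool


def foreignness_confidence(f: Fragments) -> float:
    """Sum the weights of whichever signals are present (ranges from 0.0 to 1.0)."""
    return sum(weight for name, weight in WEIGHTS.items() if getattr(f, name))


def algorithmic_citizenship(f: Fragments) -> str:
    """Label the user by comparing the confidence score against the threshold."""
    return "foreigner" if foreignness_confidence(f) >= THRESHOLD else "citizen"


if __name__ == "__main__":
    writing_english = Fragments(False, True, False)   # confidence 0.35
    writing_spanish = Fragments(False, True, True)    # confidence about 0.60
    print(algorithmic_citizenship(writing_english), algorithmic_citizenship(writing_spanish))
```

Here the user is a "citizen" at 0.35 and becomes a "foreigner" at about 0.60 after one change of language, with no change to any document they hold.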
How can we protect our privacy in a world where surveillance is everywhere yet rarely felt? How can we break free of being algorithmically dominated and manipulated?
CL: The strategy I propose in the book is twofold. First, surveillance affects different groups of people differently, so privacy strategies against surveillance need to take into account how race, gender, and class make surveillance tougher on some than on others. Second, because of digital surveillance’s ubiquity and the technological impossibility of being truly unmonitored online, I suggest we aim our privacy practices toward confusion, not avoidance: producing aberrant or random data in order to confuse the algorithms that assess patterns in our data. This strategy, which some scholars have called obfuscation, works by automatically generating random, meaningless data in an effort to prevent our data from cohering into an identity that could then be used to determine who you are—and how to manipulate you.
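The obfuscation idea can be illustrated with a minimal Python sketch in the spirit of search-query obfuscation. This is not a method from the book. The decoy list, the helper names, and the "share of the stream" signal are all hypothetical. Each real query is buried among randomly chosen decoys, so an observer counting what you search for sees a much weaker signal.

```python
import random

# Hypothetical decoy pool; a usable tool would need a far larger, more plausible vocabulary.
DECOY_QUERIES = ["gardening", "chess openings", "opera tickets", "fly fishing",
                 "astronomy", "knitting patterns", "jazz history", "hiking trails"]


def obfuscate(real_queries: list[str], decoys_per_query: int, rng: random.Random) -> list[str]:
    """Mix each real query with randomly chosen decoy queries, then shuffle the whole stream."""
    stream: list[str] = []
    for query in real_queries:
        stream.append(query)
        stream.extend(rng.choice(DECOY_QUERIES) for _ in range(decoys_per_query))
    rng.shuffle(stream)
    return stream


def share_of(stream: list[str], query: str) -> float:
    """Fraction of the observed stream taken up by one query: a crude profiling signal."""
    return stream.count(query) / len(stream)


if __name__ == "__main__":
    real = ["tango lessons"] * 10
    noisy = obfuscate(real, decoys_per_query=4, rng=random.Random(42))
    print(share_of(real, "tango lessons"), share_of(noisy, "tango lessons"))
```

Without decoys, "tango lessons" makes up the whole observed stream. With four decoys per real query, it drops to one fifth. Decoys that look implausible could be filtered out by a determined observer, so this sketch shows only the principle, not a robust defense.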
Because our data is tracked, we are remembered forever. In Europe, citizens have been granted the right to be forgotten and can petition companies like Google to remove their information from databases. Do right to be forgotten laws apply here in the US? What are some of the differences between European and US ideas of privacy?
CL: The “Right to be Forgotten” stakes out an interesting future for privacy because it harkens back to privacy’s origins, while trying to think through those origins in the digital sphere. These laws do not apply in the US, whose lawmakers are now aggressively rolling back individual privacy protections.
In terms of comparison: despite the legacies of US privacy law and the 4th Amendment, domestic norms of privacy are very much aligned with an American rhetoric of individualism combined with a post-9/11 racializing sensibility: if one deserves privacy, it’s because they’ve earned it by being a good citizen; if one doesn’t deserve privacy, it’s because they don’t need it anyway, since they shouldn’t have anything to hide. European ideas are much more collective and public-oriented, which comes from a certain expectation that all individuals in society should have privacy. In the case of the European Union’s Constitution, privacy is a formal right, while the US’s “right” to privacy is theorized and inferred, but never stated.
Hundreds of companies and agencies identify a person algorithmically in different ways, meaning that there may be thousands of versions of one person, each defined by a different gender, race, class, etc., depending on the algorithm used. What is the danger of being redefined from an individual subject to a category-based profile that is remapped every day as a mathematically equilibrated aggregate of knowledge? Is there such a thing as authenticity online?
CL: First, there is no such thing as authenticity online, as the metrics we use to figure out who we are (what you say you are, what you want to do) are muddled by the fact that what one says one is doesn’t matter, and what one wants to do online is directed by how one is seen.
Second, the danger of this kind of redefinition of subjectivity is that one never truly knows who they are. If “gender” for Google is different than “gender” for Microsoft, and both algorithmically change every day in order to follow the new trends of what their data says “gender” is, then the ability to know oneself—to understand who one is and how one is being treated—is impossible. This impossibility is what the book is about; we shouldn’t try to avoid it, as it is a Sisyphean task, but rather we should learn how to respond to it, and productively resist it.
What information are companies and advertisers determining about us when we “Like” something on Facebook?
CL: To “like” something on Facebook is to generate data that connects a user to a certain concept, product, person, or place. Facebook then uses “likes” to place users in different boxes (a user is “Liberal” because they “like” the environment; a user is also “Hispanic” because they “like” tango music), which in turn allows advertisers on Facebook to target their ads to Liberal Hispanics. But more than merely producing datafied value, “likes” are used to create these boxes themselves. No one told Facebook that Hispanics “like” tango. Rather, known-Hispanic users “liked” tango at a disproportionate rate, suggesting that to “like” tango means to be “Hispanic”.
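How a "disproportionate rate" can turn a like into a category marker can be sketched in Python. Facebook's actual method is not public. This toy uses "lift", one simple and common way to measure over-representation, and the users, labels, and 1.5 cutoff are all hypothetical. A like becomes a marker when people who made it belong to the category noticeably more often than the population as a whole. Any unlabeled user who makes that like is then placed in the box.

```python
from collections import defaultdict


def category_markers(users: list[tuple[set[str], set[str]]], category: str,
                     min_lift: float = 1.5) -> set[str]:
    """Return likes that members of `category` make at a disproportionate rate.

    users: (likes, known_categories) pairs for users whose category is already known.
    lift = P(category | liked X) / P(category overall); min_lift is an assumed cutoff.
    """
    base_rate = sum(category in cats for _, cats in users) / len(users)
    if base_rate == 0:
        return set()
    likers: dict[str, int] = defaultdict(int)
    likers_in_category: dict[str, int] = defaultdict(int)
    for likes, cats in users:
        for like in likes:
            likers[like] += 1
            if category in cats:
                likers_in_category[like] += 1
    return {like for like, n in likers.items()
            if (likers_in_category[like] / n) / base_rate >= min_lift}


def infer_membership(likes: set[str], markers: set[str]) -> bool:
    """Place a new user in the category if they made any marker like."""
    return bool(likes & markers)


if __name__ == "__main__":
    known_users = [
        ({"tango", "soccer"}, {"Hispanic"}),
        ({"tango"}, {"Hispanic"}),
        ({"soccer"}, set()),
        ({"soccer", "jazz"}, set()),
        ({"jazz"}, {"Hispanic"}),
        ({"jazz"}, set()),
    ]
    markers = category_markers(known_users, "Hispanic")
    print(markers, infer_membership({"tango"}, markers))
```

In this invented data half the users are labeled "Hispanic", but every user who liked tango is, so tango gets a lift of 2.0 and becomes the marker. A new user who likes tango is then labeled "Hispanic" without anyone defining the category in advance.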
How we live and experience something as multifaceted and multilayered as the connections between gender, race and class is reduced to statistical quantitative terms in algorithms. How does this quantification deprive us of the opportunity to critique and challenge these categories?
CL: Both marketers and political theorists have realized that addressing a person only through the lens of gender, OR the lens of race, OR the lens of class, doesn’t get, at all, to the lived experience we have when that person is addressed through the intersection of gender AND race AND class. This intersectional theory, as it is called, allows us to critically understand how, for example, women are not all treated the same by institutions of power like the police, health system, or legislature. For example, domestic abuse laws written for all women in general fail to recognize that wealthier women will likely have a different experience with the legal system than poorer women.
When these categories are made through data and statistics (where a user is “Hispanic” just because they “like” tango music) instead of through lived experience, this intersectional approach loses its political teeth. When we don’t know what, exactly, makes up a “woman” in terms of data, or what, exactly, makes up “wealthy” in terms of data, the critical potential to see how race matters, class matters, or gender matters becomes impossible. The identity of those categories is owned by companies like Google, not us. In short, the categories lose their politics.
We Are Data is available now!
John Cheney-Lippold is Assistant Professor of American Culture and Digital Studies at the University of Michigan.