"More is not necessarily more. Faster is not necessarily better. Big data is not necessarily better." In the effort to capture and make available data about people, digital humanities scholars must now weigh the decisions of what and what not to share. Geoffrey Rockwell and Bettina Berendt address the new ethical issues around “datafication” in an age of surveillance.
Soaking in info, the soothing facts
In just a minute we’ll get to the bottom of just about anything
Anything at all
(Pat Maloney, “Deaf Ear to the Ground” 2014)
Introduction
We live in an age of surveillance.1For a good collection on surveillance see Ball et al. (2012). In their introduction the editors argue that surveillance “has emerged as the dominant organizing practice of late modernity” (p. 1). For a readable discussion of post-9/11 NSA eavesdropping see Bamford (2009). We now blog about our lives openly. We carry around smartphones that push geospatial information about our location into the cloud. All this voluntary datafication has changed the way surveillance works: data can now be easily captured rather than laboriously gathered. Ever since 2013, when news organizations like the Guardian began reporting on the extraordinary collection of classified documents leaked by Edward Snowden, it has become clear that state surveillance on an unprecedented level was being undertaken by the Anglophone Five Eyes countries.2The first news story reporting on what Snowden leaked was Greenwald’s story in The Guardian from June 6th, 2013, “NSA collecting phone records of millions of Verizon customers daily.” Greenwald later wrote a book about his role and the significance of the Snowden materials (Greenwald, 2014). To give but one example, the Dishfire system reportedly gathered close to 200 million text messages a day, automatically extracting all sorts of metadata about us (Ball 2014).
Fig. 1. Slide about Dishfire from 2011 deck (Ball 2014)
But it is not only the state that is taking advantage of the proliferation of digital devices and the stream of data exhaust they conveniently leave behind. We log and share massive amounts of information about ourselves whether through social media or fitness devices. Surveillance has become spectacle and we are being sold the selfie-sticks and personal drones that help us make a more professional spectacle of ourselves. Social media companies then sell advertising space around our spectacle and more metadata about us. Other companies then develop surveillance tools to help intelligence organizations, the police, and companies analyze this wealth of data by and about us. Arrayed against commercial and state surveillance are hacker/activist organizations like Anonymous and Wikileaks that pioneer disruptive tactics that often also “liberate” and make available even more of our data. Some of this ends up being stored and sold on the “dark web.”
It is this massive capture, organization, and study of human activity that this paper is about, and we think it has its analog in the human sciences – in the scholarly desire to build digital archives of everything we know about ourselves so that we can get to the bottom of everything. For this reason we think those of us in the digital humanities and computing who lead projects digitizing data about people need to return to the question of the ethics of datafication. In this paper we are going to do this in four stages:
- First, we will talk about what we think has changed such that we need to return now to the question of the ethics of digitization.
- Second, we will confront ethical arguments for digitization and access, focusing on the view that “information wants to be free.”
- Third, we will introduce counterarguments around community knowing.
- Finally, we will discuss the difficulty of thinking through these issues ethically and conclude with reflections on the need for careful relationships between research and researched.
To begin, a word about the word “datafication.” We choose this rather awkward term over “digitization”, which we sometimes use synonymously, as it makes clear that the focus of this paper is on the processes, infrastructure, organizations and decisions that go into capturing aspects of the blooming, buzzing confusion into data and then adding metadata so that it can be stored, accessed, analyzed and used in unforeseen ways. Digitization would be the rather narrow set of activities of scanning books or capturing street sounds. Datafication starts with the decision about what to capture, what equipment to use, how to enrich it, and how to make it accessible for human use. Perhaps, following Johanna Drucker (2011), we could talk about the “capta” and “captafication” – in other words, that which is captured of the world and then rendered in digital form, but “datafication” is more likely to be understood.
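To make the distinction concrete, here is a minimal sketch, in Python and with field names of our own invention, of how datafication layers interpretive decisions on top of raw capture:

```python
# A minimal sketch of datafication, with invented field names: raw capture
# plus the interpretive metadata that makes capta storable and queryable.

raw_capture = {
    "file": "street_sounds_042.wav",      # what the recorder captured
    "recorded": "2017-06-01T14:03:22Z",   # when the equipment ran
}

# Datafication is everything layered on top: description, enrichment,
# and access decisions that let the record be used in unforeseen ways.
datafied_record = dict(
    raw_capture,
    location={"lat": 53.5461, "lon": -113.4938},  # geospatial enrichment
    tags=["market", "crowd", "vendor calls"],     # human-assigned metadata
    rights="restricted",                          # an access decision
    provenance="captured by project team, consent on file",
)
```

Every one of those added fields is a decision, and each carries ethical weight: who assigned the tags, who set the rights, who decided this was worth capturing at all.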
Why look at ethics now?
Let us begin by saying that we don’t think that the digital humanities have a particular ethical responsibility distinct from other fields. This is not a “gotcha” paper. Rather, the humanities have a long tradition of thinking about the ethical, political, and epistemological implications of technology. This tradition stretches from the story of the invention of writing in Plato’s Phaedrus (274c-275b) through Foucault’s (1977) discussion of discipline and the panopticon to examinations of the foundations of our knowledge infrastructure like Bowker’s Memory Practices in the Sciences (2008). In fact, one of the reasons to look at the ethics of datafication is that this is one of the things the humanities do well, and in the tradition of the humanities we ask about ethics over and again, now and again. Ethical reflection is one of the gifts the humanities bring to this age of surveillance – we draw on traditions of thought, adapting them to the particularities of the new surveillance.
The second point we would make is that this paper is not about the ethics of analytics or data science, as important as that is. With Marco Büchler, the present authors published a paper in the German Journal for Artificial Intelligence, “Is it Research or is it Spying? Thinking-Through Ethics in Big Data AI and Other Knowledge Sciences” (2015). In that paper we argued that it is our students in the digital humanities, data sciences and associated fields who are getting jobs in corporate analytics and intelligence services, and that we therefore have a responsibility to encourage them to think about the ethics of data science. Likewise Malte Rehbein (2015) presented a fine essay at the 2015 Digital Humanities Summer Institute in which he discussed the dual-use problem: technologies developed for authorship attribution and stylistics in the humanities could be repurposed for surveillance.
More applicable to datafication is a chapter Todd Presner wrote, titled “The Ethics of the Algorithm: Close and Distant Listening to the Shoah Foundation Visual History Archive” (2016). Presner raises important questions about the appropriateness of the database form of digital archives like the Shoah Foundation's VHA, especially given the role played by information technology in the Holocaust. His understanding of algorithm includes what we are calling datafication – i.e. the structuring of the video data with metadata into a database with an interface along with the algorithms for querying the collection. What we will borrow is his philosophical turn to thinking about how digital archives structure potential relationships with the other whose testimony has been digitized. For the moment let us note that there is much more to be said about the ethics of algorithms, text analysis and text mining, but this paper will focus on the data side as that is more often where digital humanists intervene.3See, for example, O’Neil (2016) on the math of big data or Rockwell and Sinclair (2016) on text analysis and text mining.
The time to bring digital and media literacy into the mainstream of American communities is now. People need the ability to access, analyze and engage in critical thinking about the array of messages they receive and send in order to make informed decisions about the everyday issues they face regarding health, work, politics and leisure. (Hobbs 2010)
So what are some of the circumstances of the now that make it especially important to revisit the ethics of datafication? The first, alluded to in the introduction, is the growing evidence that big data surveillance and analysis have become an issue of democratic citizenship. As we voluntarily give up more and more data and are managed more and more by algorithms, we need to understand the consequences and opportunities. Likewise, we need to teach our students to think through the ethics of datafication to prepare them for the challenges ahead.
Related to this is the hype around big data and its opportunities. Companies like IBM are enthusiastically promoting solutions like Watson. The (Obama) President’s Council of Advisors on Science and Technology studied the issue of big data and told us there are both significant concerns and a wealth of opportunities to make real progress in areas like health with big data (Podesta et al. 2014). Needless to say, we have heard this before, whether it was hype around democratization and the personal computer, democratization and the internet, social media and democracy in the Middle East, or the value of the Internet of Things. We don’t doubt that there will be significant advances due to big data; in fact, we are committed to exploring the opportunities. But we will only know the true opportunities if we continuously ask critical questions about the ethics and the science.
A less familiar alternative, the “capture model,” has manifested itself principally in the practices of information technologists: it is built upon linguistic metaphors and takes as its prototype the deliberate reorganization of industrial work activities to allow computers to track them in real time. (Agre 1994)
The third circumstance that makes datafication pressing is a change in how data is gathered. In the past surveillance was active; now it is passive (aggressive). Before mass datafication, someone had to spy on you. They had to steam open your letters, take pictures, listen in on the line, and then write reports. Digital humanists had to go into archives, pick through materials, scan them, and collaborate with archivists to make the data accessible. Now technology has made it possible to capture data that is digitized and made accessible by the people surveilled themselves, whether it is through a Geocities web site archived by the Internet Archive, a blog maintained on Blogger, a stream of tweets, or fitness data off a Fitbit. We now take our own pictures and post them to Facebook with metadata for others to tag, thereby adding more metadata.
Related to the ease of capture is the problem of the quality and curation of data. As it becomes easy to scrape vast datasets, it looks as if the value and expertise of the digital humanities in the careful creation, enrichment and curation of high-quality datasets are becoming irrelevant. The Google Books project and other large-scale digitization projects have supposedly enabled new methods in ways our boutique digital humanities projects have not. Who will want the “hand-carved artisanal TEI” (Jockers & Flanders 2013, 4) we have curated in the face of millions of books available through the HathiTrust? Datafication threatens to make anachronistic all the work of humanities computing around scholarly digital editions without even recognizing the field as an alternative. The very difference between industrial databases and what we do will not even be taken seriously if we don’t talk about the dangers and ethical issues around large-scale datafication. Do we trust inferences drawn from big and dirty data? Is it ethical to use such data when making decisions that affect people? Who will speak up for the importance of the data given over to algorithms, if not us?
It is this ease of capture of questionable data that first got us reflecting on ethics. A team of us at the University of Alberta was experimenting with the Twitter capture tool twarc when the Gamergate controversy erupted.4For more on Gamergate see Wikipedia contributors (2017) or, for a broader perspective on Gamergate and related phenomena, see Nagle (2017). We realized this was an important moment in videogame culture, so we started scraping #gamergate tweets to document the moment game culture became a culture war. Once we started depositing the Twitter data along with other materials we had scraped from 4chan and 8chan, we realized we needed to think about the ethics of capturing and archiving such toxic data. We discussed the ethics of what we were doing within the team and published a preliminary ethics position on the dataverse repository where we were sharing curated subsets of our data.5See “Gamergate Reactions” at http://dx.doi.org/10.7939/DVN/10253. Partly as a result of these reflections we are revising the ethics position in dialogue with the larger team.
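For readers unfamiliar with such tools, the capture itself is almost trivially easy, which is part of the ethical problem. The following is a sketch of the kind of script involved, using the twarc 1.x Python interface; the credentials are placeholders, and method names may differ in later versions of the tool:

```python
# A sketch of hashtag capture with twarc 1.x; keys are placeholders.
import json
from twarc import Twarc

twitter = Twarc("consumer_key", "consumer_secret",
                "access_token", "access_token_secret")

with open("gamergate_tweets.jsonl", "w") as archive:
    for tweet in twitter.search("#gamergate"):
        # Each tweet arrives as a dict of text plus extensive metadata
        # (author, time, client, sometimes location) -- datafication of
        # people who never consented to being archived.
        archive.write(json.dumps(tweet) + "\n")
```

A dozen lines of code can assemble an archive of thousands of people’s utterances; the decisions about whether and how to keep it take rather longer.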
Now, you will say that capturing contemporary data streams like Twitter is a matter for the social sciences – that we in the humanities don’t concern ourselves with the recent or social record – and so we don’t need to worry about the ethics of such projects. But that isn’t true. While most digital humanities projects have to do with the datafication of historical materials, and for that matter with only what is perceived to be of importance, more and more projects are asking questions about modern literature, language use, cultural consumption and popular culture, where the evidence has been captured about people who are still alive or whose children are alive. Whether it is historians using web archives, linguists studying contemporary usage or cultural studies projects looking at a culture war in the gaming community, the temptations of large, easily captured datasets are the future of the digital humanities, if only because they make our methods relevant to current issues. It is therefore time to make sure we know why and when we should consider the ethics of what is datafied. And, as you will see later, even historical materials have ethical implications, which is why we believe that at the very least we should be asking the question with each new project, and asking it with a humility that is open to difficult answers.
To put it more forcefully, we are at a turn in how we think about knowledge. We can no longer naively treat the accumulation and datafication of information as a good thing, just as after Hiroshima we began to question progress in science. More is not necessarily more. Faster is not necessarily better. Big data is not necessarily better. The same is true of scholarly digitization. There is no longer an archival function between digital humanists and the repository. We are now our own archivists, because the infrastructure is designed for immediate deposition and engineered not to forget. We therefore all have to learn to think, as archivists do, about the ethics of what is captured and digitized. We have to learn from the “carework” disciplines that took care of knowledge – the archivists, the editors, and the librarians.
Information wants to be free
Information wants to be free. Information also wants to be expensive.6Stewart Brand is generally credited with coining the phrase “Information wants to be free.” This longer formulation is from his book The Media Lab. See Roger Clarke’s informative web page on the phrase at http://www.rogerclarke.com/II/IWtbF.html.
At this point we will turn to asking directly about the ethical value of datafication. Is digitizing, enriching and providing access to information a good? As is often the case with ethical questions, they are so grand we are embarrassed to ask them. In this case the value of datafication is generally taken to be obvious. This presumption of value is typically communicated in the culture of information technology as the dictum that information should be free. Jon Katz, in an article for Wired titled “The Birth of a Digital Nation”, argued that “The single dominant ethic in this community is that information wants to be free.”7http://archive.wired.com/wired/archive/5.04/netizen_pr.html Somehow the desire of information has become the ethic of an entire industry, one which to a certain extent includes the digital humanities – a rather alarming abdication of responsibility to a commonplace. What about our desires? Do we know what we want, or want what we don’t know?
Further, the free software movement8http://www.gnu.org/philosophy/free-sw.en.html and later the open access movement have in different ways promoted free information as an unquestioned “good”. Our research granting councils promote the sharing of data as a way to “advance knowledge.”9http://www.science.gc.ca/default.asp?lang=En&n=83F7624E-1 Librarians will argue that access to information is a principle of their profession and the reason public libraries should be funded. Cultural digitization projects are funded for the same reasons: to provide access to the information that should matter to us as a cultured community. Alas, like most principles, such ethical positions are rarely examined – they are a common starting point or principio so obvious that it shouldn’t need to be argued. To ground an ethics of digitizing information, however, we need to ask again whether it is so obvious. We need to take responsibility back from information, even if we find ourselves agreeing in the end. We need some sense of why access to information is good and under what circumstances. That is why we will now review some of the arguments for freedom of information and then return to that curious formulation by Stewart Brand about the desire of information.
Freedom of Information (FOI) as a right goes back to the 1946 UN General Assembly Resolution 59(I), “Calling of an International Conference on Freedom of Information”, which starts by stating that “Freedom of information is a fundamental human right and is the touchstone of all the freedoms to which the United Nations is consecrated”.10See https://documents-dds-ny.un.org/doc/RESOLUTION/GEN/NR0/033/10/IMG/NR003310.pdf. The next paragraph goes on to say that “Freedom of information implies the right to gather, transmit and publish news anywhere and everywhere without fetters.” FOI is often justified in a political context as important to democracy, where it is a right to information by and about government. Without accurate information about government and elected officials it is hard to hold them accountable and hard to vote in an informed way. FOI is therefore essential to the transparency and accountability that are characteristic of functioning democracies. The US election of 2016 would make a magnificent case study of the role of free-range information, whether emails hacked off mail servers or backstage footage from news shows.
Free information is important not just to democratic functioning; it is also considered necessary, though not sufficient, for knowledge and skill. Without information you can’t learn things like Italian or how a computer works. Knowledge in turn is important to wisdom, which is hard to define, but many would agree that just knowing doesn’t mean you act ethically, and acting ethically is what distinguishes wisdom. It would thus seem that datafication is a way to increase our supply of information, which should increase the stock of knowledge and eventually ethics and wisdom. Our knowledge infrastructure is frequently represented as a hierarchy or pyramid that makes the relationship between data (at the bottom) and wisdom (at the top) seem obvious. The wider the data foundation of the pyramid, the more you can get after refining and modelling at the top. Without data you don’t even have a ground for the higher ethical decisions we are talking about. The connection between data, or at least information, and ethics seems to be structural.
Needless to say, there are problems with this hierarchy built on data. Data may be necessary for information, but data is not sufficient or causal. More data, or for that matter more information, doesn’t make you more knowledgeable or wiser. A bigger library, better memory infrastructure, and more computation are not enough. This is one thing we humanists know, though that knowledge has been overshadowed by the opportunities of big data. As humanists we know that data needs to be curated and cared for if it is to become useful information. As humanists we know the importance of learning, skills, and modelling to knowledge. To paraphrase Gilbert Ryle (1945-1946), it is not enough to “know that”; we also need to “know how.” More data, especially when it overwhelms our capacity to curate and learn from it, actually leads to less information: to distraction and to the sense of being swamped.
Freedom of Expression (FOE), or freedom of speech, is closely tied to FOI, as without free expression there would be little public discourse to have free access to. FOE is also closely tied to democratic citizenship – without FOE we don’t have the free and open exchange of ideas that, it is argued, makes for a robust civic space. Freedom of speech, along with freedom of religion and freedom of the press, is enshrined in the First Amendment to the US Constitution, part of the Bill of Rights. Article 19 of the Universal Declaration of Human Rights (UDHR) makes FOI explicitly a right, stating:11This article of the UDHR is recognized as a fundamental right, for example in the European Union through the European Convention on Human Rights and the Charter of Fundamental Rights of the European Union.
Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.
Like FOI, FOE is about more than just the exchange of ideas for civil society; it is also about the freedom to make art. The freedom to express yourself is the freedom to be one who expresses themselves and the freedom to question and interpret the world. We take expression and interpretation to be fundamental to what it means to be human. We might include in this the freedom to express oneself through creating digital archives. Taken to its extreme, this perspective would make FOE so fundamental that it trumps other rights when they are in conflict.
The rights of free expression and information do not, however, mean that all information wants to be free. It is people who are free to express themselves, not stuff like information. Nor does it mean that one can express oneself in whatever way one wants – there are still slander laws and other restrictions on expression like copyright. We recognize compromises where there are conflicting rights. We recognize that with FOE come responsibilities to use that freedom appropriately. When it comes to datafication, especially the digitization of the expression of others, this right assumes that those who express information actually want to share it widely, which in many cases they may not. This will come up later when we discuss appropriation of voice. FOI and FOE aren’t arguments that all information should be digitized and made freely available by anyone. They rightly place the agency with us to care about expression, not with the random information itself.
Access to Information is another formulation, and it is one of the principles of librarianship and archiving. The IFLA Code of Ethics for Librarians and Other Information Workers states in its preamble that,
The need to share ideas and information has grown more important with the increasing complexity of society in recent centuries and this provides a rationale for libraries and the practice of librarianship.
The role of information institutions and professionals, including libraries and librarians, in modern society is to support the optimisation of the recording and representation of information and to provide access to it.12The IFLA Code of Ethics can be found at http://www.ifla.org/news/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version
The Code goes on to connect this “belief” to the “recognition of information rights” which are then grounded in Article 19 of the UDHR. Note that this language of belief and role is in the preamble to the Code laying out the founding beliefs of the profession. Article 1 of the Code goes on to state:
The core mission of librarians and other information workers is to ensure access to information for all for personal development, education, cultural enrichment, leisure, economic activity and informed participation in and enhancement of democracy.
Librarians and other information workers reject the denial and restriction of access to information and ideas most particularly through censorship whether by states, governments, or religious or civil society institutions.
Later, in Article 3, there is a discussion of privacy, but other than that there is no discussion of how the ethics of digitization and access might be more complicated. If anything, the injunction to “reject the denial and restriction of access” suggests a moral imperative for the profession to resist constraints on access. The code is thoroughly modernist in its commitment to access as a principle, except when privacy is at issue. We should add that we don’t think the IFLA code fairly represents the practices of care around data that librarians actually follow. It emphasizes the importance of librarianship to democratic FOI and FOE, but at the expense of the curatorial responsibilities librarians take on.
Free Software and related ideas are probably the most influential concepts regarding the freedom of information in computing culture, including the digital humanities. Many, after all, believe it is a best practice to release code and data under Creative Commons licenses, especially if funded by the public purse. But what does Free Software mean? “Free” is defined by the Free Software Foundation as
software that gives you the user the freedom to share, study and modify it. We call this free software because the user is free.13Free Software Foundation page “What is free software?” https://www.fsf.org/about/what-is-free-software. There is also a related page on the “Philosophy of the GNU Project” at http://www.gnu.org/philosophy/philosophy.en.html.
The FSF goes to great pains to emphasize that “free” does not mean you don’t have to pay for software, but that you can do what you want with it (once you pay for it). This contrasts with the looser notion of “open source” software, which is usually code that is made available for free (you don’t have to pay) and which others can adapt, but which may have constraints on use. 14Eric S. Raymond has an influential essay that was then turned into a book titled “The Cathedral and the Bazaar”. See Raymond (1999) or http://www.catb.org/esr/writings/cathedral-bazaar For our purposes what matters in the related ideas of free software and open source is the assumption that accessible and adaptable code and data are a good. Whether it is “free” or “open”, there is an admirable, if unquestioned, ethic of sharing as a good that is woven into the rhetoric of hacker culture. Take for example, Eric S. Raymond’s first directive for “How To Become a Hacker”:
Write open-source software
The first (the most central and most traditional) is to write programs that other hackers think are fun or useful, and give the program sources away to the whole hacker culture to use.15Emphasis is in the original at http://www.catb.org/~esr/faqs/hacker-howto.html#respect1. The second directive is to “Help test and debug open-source software”.
If you want to be a hacker, which in this case is presented as the ideal role in computing culture, then you need to write code for your peers. This is an ethic of sharing that proposes a gift economy of code. The hacker community imagined by Raymond is an ideal built on open sharing of a particular type of information: software code. This utopian community stands in contrast to the reality of commercial software, where most code is not free or open in any sense. The other of free and open is the big bad multinational corporation. Who wants to be big and bad?
Like many utopian ideals, the imagined hacker community is influential as an idea even if only partially realized. What happens less often among those committed to being hackers is a critical examination of the ethical ideal upon which the community is based. Access to free code is the idea upon which community is built, not around which it is negotiated.16We note that there is a tradition of arguing against hacking as piracy starting with Bill Gates’ “An Open Letter To Hobbyists”, http://www.digibarn.com/collections/newsletters/homebrew/V2_01/gatesletter.html. We also note that a community built on the ability to write code is an exclusive one with little diversity. Sharing code becomes a code for being one of us, cloaked in pseudo-open meritocratic talk. Now we are even seeing a “sharing economy”, with companies like Uber and AirBnB piggybacking on such rhetoric, which in turn piggybacks on an ethical model that assumes we are all empowered rational agents in the meritocracy of a gift economy – something that really isn’t true. This is the ethic of the powerful (white men); it doesn’t necessarily recognize those without equal access to virtuous circles, without the ability to code, or without sources of income other than their code.
Now let us return to the curious formulation that “Information wants to be free.”
According to Roger Clarke’s web page on the phrase, this truism was coined by Stewart Brand in a discussion at the first Hackers Conference in the fall of 1984.17See http://www.rogerclarke.com/II/IWtbF.html. It went on to be printed in different places including Brand’s book The Media Lab: Inventing the Future at MIT (1987). It is interesting that in the original formulation Brand contrasted the desire of information to be free with a balancing desire to be expensive, “because it’s so valuable.” Brand at least recognized the power of the market for information.
What is compelling in the formulation is the agency attributed to information – that it wants! What does it mean for some-thing to want? One reading of the desire is that there is something in the very form of information that communicates the desire for access. We could go further and say that when the given (data) is formed into information it is by design meant to be gathered, distributed and accessed – that this is one meaning of the “form” in information. We could go further still, pace McLuhan, and argue that the message of the form of information is promiscuous access. This is even more true of digital information, where the digital form certainly makes it easy to copy, distribute and capture as if there were no material, temporal, or spatial constraints. Digital information is the culmination of thousands of years of human ingenuity aimed at designing information infrastructure so that information can flow ever faster, ever more accessibly, and in ways ever more resistant to censorship. Baked into this technological history is the utopian belief that if we could design information to want to be totally free of all of us, and to therefore avoid all censorship, then we might be free despite ourselves.
This is a technological history with a politics, even if idealized and questionable when put this way. But infrastructure doesn’t show its politics or desires – infrastructure is designed to be transparent and simply used, which makes it capable of carrying rather simple desires. And it is the infrastructure built over time that bears the desire, where by infrastructure we mean the material infrastructure as well as the organizations, roles, and training. We in the humanities bear much of the responsibility for this design, as we are the ones who benefit from the infrastructure of historical surveillance. It was, after all, the Italian humanists who defined humanism in reorganizing secular libraries and universities to study the human rather than the divine.
There is, however, a darker urgency to the desire of information that we need to confront, and that is the threat of technological determinism. There is a hint that we can no longer control the movement of information because the juggernaut of technology has been freed. The image is of the runaway train of technology that can no longer be stopped and now must simply be trusted, the way we are supposed to trust the invisible hand of the market. Providence has long been the crutch of those who don’t want to take responsibility. This theme of already determined outcomes often accompanies teleological predictions to the effect that “whatever we do it will all end well”. That happy ending could be a secular communist workers’ utopia or the kingdom of god; it doesn’t matter – sit back and trust the forces unleashed. The call of information tells us that we want freedom and that it is inevitable anyway (so we might as well not fight it). This rhetoric of inevitability is common in the world of information technology, where inevitability is the dark companion of bright future hype (Seidensticker 2006). Get with the program or be flattened by whatever steamroller technology has just been announced!
Of course, there is nothing inevitable and information doesn’t really want anything; it is people who say so or critique the saying. This leads to what must be the most obvious ethical point about the ethics of datafication:
We have to take responsibility for the decisions we make about information.
We, the information workers, the scholars, the librarians, the archivists, the scientists, are always choosing how to care about information, especially when we pretend it isn’t our job, or that something is inevitable, or that it is baked into the technology like a will to powerful access. The question is whether we can take responsibility for that care. We are already doing it, so why not do it well?
Communities of Stories
There are, thankfully, other stories, also echoed in declarations of rights, that tell of a different relationship with information, and that is what we want to talk about now. These stories question the very formulation of in-form-ation and remind us that we sing and speak for reasons other than distributing data. We will look at three approaches that call into question the passion for digitization, starting with privacy and moving beyond it.
Privacy. Privacy-related behaviour, in the sense of dynamic and contextual processes of drawing boundaries for social interaction and withdrawal, has existed throughout history and throughout cultures. Legally codified rights to privacy have been developing strongly since the late 1800s, partially in response to new media and technologies that were increasingly perceived as intrusive. Privacy is increasingly seen as a core element for safeguarding personal autonomy – and as such, rights to privacy are as central to our idea of liberalism as the right of access to information and freedom of speech.
Privacy is not only, but in mediatized societies increasingly, about who has access to personal data and what they may do with it. Thus, privacy is a thorn in the flesh of “information freedom”: it ties information back to the person whose autonomy may be threatened when information about him or her flows freely, and it calls for limits to that “freedom”. Researchers from many disciplines (law, sociology, psychology, computer science, to name a few) have investigated privacy, stressing the importance of relationality and contextuality and elaborating on the manifold meanings of the term, and critical data scientists today contest the notion of any data being “objective” and “given” and therefore capable of “wanting” anything (Kitchin 2014).
Perspectives informed by feminism and other emancipatory approaches emphasize the ways in which the very definitions of what is private and what isn’t can express, cement and contest power relations. These approaches have contributed strongly to modern definitions of privacy that emphasize the freedom to develop one’s personality as the main goal, exercised through controlling rights over one’s personal data and hiding sensitive data (a traditional mainstay of the notion of privacy), but also through purposefully disclosing such data.18For details and sources, see Gürses 2010.
In sum, thinking about privacy not only undermines naïve ideas about reified information wanting to be free, or naive ideas about privacy being only about confidentiality, but also challenges the subject-object relations of the scientific gaze of datafication projects. It challenges the digital humanist to ask about the rights of those whose records are being digitized, even if they are dead. The limitation of privacy (at least in most of its current legal framings) is that it focuses on the individual and doesn’t really question the modernist epistemology of information baked into technology.
Aboriginal Knowledge (AK) is an epistemology that rejects many of the enlightenment assumptions of Western epistemology. AK is not one thing and to try to define it for use in a paper like this is to reify it. Nonetheless, AK is in constant contact with Western research practices, and therefore various communities have developed statements of principle for discussion and for research encounters such as this.
Aboriginal people are concerned about the appropriate use and protection of their knowledge. Many deem integrationist research and implementation methods as another form of colonization and exploitation, where knowledge is categorized into hierarchies and AK (Aboriginal Knowledge) can be devalued, exposed, abused or used against Aboriginal empowerment to self-govern their resources.19Assembly of First Nations, First Nations Ethics First Nations Ethics Guide on Research and Aboriginal Traditional Knowledge, p. 3. http://www.afn.ca/uploads/files/fn_ethics_guide_on_research_and_atk.pdf.
One aspect of AK is a challenge to our epistemological assumptions. Some of the assumptions challenged include:
- The idea that knowledge is guaranteed through open sharing and testing. By contrast, aboriginal knowledge is guaranteed through traditions of, for example, storytelling.
- That knowledge can be formed into discrete truths that can be tested independently.
- That knowledge should be tested, often in adversarial situations.
- That knowledge can be owned by a single person rather than by a community over time.
- That consent can be given by a single person and not by a community over time.
The point is that traditions of telling view the open access archive skeptically. The open archive reduces the relationship of telling stories to a situation where all stories are information available to all, whether or not they are ready. By contrast in many communities stories are told at the right moment to the right person. Stories are not owned by individuals, but belong to communities and are passed within the community. What right have we to digitize these stories and store them up outside their context of telling? Any digital humanities project digitizing the stories of a community should think about how to ask whether we have the right to datafy and how we would know. We don’t want to find ourselves keeping bone libraries of pillaged graves just because we thought measuring skulls would shed light on the human condition. What Keavy Martin says about literary studies applies to the digital humanities.
How can Indigenous literary studies take seriously Indigenous knowledge, “traditional” or otherwise? How do our methods—our ways of thinking about and reading texts—converse with Indigenous traditions and contemporary concerns? (Martin 2012, Loc. 163-165)
Ignorance studies in one sense generalizes some of these ideas (for an overview, see Gross and McGoey 2015). At its core, ignorance studies challenges both the epistemological and the normative views of the West that “knowledge societies” keep accumulating knowledge and reducing knowledge gaps, and that more knowledge is a good thing. On the contrary, it shows – with examples from a wide range of disciplines – how unavoidable, emergent, and constitutive ignorance (or non-knowledge) is. In doing so, ignorance studies also highlights how knowledge and ignorance are guaranteed through traditions of telling deeply embedded in communities. Other branches, such as those relating to bio-ethics, highlight the ethical value of specific unknowns and of a right not to know (e.g. Laurie 2014).
Appropriation of Voice (AoV) is a more general critique of the assumptions behind projects that speak for others. Alcoff (1991-2) argues that we all have a social location, especially those of us who are academics. This location provides epistemic authority to our voice, authority that should be questioned, but rarely is.
As philosophers and social theorists we are authorized by virtue of our academic positions to develop theories that express and encompass the ideas, needs, and goals of others. However, we must begin to ask ourselves whether this is a legitimate authority. (Alcoff 1991-2, 7)
Applying this critique to datafication, we are called to ask whether datafication can become a form of appropriation by those in control (us) of the information of those without access to the infrastructure. By what authority do we digitize the culture of others, whether other contemporary cultures or those of the past? For that matter, what right do we have to create a Gamergate archive if we self-identify as critical of that community? Creating an archive preserves the information as we structure it, which is not a neutral activity despite all the best practices. Asking about appropriation means being rather more honest with ourselves about our epistemic authority and its biases. That we are professors or researchers doesn’t magically authorize us to scrape and archive anything. Rather, it imposes a greater burden of care on us to make sure we are respectful of what we gather.
Importantly, we need to recognize that the act of digitizing and mounting digital archives can silence others who have some claim to the culture digitized. It also appropriates works from their original cultural context and renders them in a digital context where the message is that of the technological medium and its culture. This new digital context may be efficient, but it typically has no continuity with the traditions of the other – there is no trace back. There is, as Benjamin put it, no “aura” to the work. (Benjamin 1969)
We can adapt Alcoff to identify three dangers of appropriation.
- First, that the corpus we want to digitize represents the voice of others who are often not involved in the digitization and therefore have their culture represented for them without their consent or the benefit of their perspective. One might answer that we academics can transcend our location and empathize with the other, but there is a danger that we actually can’t or don’t bother. Either way, the onus is on us to reflect on whether we really can empathize or if we aren’t just feathering our own research nest with the stories of others.
- Second, that any attempt by those whose stories are digitized to speak for themselves can be overwhelmed by the existence of the already well-supported digital archive infrastructure, so that they are in effect silenced. As Alcoff points out, this is why we have created new academic disciplines like Women’s Studies where the oppressed can speak for themselves.
- Third, that our location as digital humanists is inherently dangerous because of the power and authority conveyed on us, the biases built into the academy, the cultural biases of the humanities, and those of digital technologies.
In all fairness, Alcoff is well aware of the problems of delimiting identity and membership in groups such that you can or cannot talk for them. There is no formula such that you can determine what you can talk about without speaking for another group. Further, in some situations we expect privileged people to talk for those without a voice and to use their position to draw attention to the other.
What matters is that we start to care for the voice of the other, ask about it, listen to it, let it speak for itself in the form it knows, and not just digitize it. We need to bake into our practices the ongoing and iterative reflection that avoids confronting the issue at the end when the project has taken on its own agency and now wants to be free.
Carefully approaching an ethics of digitization
Merely consider the hypothesis of a Christian Europe, convinced of its legitimacy, rallied together in its reconstituted universality, having once again, therefore, transformed its forces into a “universal” value – triangulated with the technological strength of the United States and the financial sovereignty of Japan – and you will have some notion of the silence and indifference that for the next fifty years (if it is possible thus to estimate) surround the problems, the dependencies and the chaotic sufferings of the countries of the south with nothingness. (Glissant 1997, 191)
Inevitably at this point in a digital humanities paper, we are supposed to present the solution that lets us all get back to work. The subtitle to a recent book by Morozov (2013), To Save Everything, Click Here: The Folly of Technological Solutionism, is enough to warn us against the temptation to assume that there is a solution, even if that is what is called for in this, the third act of a paper.
No, the problem is deeply rooted in our history of managing information and the infrastructure we have built up. The problems of the formation of what Bowker (2008) calls memory infrastructure cannot be solved, for example, by a few gestures like proposing a code of conduct to take into account a few differences in perspective, though a code might be a good place to start. Part of the problem is that we assume the Western way of conceiving information framed by our infrastructure is universal, or at least the best, and therefore the bottle into which all wines should be poured. What would it mean to take seriously that some stories shouldn’t be told, or should be told but not digitized, even when we have access? What would it mean to take seriously that some people and some communities would demand to be hidden from datafication or, as Glissant puts it, “demand […] the right to opacity” (Glissant 1997, 189)?
What is needed is not a solution, but a praxis and one that respects differences, both the differences between those whose testimony is now treated as information and those using the testimony, and the differences of ways of telling and their attendant infrastructures. Here we return to Presner’s challenge to think about the ethics of the algorithmic infrastructure. He grounds his ethics in the dialogical relationship between user and those who offered their information as testimony.
With reference to the concrete example of the Visual History Archive and its testimonies, he emphasizes how this dialogical relationship is already now supported by the interactive searching and linking functionalities of the infrastructure. However, as Presner also points out, the linguistic surface structure and in this sense also the semantics of these searches are determined – and therefore limited – by the metadata that have been assigned by the Archive’s expert annotators. Presner argues that by opening the metadata annotation also to searching users, the dialogue opportunities would be enhanced and the dialogical relationships deepened. At first sight, his proposal evokes collaborative tagging platforms, which usually have less ethically charged contents such as metadata for bibliographies. In the context of materials such as those of the Visual History Archive (testimonies of genocide) or many other digital humanities projects, such an approach would face new ethical challenges and require extensive moderation – spam and hate speech come into their own when the material deals with vulnerable people or difficult subjects. At the same time, an extension to user-driven and collaborative annotation could open new horizons for dialogue.
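What might such care look like in the infrastructure itself? One could imagine, as a hypothetical sketch only (the names and fields are our own invention, not the Visual History Archive’s), an annotation record that bakes provenance and moderation into the data structure, so that contributions are never silently erased and every decision about them remains part of the dialogical record:

```python
# A hypothetical sketch: user annotations with provenance and moderation
# designed in, rather than bolted on. Names and fields are invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Annotation:
    segment_id: str            # which testimony segment is being annotated
    text: str                  # the user-proposed tag or note
    contributor: str           # provenance: who is speaking, and from where
    status: str = "pending"    # "pending", "approved", or "rejected"
    review_notes: List[str] = field(default_factory=list)

def moderate(annotation: Annotation, approve: bool, note: str) -> None:
    """Record a moderation decision without erasing the contribution:
    rejected annotations stay in the record, with the reasons why."""
    annotation.status = "approved" if approve else "rejected"
    annotation.review_notes.append(note)
```

The design choice matters more than the code: keeping the moderation trail treats annotators and the annotated as parties to a dialogue rather than as inputs to be filtered.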
As for information, there is a potential for a relationship of respect in formation. This hints at a way of thinking through the ethics of datafication as a relationship of care for the other and their information, one that draws on the ethics of care (Held 2006). We might even say that this way suggests another sense of information wanting to be free, and that is the freedom of information to be reinterpreted by its listeners in order to (re)establish relationships with those whose stories have been digitized. The freedom desired is that of testimony – of the witness to have a relationship with the future that is respectful. It is freedom from some constraints of the infrastructure or the assumptions of certain users. (In an ideal world, it would be freedom from all constraints of the infrastructure and the assumptions of all users, but this is not possible: code is law (Lessig, 1999).) It is freedom to communicate an obligation to be taken seriously. It is such freedom that we digital humanists should try to give space for in the design of digital archives. But such freedom is not just a philosophical aspiration. We believe that by looking at how ethics play out in projects we can outline what care would look like in this context of datafication.
In essence, video testimony—in so far as it instantiates a relationship of intersubjective relationality through the ich-du [I-Thou] pact between the survivor and the listener—becomes a practice of ethics as a relation of obligation and responsibility to the other. (Presner 2016, 183)
References
Agre, P. E. (April-June 1994). "Surveillance and Capture: Two Models of Privacy." Information Society. 10:2. 101-127.
Alcoff, L. (1991-1992). "The Problem of Speaking for Others." Cultural Critique. 20: Winter. 5-32.
Ball, K., et al., Eds. (2012). Routledge Handbook of Surveillance Studies. London, Routledge.
Ball, J. (Jan. 16, 2014). “NSA collects millions of text messages daily in 'untargeted' global sweep.” The Guardian. Online at https://www.theguardian.com/world/2014/jan/16/nsa-collects-millions-text-messages-daily-untargeted-global-sweep
Bamford, J. (2009). The Shadow Factory: The Ultra-secret NSA from 9/11 to the Eavesdropping on America. New York, Anchor Books.
Benjamin, W. (1969). The Work of Art in the Age of Mechanical Reproduction. Illuminations. Ed. H. Arendt. New York, Schocken Books. 217-251.
Berendt, B.; Büchler, M.; and G. Rockwell. (2015) “Is it Research or is it Spying? Thinking-Through Ethics in Big Data AI and Other Knowledge Sciences.” Künstliche Intelligenz (German Journal of Artificial Intelligence.) Published online in March, 2015. http://dx.doi.org/10.1007/s13218-015-0355-2
Bowker, G. C. (2008). Memory Practices in the Sciences. Cambridge, Massachusetts, MIT Press.
Brand, S. (1987). The Media Lab: Inventing the Future at MIT. New York, Viking.
Drucker, J. (2011). Humanities Approaches to Graphical Display. DHQ. 5:1. http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html
Foucault, M. (1977). Discipline and Punish: The Birth of the Prison. Trans. A. Sheridan. London, Penguin.
Glissant, É. (1997). “For Opacity.” Poetics of Relation, University of Michigan Press. 189-194.
Greenwald, G. (June 6, 2013). “NSA collecting phone records of millions of Verizon customers daily.” The Guardian. Online at <https://www.theguardian.com/world/2013/jun/06/nsa-phone-records-verizon-court-order>.
Greenwald, G. (2014). No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State. Kindle Edition. Canada, Signal.
Gross, M. and McGoey, L. (Eds.) (2015). Routledge International Handbook of Ignorance Studies. Abingdon-on-Thames, UK: Routledge.
Gürses, S. (2010). Multilateral Privacy Requirements Analysis in Online Social Network Services. PhD thesis. KU Leuven, Dept. of Computer Science. <https://www.esat.kuleuven.be/cosic/publications/thesis-177.pdf>
Held, V. (2006). The Ethics of Care: Personal, Political, and Global. Oxford, Oxford University Press.
Hobbs, R. (2010). Digital and Media Literacy: A Plan of Action. Report of the Aspen Institute, Washington, D.C. Online at http://www.knightcomm.org/digital-and-media-literacy/
Jockers, M. L. and J. Flanders (March 18, 2013). “A Matter of Scale.” Script of a staged debate for the Boston Area Days of Digital Humanities Conference. Northeastern University.
Kitchin, R. (2014). The Data Revolution. Big Data, Open Data, Data Infrastructures & Their Consequences. London: Sage.
Laurie, G. (2014). Recognizing the right not to know: Conceptual, professional, and legal implications. The Journal of Law, Medicine & Ethics, 42(1), 53-63.
Lessig, L. (1999). Code and Other Laws of Cyberspace. New York: Basic Books.
Martin, K. (2012). Stories in a New Skin: Approaches to Inuit Literature. Kindle Edition. Winnipeg, Manitoba: University of Manitoba Press.
Morozov, E. (2013). To Save Everything, Click Here: The Folly of Technological Solutionism. New York, PublicAffairs.
Nagle, A. (2017). Kill All Normies: The online culture wars from Tumblr and 4chan to the alt-right and Trump. Winchester, UK, Zero Books.
O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequity and Threatens Democracy. Kindle Edition. New York, Crown.
Podesta, J., et al. (2014). Big Data: Seizing Opportunities, Preserving Values. Executive Office of the President, White House, Washington. https://www.whitehouse.gov/sites/.../big_data_privacy_report_may_1_2014.pdf
Presner, T. (2016) “The Ethics of the Algorithm: Close and Distant Listening to the Shoah Foundation Visual History Archive.” Probing the Ethics of Holocaust Culture. Eds. Fogu, Kansteiner, and Presner. Kindle Edition. Cambridge: Harvard University Press. 175-202.
Raymond, E. (1999). The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. Sebastopol, CA, O'Reilly & Associates.
Rehbein, M. (2015) “On Ethical Issues in the Digital Humanities”. Preprint of an essay that was presented at the Digital Humanities Summer Institute, Victoria, BC in July of 2015. PDF online at http://www.phil.uni-passau.de/.../OnEthicalIssues-Preprint.pdf
Rockwell, G. and S. Sinclair (2016). Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge, Massachusetts, MIT Press.
Ryle, G. (1945-1946). "Knowing How and Knowing That: The Presidential Address." Proceedings of the Aristotelian Society. 46, 1-16.
Seidensticker, B. (2006). Future Hype: The Myths of Technology Change. San Francisco, Berrett-Koehler Publishers.
Wikipedia contributors (2017). "Gamergate controversy." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, Jul. 21, 2017. Web. Accessed Jul. 24, 2017.