It was winter 2010. I was a Ph.D. student in the middle of researching my dissertation. I was shivering over documents in a cold backroom of a storage facility in East London. Every hour on the hour, I picked up the landline and rang the archivist to tell him what I was doing. Often he did not answer, so I would leave a message. But I assiduously rang every hour. If I didn’t, perhaps the archivist wouldn’t let me back in the next day.
Sitting in the freezing storage facility was a privilege. After a day of vetting my request in the Thomson Reuters headquarters at Canary Wharf, the archivist had allowed me to access and read archival documents. I had convinced him that my topic was worth studying and that I could be trusted to use the archive responsibly without misplacing documents. Unlike public national archives, the Reuters archive was not obliged to grant me access. The major global news organization could have turfed me out of its archival storage facility whenever it wanted.
All this may seem esoteric, applicable only to historians who love poring over old, crumbling pieces of paper. But the problem of corporate archives and access remains deeply relevant. Just recently, researchers from New York University had their accounts disabled by Facebook for using a browser extension to research the company’s advertisements. Facebook’s seemingly arbitrary action spurred a wave of commentary and outcry, including a rebuke from the Federal Trade Commission. While the case attracted far more attention than a historian denied access to a corporate archive ever would, it raises similar questions. How can we think about researcher access in the digital age, and what lessons can we learn from historians’ travails around archival access?
The first and most obvious lesson is that corporations are under no obligation to make their archives accessible, unless legislation compels them. The reach of social media platforms makes the problem feel politically and socially urgent. But it is important to remember that such corporate transparency is not necessarily the norm for media companies. The Associated Press has existed in various forms since 1846, but it only established a corporate archive in 2003. Only a few Silicon Valley companies like Cisco Systems and Hewlett-Packard have established archives. Luckily, governments and platforms can do a great deal to improve transparency and access. But we should not be naive about how long this could take.
Second, archives need infrastructures and financial backing to function. Scholars have seen even major archives run into such problems. Reuters allowed only one researcher to visit at a time; I had just a few days to rifle through thousands of documents. Historian Matthew Connelly has raised the alarm about the National Archives in the United States, whose classification and archival practices may lead to millions of documents being destroyed or deleted. Inadequate funding also means that it takes many years to declassify documents or produce primary source collections. Any legislation around data access for social media companies should ensure that similar problems do not emerge. Even when social media companies have set up ad libraries or archives, researchers have complained they are barely usable. This is why the draft of the European Digital Services Act lays out much more specific guidelines for how platforms should provide data to researchers.
Third, researchers should have a say in what data matters. Back in the (now olden) days of paper documents, archives could only hold so much paper. Archivists decided which documents to retain. My heart would always sink when I ordered a file, only to see it had been destroyed. It was pertinent to my research, but someone had decided decades ago that the file was irrelevant. Sometimes this has happened for problematic reasons: an official review in 2012 found that the British Foreign Office destroyed or hid many documents in the 1950s and 1960s that documented colonial crimes. As we consider transparency in the digital age, we should think about how to prevent these problems by giving researchers a say in creating access frameworks. In the case of social media platforms, talking to researchers makes clear that data can also be qualitative. Mike Ananny, for example, uncovered valuable information about Facebook’s fact-checking partnerships through interviews.
Fourth, we should think now about storage for future scholarship. How will we ensure access to data for scholars in 20 or 30 years’ time, when file formats may have changed dramatically? Historians are now confronting these questions with born-digital sources, even from the 1990s. So, too, should we be careful not to design access, archiving, and transparency plans around the companies that exist today. Facebook is not the only social media company. What would researcher access look like for, say, TikTok, or even for virtual reality companies, in the future?
Finally, we do not need to reinvent the wheel around data privacy. Scholars have long been granted access to incredibly sensitive information. I have had personnel files declassified in multiple archives. Other scholars work with health data or tax statistics. We can apply some of these frameworks to social media platforms with relative rapidity.
The problems around archives are old. Back in the mid-1990s, French postmodern theorist Jacques Derrida wrote that “there is no political power without control of the archive, if not of memory. Effective democratization can always be measured by this essential criterion: the participation in and the access to the archive, its constitution, and its interpretation.”
Many scholars would eagerly shiver in the cold backroom to do their research and answer some of the most pressing questions about social media. It’s up to governments and platforms to enable them to do so. The power of democracy is at stake.