A man walks past a Facebook logo at Facebook's headquarters at Rathbone Place in London. (Reuters/Dominic Lipinski)

Recently, a number of prominent Twitter users posted screenshots documenting their repeated, unsuccessful attempts to share news articles about the COVID-19 crisis on Facebook. Canada’s National Observer reported that articles from major news outlets, as well as from some Canadian government websites, were being rendered invisible, causing widespread frustration among people trying to share tips about staying safe with their friends and family.

While this wasn’t close to being the first or last time that digitally savvy folks flocked from one platform to another to bring visibility to content decisions they were unhappy with, the context was unique. In mid-March, as the outbreak of COVID-19 overwhelmed health care systems and brought chaos to markets, the pandemic also began to disrupt the global network of contractors that major social media companies rely on to screen the content shared by their billions of users. As more and more cities around the world began recommending or requiring self-isolation and social distancing, YouTube, Twitter and Facebook announced that many of their moderators were being sent home, and that they would be ratcheting up their use of automated systems for flagging and screening content in response.

Academic observers were quick to put two and two together and suggest that the mistaken takedowns were a product of this new all-artificial-intelligence (AI) moderation system going horribly wrong. But the Facebook executive in charge of developing the company’s automated content-detection systems quickly tweeted that the issue actually lay with some of its anti-spam classifiers and was “unrelated to any changes in our content moderator workforce.” Crisis averted, right?

Not quite. In the past few years, major social media companies — driven by the problem of scale, as well as by lawmakers trying desperately to get firms to police content better and faster — have significantly increased the number of automated systems that they use to detect, flag and act upon user-generated content. More automated tools are constantly being deployed across virtually every content area and every platform; they now enforce rules around nudity and sexual content, terrorism and violence, and hate speech and bullying, on Facebook, Instagram, Twitter, YouTube and, in some narrow contexts, WhatsApp. However, as last week’s events demonstrate, a massive amount of uncertainty and ambiguity surrounds how these systems actually work, even among the academics and social media researchers who are these platforms’ keenest observers.

The situation has some parallels to how, only four or five years ago, very little was really known about the way commercial content moderation, conducted by human moderators around the world, generally worked. The companies were extremely cagey about who did this work, and how; the policies or “guidelines” that set the rules of acceptable expression were not public; and the political and economic dynamics of the global network of labour playing an increasingly important, if hidden, role in online life were still not well understood. While the hard work of a few motivated researchers has gone a long way toward changing that, our knowledge of these increasingly consequential automated systems now appears to be the next major frontier.

What We Know about Automated Decision Making in Content Moderation

A few weeks ago, Reuben Binns, Christian Katzenbach and I published an academic article that assesses the publicly available documentation on the automated content-moderation systems deployed by major social media platforms. Our goal was to parse some of the jargon of computer science to create a more accessible primer on how automated content moderation works in practice. A few insights are relevant to content moderation amid COVID-19.

First, when we talk about the use of automation or “AI” in content moderation, we should probably distinguish between two broad types of systems: those that match “fingerprints” of new uploads against a database of known banned content, and those that try to make predictions about new, previously unseen content based on patterns learned from past examples.

While the details can be quite technical, suffice it to say that the former system, called a hash database, is currently deployed in only two areas: child abuse imagery and terrorist material.
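The mechanics of hash matching can be sketched in a few lines of Python. This is a toy illustration only: production systems use perceptual hashes (such as Microsoft’s PhotoDNA) that survive cropping and re-encoding, whereas the cryptographic hash below matches only byte-identical files, and the “banned” content here is invented.

```python
import hashlib

# Hypothetical database of fingerprints of known banned content.
# In practice these hashes are supplied by trusted moderators and
# industry bodies, not computed from toy strings.
banned_hashes = {
    hashlib.sha256(b"known banned file bytes").hexdigest(),
}

def fingerprint(content: bytes) -> str:
    """Compute a fingerprint of an upload (cryptographic, for illustration)."""
    return hashlib.sha256(content).hexdigest()

def screen_upload(content: bytes) -> str:
    """Block uploads whose fingerprint matches the database; allow the rest."""
    if fingerprint(content) in banned_hashes:
        return "blocked"
    return "allowed"

print(screen_upload(b"known banned file bytes"))  # blocked
print(screen_upload(b"ordinary holiday photo"))   # allowed
```

Because matching is an exact database lookup rather than a judgment call, this is the part of the pipeline whose technical accuracy is rarely disputed; the hard questions are about what goes into the database.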

The second type, which uses statistical techniques and machine learning, works differently: code is “trained” on a large corpus of the type of content it will try to detect, such as hate speech, and then used to predict whether new, unseen content exhibits similar patterns. In this process, the system extracts various proxies for “hate speechiness” from the original training data, develops internal rules based on those proxies, and then uses them to classify new content, predicting its likelihood of being hate speech.
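To make the train-then-predict process concrete, here is a deliberately tiny Python sketch of one classical approach, a naive Bayes text classifier. Everything in it — the corpus, the labels, the example post — is invented for illustration; real platform classifiers are vastly larger and use very different models.

```python
import math
from collections import Counter

# A toy labelled corpus. Per-class word frequencies serve as crude
# proxies for "hate speechiness", as described above.
training = [
    ("you are awful and worthless", "hateful"),
    ("i hate people like you", "hateful"),
    ("get lost you idiot", "hateful"),
    ("have a lovely day everyone", "benign"),
    ("great game last night", "benign"),
    ("thanks for sharing this recipe", "benign"),
]

def train(examples):
    """Learn word counts per class from the labelled examples."""
    counts = {"hateful": Counter(), "benign": Counter()}
    totals = Counter()
    for text, label in examples:
        words = text.split()
        counts[label].update(words)
        totals[label] += len(words)
    return counts, totals

def score(text, counts, totals):
    """Naive Bayes log-probabilities for a new, unseen post."""
    vocab = {w for c in counts.values() for w in c}
    logp = {}
    for label in counts:
        lp = math.log(0.5)  # uniform class prior
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out the score.
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        logp[label] = lp
    return logp

counts, totals = train(training)
logp = score("you are an idiot", counts, totals)
flag = logp["hateful"] > logp["benign"]
print("flag for review:", flag)  # True for this toy corpus
```

The sketch also shows why bias creeps in: the classifier knows nothing beyond the word statistics of its training set, so any skew in who is labelled “hateful” there is faithfully reproduced at prediction time.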

Second, these two types of systems work differently and involve humans in different ways. Hash databases rely on trusted human moderators to upload the initial pieces of content to the database, but then work automatically: every type of content posted on today’s major social networks — audio, video, photos or text — is fingerprinted and matched against these databases at the point of upload. While there are important transparency and accountability questions about the process of adding content to these systems (if, for example, something inadvertently made it into one of these databases, it might never be found, as it is not clear how or whether they are audited), their technical accuracy is generally not disputed.

The predictive systems are more problematic from a technical standpoint. They’re essentially making guesses based on what they know, and what they know is often not great: multiple studies have documented biases in commonly used hate-speech training data sets, showing, for example, that tweets by self-identified African Americans are up to two times more likely to be labelled as offensive than tweets by others. For this reason, platforms, for the most part, use these automated classifiers only to flag content for further review by a human moderator. This means that they don’t make takedown decisions all on their own — although they may make a variety of softer, yet still impactful, decisions, such as whether a video on YouTube should have its ad revenue automatically redirected to a copyright claimant.
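The flag-for-human-review pattern amounts to routing on classifier confidence rather than acting automatically. A minimal sketch, with thresholds invented purely for illustration:

```python
# Hypothetical routing logic: a predictive score sends a post to a
# human review queue rather than triggering an automatic takedown.
# The threshold values here are made up for illustration.

def route(hate_score: float) -> str:
    """Map a classifier confidence score (0.0-1.0) to a moderation action."""
    if hate_score >= 0.9:
        return "queue_for_priority_human_review"
    if hate_score >= 0.5:
        return "queue_for_human_review"
    return "leave_up"

print(route(0.95))  # queue_for_priority_human_review
print(route(0.60))  # queue_for_human_review
print(route(0.10))  # leave_up
```

Note that no branch removes content outright; under this design, takedown authority stays with the human reviewer at the end of the queue.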

In our survey of the publicly available documentation of these systems in use at Facebook, we found only two exceptions to the general “no takedown without a human in the loop” rule of thumb. The first: a specific set of Islamic State and al-Qaeda content, where the company clearly believes that any over-blocking concerns are outweighed by the content’s threat to the public (and also by their commitments to policy makers in Europe and North America). The second: the spam classifiers that appeared to misfire last week.

This reluctance to deploy fully automated takedowns suggests that platforms recognize these systems’ current technical limitations, and it reflects both very real over-blocking concerns and the important role that humans still play in confirming the takedown of flagged content and cutting back on false positives. We’re still very far from the all-AI moderation landscape that folks have been worried about; rather, we might expect the platforms to increase the number of fully automated takedowns in a small set of the most problematic areas (for example, the terrorism category) and impose buffer measures (such as making some types of automatically flagged content invisible until a reviewer has time to look at it) as they try to make do with a smaller workforce that can still work, or work from home.

While it’s still too early to know what long-term effects the global outbreak of COVID-19 will have on platform moderation, it is increasingly evident that platform companies are hoping that automated systems will eventually reduce their reliance on human labour. Notably, Mark Zuckerberg invoked AI as the solution to Facebook’s political problems multiple times during congressional testimony in the past two years, and this pandemic could be an opportunity to see how users react to a step in that direction.

Events like those of last week, even if merely the fault of a technical glitch with a spam classifier, provide a valuable look at the oft-forgotten infrastructures that undergird how we conduct our lives online. As we briefly noted in our article on algorithmic content moderation, spam filters are the perfect example of a system that we don’t generally think about — except, perhaps, when they go wrong. We should be cautious of similar efforts to make automated systems for content moderation blur into the background, operating at a level that produces convenience for the user but renders the underlying social questions invisible.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.
  • Robert Gorwa is a CIGI fellow. He focuses on the politics of platform governance, the regulation of large technology multinationals, and the theory and practice of contemporary content moderation.