Algorithmic Content Moderation Brings New Opportunities and Risks

As AI rapidly advances, it’s worth asking how automated moderation could look in a few years.

October 23, 2023
The new logo of Elon Musk’s social media platform X, formerly known as Twitter, is displayed in this photo illustration. (Dado Ruvic/REUTERS)

Content moderation — the process of monitoring and removing or otherwise suppressing content deemed harmful or undesirable — is now widely recognized as essential for platforms hosting user-generated content. Given its importance in regulating online discussions and media, it has also become an increasingly fraught subject of political debate.

Although the taxing, poorly paid work of reviewing content remains essential, large-scale platforms now also increasingly moderate content automatically. Automated moderation has commonly involved tools that simply replicate human moderators’ decisions, by identifying other posts with the same content. But platforms are also increasingly using artificial intelligence (AI) tools to monitor content proactively. Historically, such tools have had very limited abilities to understand context, nuance and implicit meanings, or to analyze mixed-media content such as videos.

However, at a time of intense “AI hype” and rapid advances in the development and commercialization of AI, it’s worth asking how automated moderation could look in a few years. Shifting technological conditions that change how online speech can be regulated also raise complex political questions about how it should be.

Surveying recent industry developments, I see three trends that seem to be emerging: more effective and context-sensitive policy enforcement; more use of automation to reconfigure (not replace) moderation work; and entirely new contexts where content moderation could become important. All three trends ultimately suggest intensified state and corporate control over online media, and could undermine countervailing power from workers and other stakeholders. They call for close attention to when and how automated moderation can be used legitimately, and what safeguards are necessary.

Moderation Will Get Better (but for Whom?)

We are seeing rapid advances in commercial applications of large language models (LLMs). LLMs are currently best known through text-generation applications such as OpenAI’s ChatGPT, but the underlying models have many use cases, including content moderation. Trained on vast corpora of text data, LLMs can effectively be fine-tuned for specialist tasks — such as enforcing particular content policies — with relatively little additional data. Multimodal LLMs, which can analyze visual and mixed-media content, are also improving rapidly.

Big tech companies such as Meta reportedly already use LLMs for moderation, and OpenAI — which has a close partnership with Microsoft — advertises moderation as a use case for its latest model, GPT-4 (which is commercially available to platforms of all sizes). For now, the use of LLMs at scale is limited by cost, since they require enormous computing power. However, we can expect LLMs to become more accessible and cost-effective over time.

What will this mean in practice? First, even if moderation remains very imperfect, we shouldn’t understate the value of improved accuracy and reliability (although they should be weighed against the costs of deploying AI, such as environmental impacts). Reliably moderating harassment and hate speech is essential to create safe, inclusive online spaces. And errors in moderation tend to disproportionately affect marginalized people. For example, current tools seem to rely heavily on flagging keywords — which can miss problematic content that doesn’t use overtly aggressive phrasing, while often censoring marginalized communities who use reclaimed slurs. LLMs could do a better job of identifying, for example, that a drag queen using the word bitch is not being aggressive, whereas someone using superficially polite language to espouse far-right views is.
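To make the keyword problem concrete, here is a minimal sketch in Python. It assumes a purely hypothetical one-word blocklist and two invented example posts; it is not any platform's actual filter, only an illustration of why matching on words alone fails in both directions.

```python
import re

# Hypothetical blocklist for illustration; real platform lists are not public.
FLAGGED_KEYWORDS = {"bitch"}


def keyword_flag(post: str) -> bool:
    """Return True if any blocklisted word appears in the post."""
    words = re.findall(r"[a-z']+", post.lower())
    return any(word in FLAGGED_KEYWORDS for word in words)


# Invented example posts, not real user content.
reclaimed = "Yes, bitch! You looked incredible on stage tonight."
polite_hostile = "With respect, those people do not belong in our country."

# The affectionate post is flagged; the exclusionary one passes untouched.
print(keyword_flag(reclaimed))
print(keyword_flag(polite_hostile))
```

The filter sees only vocabulary, so the friendly post is caught and the hostile one is not. Telling them apart requires exactly the kind of contextual judgment that LLM-based tools promise, and that remains contestable even for humans.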

However, even the capacities of the best LLMs shouldn’t be overestimated. They are nowhere close to achieving human-level comprehension. Even if they were, there will never be an objectively correct answer to how platforms’ policies apply in every case: what is “hateful” or “inappropriate for children” will always be, to some extent, indeterminate and contestable. Perfectly accurate moderation is not only technically out of reach but intrinsically impossible.

Moreover, LLMs’ capabilities remain extremely limited outside English and a few other languages most represented in the training data. Greater reliance on LLMs could thus widen the “trust and safety” gap between wealthy North American and western European markets, which platforms already prioritize (because those customers are more valuable advertising targets), and the rest of the world. We also know that biased and stereotypical representations of marginalized groups are rife in the data sets on which today’s cutting-edge AI models are based, and that their outputs reflect this.

In any case, discrimination doesn’t just result from technical limitations and so-called “glitches.” Moderation policies and software are shaped by the structurally unequal contexts in which they’re designed, from the unrepresentative demographics of tech executives and developers, to the business goals the policies are built to serve. Improving technical accuracy won’t change the fact that advertiser-funded platforms are incentivized to moderate content that offends the “respectability politics” favoured by corporate advertisers, rather than content that harms vulnerable users (think back to the drag queen example: some people do actually think drag is offensive, and platforms and advertisers may wish to cater to that audience). As Jennifer Cobbe presciently argued in 2020, making platforms safer through improved algorithmic monitoring also means more precise and inescapable corporate control.

Automation Will Empower Governments and Disempower Workers

We’ve already seen a “function creep” whereby algorithmic tools, once developed, tend to be used in more and more areas. For example, hash-matching technologies originally developed for serious illegal content, such as child sexual abuse material, are now used to efficiently scale up enforcement of all kinds of platform policies. Given technical advances and efficiency advantages, it’s likely AI moderation will increasingly be deployed for tasks previously considered to require human assessment, even if the software can’t equal human performance.
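As a rough illustration of how hash matching replicates an earlier human decision, consider the sketch below. It makes a simplifying assumption: it uses an exact cryptographic hash (SHA-256), whereas deployed systems generally rely on perceptual hashes that tolerate small alterations to an image or video. The `HashMatcher` class and its method names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Exact-match fingerprint; a stand-in for a perceptual hash."""
    return hashlib.sha256(content).hexdigest()


class HashMatcher:
    """Hypothetical store of fingerprints for content already removed."""

    def __init__(self) -> None:
        self._removed: set[str] = set()

    def record_removal(self, content: bytes) -> None:
        # A human moderator's decision is stored as a fingerprint.
        self._removed.add(fingerprint(content))

    def matches(self, content: bytes) -> bool:
        # Later uploads are checked against stored decisions, with no new review.
        return fingerprint(content) in self._removed


matcher = HashMatcher()
matcher.record_removal(b"bytes of a removed upload")
print(matcher.matches(b"bytes of a removed upload"))   # identical re-upload
print(matcher.matches(b"bytes of a removed upload."))  # altered by one byte
```

Nothing in this mechanism is specific to a type of content: once the infrastructure exists, any policy decision can be stored and re-applied at scale, which is what makes function creep so easy.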

A benefit of automation touted by industry actors is reducing the burden on human moderators, whose work is often highly distressing (to the point that it can cause mental health problems) and accompanied by appalling pay and working conditions. Yet building and maintaining today’s cutting-edge AI models also requires extensive, ongoing human labour to produce training data and evaluate outputs, which, like content moderation, is typically outsourced to poorly paid workers in the Global South. While major platforms historically externalized moderation labour through global supply chains to cut costs, these employment practices have recently faced increasing pushback through unionization, lawsuits and negative press. In this context, then, an intensified “turn to AI” can be understood not as replacing moderators, but as further reducing their leverage by introducing another layer of externalization between their labour and the end product.

Moreover, algorithmic moderation isn’t just developed to implement pre-existing policies. The tools available shape which policies and enforcement practices are possible. For example, if all that’s required to change a moderation rule is a software update, rather than coordination with large human workforces, moderation policies could become much more dynamic. This agility could bring advantages — for example, in responding to crises or strategic disinformation operations — but raises questions about fairness and consistency, as well as accountability: how can regulators and civil society effectively oversee platforms if their rules change constantly?

Regulatory pressure will also continue to incentivize automation. In legislation such as the European Union’s Digital Services Act and Terrorist Content Online Regulation, the UK Online Safety Bill and the United States’ proposed Kids Online Safety Act, platform companies have been assigned broadly defined duties to proactively mitigate various harms or risks, with the details mostly left up to them. Expanding automated moderation will likely be their primary response, as it’s not only (relatively) cheap and scalable but also legally attractive.

Since AI tools are already widely used, and the underlying technologies are improving, companies can argue they represent industry-standard best practices. Where they face legal risks for discriminatory or arbitrary moderation (as, for example, under the Digital Services Act), using outsourced software could also effectively externalize accountability. Identifying discrimination and allocating responsibility in complex, opaque AI supply chains is very difficult. Platforms could simply claim to be using the best available software, with the design choices that could lead to discriminatory outcomes externalized onto other companies further down the supply chain.

Obviously, state-mandated content filtering is open to political abuse, and will also often cause collateral damage for marginalized communities and political perspectives, even where they aren’t deliberately targeted. For example, due to algorithmic bias, poor capacities in “low-resource languages,” and histories of Islamophobia in counterterrorism institutions, intensifying automated moderation of “terrorist content” will disproportionately increase censorship of certain user groups, such as Arabic speakers and pro-Palestinian activists.

Importantly, censorship is by no means only a threat in authoritarian countries or “illiberal democracies” where media freedom is generally under attack. Western European democracies have shown themselves very willing to criminalize peaceful climate activism, for example, or to pressure platforms to take down legal content related to COVID-19 policy or anti-police violence protests. Requiring the expansion of AI systems for comprehensive, real-time monitoring and filtering of online content creates evident risks that they will be used to suppress political dissent.

New Types of Moderation Are Emerging

Finally, as the regulatory and technological landscape shifts, new forms of algorithmic moderation will develop. There are various EU and UK regulatory proposals requiring expansion of automated moderation to entirely new areas, such as private messaging (raising serious concerns around cybersecurity and surveillance).

AI image, text and video generation also represents a new frontier for content moderation. All manner of business-to-business and consumer software businesses are now integrating generative AI into products — including search engines (despite their inability to reliably answer factual questions) and social media, as well as photo-editing and word-processing software. As highlighted by the US writers’ and actors’ strikes, mainstream media corporations are very interested in generative AI. So are leading newspapers. Over time, ever more of our media environments — not only the content we consume, but also the tools we use to communicate, create and make sense of information — may be (partly) AI-generated.

Moderation is integral to the design and functioning of these tools. Workers who evaluate whether AI-generated content is safe, appropriate and helpful have been essential in making them commercially viable. Moderation filters trained on these evaluations will limit how generative AI can be deployed and shape what kind of content it tends to produce. This represents a new field for the exercise of political power and raises complex normative questions. It might seem obvious that generating hate speech should be prevented. But what if that means researchers studying fascism can’t use AI translation? Given well-known linguistic and cultural biases in how AI identifies hate speech, who will be at a disadvantage here? How should we think about the possibility that AI assistants’ ideological biases influence people’s opinions, given that ideologically neutral text generation will never be possible?

As things stand, a few big tech conglomerates have effectively monopolized the knowledge and human and physical capital needed to build advanced AI. Presumably, then, the algorithmic tools that determine which kinds of media can or can’t be generated will reflect their commercial priorities. It seems inevitable that these tools will also become a target for state intervention. As AI-generated media becomes ubiquitous online, this raises the prospect that our media environments will be shaped by ever more pervasive state and corporate control.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Rachel Griffin is a Ph.D. candidate and lecturer in law at Sciences Po Paris.