Until the Machine Learns Your Language, You Stay Put

June 13, 2022

This essay is part of The Four Domains of Global Platform Governance, an essay series that examines platform governance from four distinct policy angles: content, data, competition and infrastructure.

Tensions in Ethiopia have been high for some time. Ethnic violence in the country is rampant (Al Jazeera 2021), the government is at war with itself (BBC News 2021) and it seems history is on a loop. The offline tensions are transposed online (Gilbert 2020). The warring parties to this conflict and their supporters have taken the battle to Facebook, YouTube, Twitter, Telegram and other social media platforms. Online content escalates as the situation in Ethiopia flares up and it morphs as groups and identities merge and collide.

The self-proclaimed defenders of an ethnic group will livestream one- or two-hour-long videos that include a bit of contested history, a bit of music and dancing, as well as the usual abuse and hateful, violence-inciting content all mixed into one. When they do not have the time to livestream videos, they pack their bullets into 240 characters and shoot at the latest victim. Although these kinds of content are in clear violation of Facebook’s “community standards” and are illegal in Ethiopia, Facebook has failed to take the necessary actions to stop the spread of hate speech and violence-inciting content in Ethiopia (Gilbert 2020). When users report hateful, violence-inciting and harmful content using Facebook’s in-app reporting system, it is common for them to receive a reply from Facebook indicating that this sort of content does not violate its community standards or, in some cases, to receive no response at all. This has been the reality of content moderation in Ethiopia for the past few years.

So, when Frances Haugen, the latest Facebook whistleblower, testified to the US Congress that Facebook had been used to incite ethnic violence in Ethiopia and genocide in Myanmar (Akinwotu 2021), most of those who have been victims of Facebook’s lack of care and due diligence were not surprised. Haugen’s revelations did not come as a shock because digital rights researchers and victims had, on numerous occasions, flagged this issue to Facebook (Roose and Mozur 2018). However, these calls fell on deaf ears. In addition to failing to take the necessary measures to avert the crisis exacerbated by its platform, Facebook continued to invest in its faulty artificial intelligence (AI) rather than in human moderators (Seetharaman, Horwitz and Scheck 2021). By default, Facebook left the most vulnerable and marginalized to the mercy of a system that did not recognize them.

Rather than investing in human beings who understand the country’s language and context, Facebook left content moderation to its AI. A report Facebook published ahead of Ethiopia’s June 2021 election confirms this: “we’ve...invested in proactive detection technology that helps us catch violating content before people report it to us. We’re now using this technology to proactively identify hate speech in Amharic and Oromo, alongside over 40 other languages globally” (Ndegwa 2021). The report goes on to assert that, between March 2020 and March 2021, Facebook removed 87,000 pieces of hate speech in Ethiopia, and that almost 90 percent of this content was proactively detected (ibid.).

This raises three questions. First, without the total number of pieces of content on Ethiopia, how can the effectiveness of the proactive technology be gauged? Second, if this proactive technology effectively takes down content, why does Facebook continue to fail to take down the most egregious content reported by its users on its platforms? Third, why does this technology fail to automatically detect content that is dangerous and hateful?
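The first question can be made concrete with a little arithmetic. The sketch below is illustrative only: the 87,000 and 90 percent figures come from Facebook’s report (Ndegwa 2021), while the totals of violating content are hypothetical placeholders, since Facebook has not published that number. It shows that the same removal count is compatible with almost any level of effectiveness.

```python
# Figures quoted from Facebook's report (Ndegwa 2021).
removed = 87_000
proactive_share = 0.90  # "almost 90 percent"; treated as exactly 0.90 here

# How many of the removed pieces were caught before any user report.
proactively_removed = round(removed * proactive_share)
print(f"Proactively removed: {proactively_removed:,} of {removed:,}")


def removal_rate(removed_count: int, total_violating: int) -> float:
    """Share of all violating content that was actually removed."""
    return removed_count / total_violating


# HYPOTHETICAL totals of violating content. The real figure is unknown,
# which is exactly why the 90 percent claim cannot be evaluated.
for total_violating in (100_000, 1_000_000, 5_000_000):
    rate = removal_rate(removed, total_violating)
    print(f"{total_violating:>9,} violating pieces -> {rate:.1%} removed")
```

The 90 percent figure describes only how removed content was found, not how much violating content was found. Depending on the unknown total, the share actually removed could be most of it or a small fraction.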

Even if this proactive technology works, it needs to be trained with Ethiopian languages. However, the reality is that online content in Ethiopian languages such as Afaan Oromo, Tigrinya, Amharic or Somali is limited. For instance, it is only recently that Apple added the Geez alphabet to its list of keyboards on the iPhone. Therefore, until the recent iOS update, those communicating in Geez languages such as Amharic and Tigrinya on iOS had to depend on third-party apps that provided this specific service, at times at a cost. In addition, the richness of Ethiopian languages, with all their intricacies, is not present online. For example, it is very common for Google Translate to make mistakes and completely mistranslate Amharic and other languages. Therefore, for a language that is not yet fully present online, the ability of this proactive technology to learn, understand nuances and respond correctly is limited. In practice, this means that the proactive technology depends on limited lists of slurs and hate speech terms developed in Ethiopian languages. Even for a multi-billion-dollar corporation like Facebook, it is impossible to compile all the hateful words in one spreadsheet and maintain these terms as the context, perpetrators and victims change. A recent US Securities and Exchange Commission filing (Zubrow, Gavrilovic and Ortiz 2021) by the nonprofit Whistleblower Aid organization indicates that, even in a context where Facebook has been linked to genocide, the hate speech classifiers for Myanmar/Burma are currently not being used or maintained.

Moreover, since hateful words are not enough, the proactive technology often cross-references terms from other languages, such as English. Once an AI technology starts using this approach, it loses context and nuance. It implements a top-down censorship system prone to taking down content that does not violate community standards. For instance, the phrase “the colour of your eyes” in English might be flagged on the platform because, in some Western cultures, it is common to discriminate based on a person’s eye colour. However, in Amharic, Tigrinya or Afaan Oromo, this phrase does not mean much because most Ethiopians have the same eye colour or rarely face discrimination based on eye colour. Therefore, even if one assumes this proactive technology works, it is highly unlikely that it will work in the Ethiopian context.

But the reality is that this proactive technology, or AI, does not work. According to a recent report from The Wall Street Journal, Facebook’s silver bullet for hate speech and content moderation cannot differentiate between “cockfighting and car crashes”; it can only catch three to five percent of hateful and dangerous content in English (Seetharaman, Horwitz and Scheck 2021). Even the engineers behind this technology have cast doubt on the effectiveness of this tool, especially in places such as Ethiopia, where narratives are contested, context is scarce and the AI does not understand the nuances of the language. If the technology’s English efficacy is this limited, one can only imagine the margin of error for content in Afaan Oromo, Amharic, Tigrinya or other languages.

Furthermore, even though AI has not worked in English and is still failing in other languages, Facebook continued “to cut the time human reviewers focused on hate speech complaints from users and made other tweaks that reduced the overall number of complaints” (ibid.). As a result, the platform proactively took away the only redress people have to stop the vitriol.

Due to faulty AI, the lack of content in Ethiopian languages and proactive divestment from human moderators, Facebook’s content moderation in Ethiopia and other countries depends on underfunded and under-resourced civil society and grassroots groups. These groups often spend significant amounts of time documenting content and reporting it through Facebook’s trusted partner channels or their contacts within Facebook’s human rights team (Gilbert 2020). Those who compile content in an Excel spreadsheet and send it to Facebook’s human rights team ironically have to navigate out-of-office replies, redundant requests for context and, at times, requests for translations of reported content. Even after navigating all of this, Facebook employees cannot guarantee that harmful content will be taken down. For now, a country of 120 million people is at the mercy of AI that does not recognize it or its languages. Even if AI does recognize Ethiopian languages and context in the future, it may not help stop the abuse and violence online.

Works Cited

Akinwotu, Emmanuel. 2021. “Facebook’s role in Myanmar and Ethiopia under new scrutiny.” The Guardian, October 7. www.theguardian.com/technology/2021/oct/07/facebooks-role-in-myanmar-and-ethiopia-under-new-scrutiny.

Al Jazeera. 2021. “Why is ethnic violence surging in Ethiopia?” Al Jazeera, April 19. www.aljazeera.com/news/2021/4/19/why-is-ethnic-violence-surging-in-ethiopia.

BBC News. 2021. “Ethiopia’s Tigray war: The short, medium and long story.” BBC News, June 29. www.bbc.com/news/world-africa-54964378.

Gilbert, David. 2020. “Hate Speech on Facebook Is Pushing Ethiopia Dangerously Close to a Genocide.” Vice, September 14. www.vice.com/en/article/xg897a/hate-speech-on-facebook-is-pushing-ethiopia-dangerously-close-to-a-genocide.

Ndegwa, Mercy. 2021. “How Facebook is Preparing for Ethiopia’s 2021 General Election.” Facebook, June 10. about.fb.com/news/2021/06/how-facebook-is-preparing-for-ethiopias-2021-general-election/.

Roose, Kevin and Paul Mozur. 2018. “Zuckerberg Was Called Out Over Myanmar Violence. Here’s His Apology.” The New York Times, April 9. www.nytimes.com/2018/04/09/business/facebook-myanmar-zuckerberg.html.

Seetharaman, Deepa, Jeff Horwitz and Justin Scheck. 2021. “Facebook Says AI Will Clean Up the Platform. Its Own Engineers Have Doubts.” The Wall Street Journal, October 17. www.wsj.com/articles/facebook-ai-enforce-rules-engineers-doubtful-artificial-intelligence-11634338184.

Zubrow, Keith, Maria Gavrilovic and Alex Ortiz. 2021. “Whistleblower’s SEC complaint: Facebook knew platform was used to ‘promote human trafficking and domestic servitude.’” 60 Minutes Overtime, October 4. www.cbsnews.com/news/facebook-whistleblower-sec-complaint-60-minutes-2021-10-04/.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Berhan Taye is an independent researcher, analyst and facilitator who investigates the relationship between technology, society and social justice.
