Technology Alone Can't Preserve Endangered Languages

June 30, 2018
The annual session of the UN Permanent Forum on Indigenous Issues. (Photo by UN Women)

According to the UNESCO Atlas of the World’s Languages in Danger, at least 43 percent of the estimated 6,000 languages spoken in the world are endangered. This figure is likely to be low, since there are many languages that lack sufficient data to allow for an assessment of their vulnerability.

Increasingly, artificial intelligence (AI) is being used for language learning around the world; major technology companies are making substantial investments in natural language and voice interface platforms. But, even as advancements are made in this area, Google’s former Executive Chairman Eric Schmidt acknowledged that languages that are not prevalent are disadvantaged by AI. These languages are threatened not simply by the declining population of native speakers but also by the technological systems that give preference to the most commonly spoken languages in the world.

There are some examples of AI being used to help document and process audio recordings of endangered Indigenous languages around the world. Dr. Janet Wiles, a researcher with the ARC Centre of Excellence for the Dynamics of Language (CoEDL), is working to transcribe and preserve endangered languages. CoEDL has more than 50,000 hours of audio recordings; it is estimated that transcribing this audio using traditional methods will take approximately two million hours. To overcome this challenge, CoEDL partnered with Google in 2017 to develop machine learning technologies to process the audio recordings. Thus far, this data has been used to develop AI models for 12 Indigenous languages in Australia.

While AI and other exponential technologies present opportunities for the preservation of language, there are also inherent challenges to using these technologies to meaningfully revitalize these languages. Many Indigenous languages are rooted in oral tradition. The act of transcribing them into a written form may alter or fail to capture the full meaning of these languages.

In his book chapter “Alphabetic Literacy and Brain Processes,” Derrick de Kerckhove writes that “the alphabet allows the brain to rely on the succession of letters, without having to check its interpretation with reference to a context” and that this “habit of breaking information into parts and ordering said parts in a proper sequence is metaphorically the beginning of artificial intelligence.”

The very act of transcribing these languages and removing them from the contexts and cultures they are embedded in is likely to contribute to a loss in meaning. Language is not simply communicated through written words. The way people speak, their facial expressions, and the context and environment in which they speak all contribute to the message that is being conveyed. Simply recording Indigenous languages and using these recordings to develop AI tools is not enough to safeguard and revitalize Indigenous languages. The vast array of knowledge and culture captured by Indigenous languages will be lost if we fail to recognize that language is more than a succession of letters and sounds.

Although researchers in the field of affective computing have tried to develop technology that can recognize and reflect human emotions, these efforts have largely fallen short.

The languages preferred in international discourse and fora impact the understanding of the issues being discussed. Some researchers have argued that language is important to hegemony, especially as it relates to cultural domination. It is perhaps for this very reason that the residential school system consciously worked to deprive Indigenous youth of their language, as a means of control and dominance. In the digital space, the deployment of AI tools that use a select group of languages in their design may replicate colonialism and marginalization.

According to Renata Avila from the World Wide Web Foundation, “digital colonialism is the new deployment of a quasi-imperial power over a vast number of people, without their explicit consent.” There is a real risk that our attempts to realize the Truth and Reconciliation Commission’s calls to action will fall short if we do not make conscious efforts to prevent the colonization of both the physical and digital spaces.

As it stands, a majority of online content is stored in English and Chinese, with the top 10 languages representing over 80 percent of all online content. The absence of Indigenous languages in the digital space can contribute to a lack of representation and a loss of Indigenous knowledge. As younger generations of Indigenous peoples become more actively engaged online and use digital applications that prize propositional knowledge and Western tradition over Indigenous language and culture, their behaviour and understanding of the world are likely to be altered. Such an outcome could marginalize Indigenous populations around the world by depriving them of their culture and identity.

Arguably, ensuring that AI is programmed in Indigenous languages, by and for Indigenous peoples, is a first step toward avoiding a colonized digital space.

It is estimated that 87 percent of Indigenous languages are endangered in Canada. Jacey Firth-Hagen is a young Gwich’in whose passion for learning the Gwich’in language inspired her to start the Speak Gwich’in to Me social media movement. The Gwich’in language is endangered, with fewer than 370 people in Canada and approximately 670 worldwide speaking the language. As part of the campaign, Jacey and other Gwich’in language learners and speakers share posts containing Gwich’in words and phrases, as well as links to language learning resources.

These resources, and the online community facilitating their circulation, are much needed, but alone they are not sufficient to revitalize the language. There is currently no tool for translating conversation into Gwich’in, and each of the existing tools and resources requires an understanding of English to be used effectively.

Efforts are being made at a federal level; the Canadian government is currently planning to table legislation that would “recognize Indigenous languages as a constitutional right and create a new office of commissioners to protect and promote them.”

For this policy to be effective, it must also recognize the impacts and opportunities that AI technologies present in revitalizing Indigenous languages.

AI’s ability to recognize patterns will never be an effective replacement for the holistic and meaningful learning that happens when Indigenous elders and leaders pass their language and knowledge down to younger generations. Further, AI tools for the protection and revitalization of Indigenous languages must be developed with Indigenous involvement and consent.

Two years ago, the United Nations General Assembly adopted a resolution on the rights of Indigenous peoples, proclaiming 2019 as the International Year of Indigenous Languages. But there’s much to do before 2019. In order for technology to capture the extensive knowledge that exists, the people developing that technology must first have a stronger understanding of and appreciation for that knowledge, and the diverse languages that carry it. 

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Bushra Ebadi is a social innovator focused on designing sustainable, innovative solutions to complex global challenges using her multidisciplinary background and skills in design and systems thinking, policy analysis and mixed methods research.