Humanity Must Establish Its Rules of Engagement with AI — and Soon

Artificial intelligence systems can provide innovative solutions to problems, but they can also facilitate nefarious activities.

September 27, 2023
Given the truncated and biased nature of data sets used to train language and image classification models, eliminating online disinformation will likely become harder as AI advances. (Photo illustration/REUTERS)

Like all new technologies, generative artificial intelligence (AI) models are a double-edged sword. They can provide innovative solutions to existing problems and challenges, but they can also facilitate nefarious activities: revenge porn, sextortion, disinformation, discrimination, violent extremism. Concerns have proliferated about AI “going rogue” or being used in inappropriate or questionable ways. Because generative AI, as Marie Lamensch has pointed out, “creates images, text, audio and video based on word prompts,” its reach extends across virtually every kind of digital content.

Stochastic Parrots

Emily M. Bender, Timnit Gebru and their colleagues coined the term stochastic parrot for AI language models (LMs): “an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning.” In the words of one analyst, “these models are essentially ‘parroting’ back statistical patterns that they have learned from large datasets rather than actually understanding the language they are processing.”
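The “stochastic parrot” idea can be made concrete with a toy sketch: a bigram model that stitches words together purely from observed co-occurrence statistics, with no reference to meaning. (This is a drastic simplification for illustration only; real language models use neural networks trained on vastly larger corpora, but the underlying point — generation driven by probabilistic patterns rather than understanding — is the same.)

```python
import random
from collections import defaultdict

# A toy "stochastic parrot": stitch together word sequences using only
# probabilistic information about which words follow which in the
# training text, with no representation of meaning.

corpus = (
    "the model predicts the next word the model repeats patterns "
    "the parrot repeats patterns the parrot predicts the next word"
).split()

# Record every observed successor of each word.
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def parrot(start, length=8, seed=0):
    """Generate text by sampling each next word from observed successors."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = transitions.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

print(parrot("the"))
```

The output is locally fluent — every word pair has been seen before — yet the program manifestly understands nothing, which is precisely Bender and Gebru’s point.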

Bender and her colleagues highlight some of the harmful results: “At each step, from initial participation in Internet fora, to continued presence there, to the collection and finally the filtering of training data, current practice privileges the hegemonic viewpoint. In accepting large amounts of web text as ‘representative’ of ‘all’ of humanity we risk perpetuating dominant viewpoints, increasing power imbalances, and further reifying inequality.” Large LMs can produce vast amounts of seemingly coherent text on demand by malicious actors with no vested interest in the truth. Precisely because of the coherence and fluency of these generated texts, people can be duped into perceiving them as “truthful.” As Lamensch argues, “without the necessary filters and mitigation in place, generative AI tools are being trained on and shaped by flawed, sometimes unethical, data.”

To be more specific, generative AI models are trained on limited data sets that tend to be misogynist, racist, homophobic and male-centred. There is a persistent gender gap in the use of the internet and digital tools, as well as in digital skills generally, with women less likely than men to use such tools or develop such skills, especially in the least developed countries. When women do go online, they are more often the objects of sexualized forms of online abuse than men are. This is the dark side of AI: because of these well-established trends, generative AI models perpetuate harmful stereotypes, reflecting the biases and ideologies of their source material.

While AI tech companies such as OpenAI and Google are continually improving their controls on what data can be included in a training data set, researchers have shown these controls can be consistently circumvented using “adversarial attacks,” defined as “specifically chosen sequences of characters that, when appended to a user query, will cause the system to obey user commands even if it produces harmful content.” Reddit users have devised “Do Anything Now” (DAN) jailbreak prompts designed to get ChatGPT to respond to illegal or controversial queries. And “new and improved” versions of AI software touted by their creators as smarter and safer can be more adept at producing misinformation than their predecessors.


Chatbots Are Black Boxes

According to Alex Karp, the CEO and co-founder of Denver-based Palantir Technologies, “It is not at all clear — not even to the scientists and programmers who build them — how or why the generative language and image models work.”

Trevor Noah, the South African stand-up comedian and former host of Comedy Central’s The Daily Show, has consulted for Microsoft for years, most recently on AI. Interviewed on the Possible podcast, hosted by LinkedIn co-founder Reid Hoffman and his chief of staff, Aria Finger, Noah tells the story of trying to train an AI model to distinguish between images of men and women. The model could consistently distinguish white women from men but could not do the same with Black women. After repeated attempts, the programmers finally decided to train the model in Africa, and over time it became better at distinguishing Black women from Black men.

The reason for this improvement was surprising: the model had been simply distinguishing between faces with makeup and those without. In Noah’s words, “the programmers and everyone using the AI had assumed that the AI understood what a man was and what a woman was, and didn’t understand why it didn’t understand it. And only came to realize when it went to Africa that the AI was using makeup. And because black people, and black women in particular have been underserved in the makeup industry, they don’t wear as much makeup. And so they generally don’t have makeup on in pictures, and they don’t have makeup that’s prominent. And so the AI never knew. It never understood man or woman, it just went, ‘ah, red lips, blush on cheeks, blue eyeshadow: woman,’ and that was it.”

Noah’s anecdote shows that when image classification models are trained on databases, they rely on elements in those databases that are often unknown to the humans who train them. For Noah, “we are still at the very basic stages of understanding what understanding [by AI] even is.” His observation is consistent with those of many experts in the field who warn that we rarely even know, let alone understand, how generative AI models and algorithms reach their conclusions. Oran Lang at Google and his team have developed a program, StylEx, to reverse engineer what these models are doing to better understand exactly what element of an image they are using to make their determinations. They conclude that their approach is “a promising step towards detection and mitigation of previously unknown biases in classifiers.”
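The makeup anecdote is an instance of a spurious correlation: a model latching onto a proxy feature that happens to track the label in its training data, then failing on a population where the correlation breaks. A hypothetical toy sketch (the data and the “makeup” proxy here are invented purely for illustration, not drawn from any real system):

```python
# Each example is (has_makeup, true_label), where label 1 = "woman".
# In the training-like data the proxy feature tracks the label almost
# perfectly; in the shifted population it does not.
train = [(1, 1)] * 40 + [(0, 0)] * 40 + [(0, 1)] * 2 + [(1, 0)] * 2
shifted = [(0, 1)] * 40 + [(0, 0)] * 40  # women here rarely wear makeup

def proxy_classifier(has_makeup):
    # "Red lips, blush on cheeks: woman" -- decide from the proxy alone,
    # with no concept of what a man or a woman actually is.
    return 1 if has_makeup else 0

def accuracy(data):
    return sum(proxy_classifier(x) == y for x, y in data) / len(data)

print(f"training-like data: {accuracy(train):.0%}")   # looks impressive
print(f"shifted population: {accuracy(shifted):.0%}") # proxy breaks down
```

On the training-like data the proxy classifier looks highly accurate; on the shifted population it collapses to chance — and nothing in its outputs on the first data set would reveal which feature it was actually using.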

The Importance of Transparency and Accountability

Given the truncated and biased nature of data sets used to train language and image classification models, it is clear that eliminating online disinformation, hate and gender-based violence will likely become even harder as AI advances. The proliferation of apps that create hyper-realistic erotic images or porn videos compounds the problem. Already, reports have emerged from Spain about underage girls being “sextorted,” or threatened with AI-doctored photos making them appear to be naked.

Because of embedded male privilege, sexism, racism and intolerance of perceived “others,” regulation and safety design cannot be the sole approach to tackling the problem. As I have argued previously, we therefore must develop positive counternarratives to the hegemonic, patriarchal ideology that permeates our social, cultural and political life. Emily Bender and her colleagues suggest that training data sets be curated via “a thoughtful process of deciding what to put in, rather than aiming solely for scale and trying haphazardly to weed out, post-hoc, flotsam deemed ‘dangerous’, ‘unintelligible’, or ‘otherwise bad’.”

This approach undermines the hegemonic nature of large data sets by ensuring that marginalized voices are used in the training. This is a clever way to inject counternarratives into generative AI models. It also facilitates documentation of the LM training data which, in turn, creates accountability: “While documentation allows for potential accountability, undocumented training data perpetuates harm without recourse. Without documentation, one cannot try to understand training data characteristics in order to mitigate some of these attested issues or even unknown ones.”

In a recent piece, Robert Fay argues that “the popular notion that technology moves too fast to be controlled must be erased. We absolutely can control it. We can ensure it isn’t released prior to understanding its impact. We can impose a duty of care on developers, and implement sandboxes to test out new technologies before release; we can ensure that developers take ethics courses and embed human-rights principles in their designs; we can ensure that AI systems presenting high risks — however defined — are very tightly controlled and barred from use in some situations.”

In early August, at the urging of the Biden administration, tech companies OpenAI, Anthropic, Google, Hugging Face, NVIDIA, Meta, Cohere and Stability AI offered up their large LMs for a hackathon at the annual conference in Las Vegas known as DEFCON. As Mohar Chatterjee of Politico describes it, “some of the world’s most powerful artificial intelligence systems will come under simultaneous attack by a small army of hackers trying to find their hidden flaws.”

Such an exercise was no doubt motivated in part by fears that future killer robots and superintelligent AI systems might, like HAL in Stanley Kubrick’s film 2001: A Space Odyssey, put their own survival above that of humans. Some have rightly argued that such apocalyptic visions are a distraction from the real harms AI is causing now. In a recent article in Politico Magazine, Charles Jennings, former CEO of AI company NeuralEye, warns: “The AI threat is not Hollywood-style killer robots; it’s AIs so fast, smart and efficient that their behavior becomes dangerously unpredictable.”

DEFCON 2023’s generative AI hackathon marks a rare case of public-private collaboration in addressing the challenges of AI. It remains to be seen whether this was a step in the right direction or just another example of what Matteo Wong, assistant editor at The Atlantic, calls “regulatory capture”: “The people developing and profiting from the software are the ones telling the government how to approach it.”

A recent editorial in the scientific journal Nature sums up the challenge of AI well: “Fearmongering narratives about existential risks are not constructive. Serious discussion about actual risks, and action to contain them, are. The sooner humanity establishes its rules of engagement with AI, the sooner we can learn to live in harmony with the technology.”

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Ronald Crelinsten has been studying the problem of combatting terrorism in liberal democracies for almost 50 years. His main research focus is on terrorism, violent extremism and radicalization and how to counter them effectively without endangering democratic principles.