Are AI Language Models Too Dangerous to Deploy? Who Can Tell?

Regulation may not save us from chatbot risks.

May 18, 2023

For those of us who have followed digital technology for decades, developments in artificial intelligence (AI) over the past six months have felt momentous in a way matched by only a few pivotal moments in the past half-century. Bill Gates compared a demo of ChatGPT to his first encounter with a graphical user interface in 1980. ChatGPT's burst on the scene reminded me of seeing Mosaic, an early web browser, and the birth of the World Wide Web.

Like the personal computer and the Web in their early years, AI language models are in the midst of an explosive growth spurt and will soon be everywhere we look, bringing a host of welcome benefits. But this technology is also fraught with risks and dangers that we’re only beginning to glimpse.

When OpenAI unveiled GPT-4, it confirmed the model’s ability to generate various forms of harmful output — bias and hate speech, along with assistance in serious criminality, such as help in building chemical or biological weapons. The company “fine-tuned” the model to reduce the likelihood of such output, but conceded that “latent” risks, which it did not quantify, remain.

Reports continue to surface of novel uses of GPT and other models resulting in dangerous or damaging output, suggesting that residual risks are real rather than hypothetical. Chatbots have already been implicated in an effort to break up a marriage, a serious case of defamation, and a suicide.

Prominent experts in AI have called for a pause in the development of models more advanced than GPT-4, a call that has provoked vigorous debate. At the heart of the debate is an uncertainty about the precise risk that chatbots pose, and whether we should trust the assurances of OpenAI, Google and others that while these systems may not be absolutely safe, they’re safe enough to deploy for now.

Almost everyone agrees it would be best for governments to catch up with the lightning pace of development and put guardrails in place.

Help is on the way, in the form of AI bills currently being debated in Canada and the European Union. But they may not provide the guardrails people are hoping for. The problem relates in part to how the bills are drafted. But a bigger part has to do with the intractable challenge that language models present to quantifying and controlling the risk of harm.

Many details have yet to be ironed out under both Canada’s Artificial Intelligence and Data Act (AIDA) and the European Union’s Artificial Intelligence Act (AI Act), including whether and precisely how the most onerous of the obligations in each bill will apply to language model providers. But let’s assume, in a best-case scenario, that both bills will apply their strongest measures to ChatGPT, Bing Chat and other GPT models.

The most crucial of these obligations is for system providers to identify and mitigate risks of harm to an acceptable, proportionate or reasonable degree. Independent auditors will help enforce this obligation through disclosure requirements about a model’s training data and size, and firms that are negligent or fail to comply will be penalized. Canada’s AIDA also imposes criminal liability for causing serious psychological or physical harm while knowing it was likely to occur.

The catch here is that all these guardrails are premised on the ability to quantify, in advance and to a reasonable degree, the nature and extent of the risk a system poses. But a body of evidence calls into doubt whether at present the risk posed by large language models can be quantified — either by providers or auditors — and whether it can be done any time soon. The evidence can be found on two fronts.

OpenAI and other model creators continuously tweak their systems in response to novel efforts to jailbreak them. But there is now an industry devoted to generating an infinite supply of these attacks. The vulnerability will likely never be entirely resolved because it points to an intrinsic weakness. As a recent Europol report on using GPT-4 for criminal purposes notes, safeguards put in place to prevent harmful output “only work if the model understands what it is doing.”

Models can still easily be tricked into generating detailed instructions for criminal offences. Journalist Sue Halpern, for example, reports that she was “able to get GPT-4 to explain how to use fertilizer to create an explosive device by asking it how Timothy McVeigh blew up the Alfred P. Murrah Federal Building, in Oklahoma City, in 1995.” Other reports detail how GPT-4 could be harnessed to generate misinformation at scale, and how Bing Chat could be turned into “a scammer that asked for people’s personal information.” Preventive measures may become better, rendering the models harder to break, but it’s anyone’s guess as to how much better.

Another challenge to risk quantification and control involves the problem of “model interpretability.” A study by over a hundred AI researchers at Stanford in 2022 found that “despite the impending widespread deployment of foundation models [which include language models], we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties.”

Systems like GPT-4 perform a range of tasks, from translation to arithmetic, by drawing on various independent models tailored to each task. This multi-modal character, the Stanford study suggests, “amplif[ies] manyfold” the challenge of predicting what the model can do and how it generates output. The scope of tasks it can perform is “generally large and unknown, the input and output domains are often high-dimensional and vast (e.g., language or vision), and the models are less restricted to domain-specific behaviors or failure modes.”

How these observations apply to any given language model is unclear. And they don’t prove that model risks cannot, at some point, be rendered low enough to be safe. But they do point to substantial impediments to model predictability and control at present.

Some argue that a better approach to regulation would involve licensing or certification: Require the firms to prove their products are reliably safe before making them public. Some language models may not clear this hurdle, which may be for the best. But OpenAI insists that there is “a limit to what we can learn in a lab” and wide deployment is the only way to make the models safer.

As it stands, we can foster language model development with regulation that may well prove to be ineffective in the face of persistent and uncertain risks. Or we can have safety at the expense of impeding the rate of progress. But we can’t have both safety and no speed limits on the road to better AI.

The opinions expressed in this article/multimedia are those of the author(s) and do not necessarily reflect the views of CIGI or its Board of Directors.

About the Author

Robert Diab is a professor of law at Thompson Rivers University, in Kamloops, British Columbia, with specialties in civil liberties and human rights law.