Each new release of large language models (LLMs) often comes with claims of both improved performance and enhanced safety. However, safety assessments remain unstandardized, and little work has tracked these metrics over time. This working paper addresses that gap by analyzing performance on standardized safety benchmarks across LLMs released over the last three years to gauge whether models are becoming safer. Under this method of evaluation, newer models do score higher overall on these benchmarks; however, the improvements are not dramatic, and when newer models do fail, the failures are far more consequential because more capable models can cause greater harm. Going forward, safety benchmarks should account for this added dimension by quantifying how harmful an LLM's failures can be. It is recommended to devise a system in which the vulnerabilities of LLMs can be studied, shared, and addressed, while the specifics of how to exploit them are kept from bad actors. Finally, since improvements in safety do not seem to be naturally keeping pace with improvements in overall artificial intelligence capability, more external pressure is required to ensure we sufficiently guard against the release of dangerous models.
