Data Disquiet: Concerns about the Governance of Data for Generative AI

CIGI Paper No. 290

March 18, 2024

The growing popularity of large language models (LLMs) has raised concerns about their accuracy. Chatbots built on these models can be used to provide information, but that information may be tainted by errors or by made-up or false content (hallucinations) caused by problematic data sets or incorrect assumptions made by the model. The questionable results produced by chatbots have led to growing disquiet among users, developers and policy makers. The author argues that policy makers need to develop a systemic approach to address these concerns. The current piecemeal approach does not reflect the complexity of LLMs or the magnitude of the data upon which they are based; therefore, the author recommends incentivizing greater transparency and accountability around data-set development.

About the Author

Susan Ariel Aaronson is a CIGI senior fellow, research professor of international affairs at George Washington University and co-principal investigator with the National Science Foundation/National Institute of Standards and Technology, where she leads research on data and AI governance.