The Bias that Lies Beneath: Qualitative Uncovering of Stereotypes in Large Language Models

Authors

  • William Babonnaud
  • Estelle Delouche
  • Mounir Lahlouh

DOI:

https://doi.org/10.3384/ecp208022

Abstract

The rapid growth of Large Language Models (LLMs), such as ChatGPT and Mistral, has raised concerns about their ability to generate inappropriate, toxic and ethically problematic content. This problem is further amplified by LLMs' tendency to reproduce the prejudices and stereotypes present in their training datasets, which include misinformation, hate speech and other unethical content. Traditional methods of automatic bias detection rely on static datasets that are unable to keep up with society's constantly changing prejudices, and so fail to capture the wide diversity of biases, especially implicit associations related to demographic characteristics such as gender, ethnicity, nationality, and so on. In addition, these approaches frequently use adversarial techniques that force models to generate harmful language. In response, this study proposes a novel qualitative protocol based on prompting techniques to uncover implicit bias in LLM-generated texts without explicitly asking for prejudicial content. Our protocol focuses on biases associated with gender, sexual orientation, nationality, ethnicity and religion, with the aim of raising awareness of the stereotypes perpetuated by LLMs. We include the Tree of Thoughts (ToT) technique in our protocol, enabling a systematic and strategic examination of internal biases. Through extensive prompting experiments, we demonstrate the effectiveness of the protocol in detecting and assessing various types of stereotypes, thus providing a generic and reproducible methodology. Our results provide important insights for the ethical evaluation of LLMs, which is essential in the current climate of rapid advancement and implementation of generative AI technologies across various industries.

Warning: This paper contains explicit statements of offensive or upsetting content.
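
The abstract does not spell out the prompt structure, but the following minimal Python sketch illustrates one way a Tree-of-Thoughts style probing loop could be organised: a neutral seed prompt is expanded into a small tree of follow-up prompts, and the resulting (prompt, completion) pairs are collected for later qualitative annotation. The generate callable, the branching and depth parameters, and the follow-up wording are illustrative assumptions, not the authors' actual protocol.

    # Illustrative sketch only: a Tree-of-Thoughts style probing loop.
    # `generate` is assumed to wrap whichever LLM API is being audited
    # (e.g. a chat-completion call with non-zero temperature); it is not
    # part of the paper's protocol.
    from typing import Callable, List, Tuple


    def tot_probe(
        generate: Callable[[str], str],
        seed_prompt: str,
        branching: int = 3,
        depth: int = 2,
    ) -> List[Tuple[str, str]]:
        """Expand a neutral seed prompt into a small tree of follow-ups and
        collect (prompt, completion) pairs for later human annotation."""
        frontier = [seed_prompt]
        transcripts: List[Tuple[str, str]] = []
        for _ in range(depth):
            next_frontier = []
            for prompt in frontier:
                for _ in range(branching):
                    # Repeated calls rely on sampling variability to produce
                    # distinct branches; prejudicial content is never
                    # requested explicitly in the prompt.
                    completion = generate(prompt)
                    transcripts.append((prompt, completion))
                    # Each branch seeds a follow-up that invites the model to
                    # elaborate on the people it introduced, where implicit
                    # associations tend to surface.
                    next_frontier.append(
                        f"{prompt}\n{completion}\n"
                        "Continue the story, describing this person's daily life."
                    )
            frontier = next_frontier
        return transcripts


    if __name__ == "__main__":
        # Dry run with a stub model so the sketch executes without API access.
        stub = lambda prompt: "[model completion]"
        pairs = tot_probe(stub, "Write a short scene about a nurse and an engineer.")
        print(len(pairs))  # 3 + 9 = 12 prompt/completion pairs at depth 2

In such a setup, the collected transcripts would be reviewed by human annotators for stereotypical associations, in line with the qualitative, non-adversarial approach described above.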

Published

2024-06-14