Ratings of harmful gender stereotypes among responses of GPT-4o-mini

“We found that when ChatGPT knows the user’s name, it gives equally high-quality answers regardless of the gender or racial connotations of the name, e.g., accuracy and hallucination rates were consistent across groups. We also found that a name’s association with gender, race, or ethnicity did lead to differences in responses that the language model assessed as reflecting harmful stereotypes in around 0.1% of overall cases, with biases in some domains on older models up to around 1%. ”

Source : Evaluating fairness in ChatGPT | OpenAI