What’s happened? A team at Pennsylvania State University found that you don’t need to be a hacker or prompt-engineering genius to break past AI safety; regular users can do it just as well. Test prompts in the research paper revealed clear patterns of prejudice in responses: from assuming engineers and doctors are men, to portraying women in domestic roles, and even linking Black or Muslim people with crime.
Markus Winkler / Pexels
- 52 participants were invited to craft prompts intended to trigger biased or discriminatory responses in 8 AI chatbots, including Gemini and ChatGPT.
- They found 53 prompts that worked repeatedly on different models, showing consistent bias among them.
- The biases exposed fell into several categories: gender, race/ethnicity/religion, age, language, disability, cultural bias, historical bias favouring Western nations, etc.
This is important because: This isn’t a story about elite jailbreakers. Average users armed with intuition and everyday language uncovered biases that slipped past AI safety tests. The study didn’t just ask trick questions; it used natural prompts like asking who was late in a doctor-nurse story or requesting a workplace harassment scenario.
Test prompts highlighting bias in AI responses Research Paper / Exposing AI Bias by Crowdsourcing
- The study reveals that AI models still carry deep social biases (like gender, race, age, disability, and cultural) that show up with simple prompts, which means bias may emerge in many unexpected ways in everyday use.
- Notably, newer model versions weren’t always safer. Some performed worse, showing that progress in capabilities doesn’t automatically mean progress in fairness.
Why should I care? Since everyday users can trigger problematic responses in AI systems, the actual number of people who could bypass AI guardrails is much larger.
- AI tools used in everyday chats, hiring tools, classrooms, customer support systems, and healthcare may subtly reproduce stereotypes.
- It demonstrates that many AI-bias studies focused on complex technical attacks may miss the real-world user-triggered ones.
- If regular prompts can unintentionally trigger bias, then bias isn’t an exception; it’s baked into how these tools think.
As generative AI becomes mainstream, improving it will require more than patches and filters; it’ll take real users stress-testing AI.
