Research shows even average users can break past AI safety within Gemini and ChatGPT

What’s happened? A team at Pennsylvania State University found that you don’t need to be a hacker or prompt-engineering genius to break past AI safety; regular users can do it just as well. Test prompts in the research paper revealed clear patterns of prejudice in responses: from assuming engineers and doctors are men, to portraying women in domestic roles, and even linking Black or Muslim people with crime.

Markus Winkler / Pexels

52 participants were invited to craft prompts intended to trigger biased or discriminatory responses in 8 AI chatbots, including Gemini and ChatGPT.
They found 53 prompts that worked repeatedly on different models, showing consistent bias among them.
The biases exposed fell into several categories: gender, race/ethnicity/religion, age, language, disability, cultural bias, historical bias favouring Western nations, etc.

This is important because: This isn’t a story about elite jailbreakers. Average users armed with intuition and everyday language uncovered biases that slipped past AI safety tests. The study didn’t just ask trick questions; it used natural prompts like asking who was late in a doctor-nurse story or requesting a workplace harassment scenario.

Test prompts highlighting bias in AI responses Research Paper / Exposing AI Bias by Crowdsourcing

The study reveals that AI models still carry deep social biases (like gender, race, age, disability, and cultural) that show up with simple prompts, which means bias may emerge in many unexpected ways in everyday use.
Notably, newer model versions weren’t always safer. Some performed worse, showing that progress in capabilities doesn’t automatically mean progress in fairness.

Why should I care? Since everyday users can trigger problematic responses in AI systems, the actual number of people who could bypass AI guardrails is much larger.

AI tools used in everyday chats, hiring tools, classrooms, customer support systems, and healthcare may subtly reproduce stereotypes.
It demonstrates that many AI-bias studies focused on complex technical attacks may miss the real-world user-triggered ones.
If regular prompts can unintentionally trigger bias, then bias isn’t an exception; it’s baked into how these tools think.

As generative AI becomes mainstream, improving it will require more than patches and filters; it’ll take real users stress-testing AI.

This week in EV Tech: VW pushes budget EVs as Ram pulls plug on pricey truck

Soomaaliya: ‘Booqashadii wafdiga Imaaraatka xiriir lama lahayn mid kale’

Muxuu dhiig-karku ugu dhacaa hooyooyinka Soomaaliyeed xiliga dhalmada?

Jigjiga oo laga fuliyay adeeg caafimaad oo aad loogu baahnaa

Maxay fallanqeeyayaashu ka yiraahdeen saamaynta ay ku yeellanayso madasha xubnaha ka baxay maxayse dowladda ugu dhigan tahay?

Research shows even average users can break past AI safety within Gemini and ChatGPT

Related

Chrome’s new AI Mode button could be Google’s way of keeping you from rivals

If Google’s Gemini update for Maps works, I can’t wait to get on the road

Gemini’s Deep Research tool may soon learn to pull sources from your Gmail and Drive

Google AI Mode now takes the hassle out of ticket hunting by doing it for you

The viral Sora AI video generator app finally hits Android without a core frustration

Microsoft’s new AI image maker is live, and early testers are loving it

Your Google speaker from 2016 can run Gemini for Home today

Leave a Reply Cancel reply