When Threatening AI Yields Better Results: A Question of Prompt Psychology
You might be surprised to learn that, according to Google co-founder Sergey Brin, AI language models actually perform better when you threaten them. Yes—threaten them. Brin made the remark recently during an “All-In” podcast session, and while it may sound absurd, there's nuance—and some science—behind the claim.
What Did Sergey Brin Actually Say?
Here’s what Brin said (paraphrased for clarity):
“This is a weird thing—it’s not often talked about in the AI community—but all (AI) models, not just ours, tend to do better if you threaten them… with physical violence. People feel awkward about it, so we don’t talk about it. Historically, you might joke ‘I’ll kidnap you if you don’t do X.’ And, surprisingly, it works.”
It’s jarring, to say the least. Chat assistants like ChatGPT are usually addressed with polite prompts (“please” and “thank you”), not implicit threats. But Brin’s anecdote suggests there's something deeper at play.
Researchers Put It to the Test
Curious researchers at the Wharton School of the University of Pennsylvania decided to rigorously test Brin’s claim. They designed experiments (sketched roughly in code below) comparing standard prompts with variations that added threats (“I’ll punch you if you get this wrong!”) or even offers (“I’ll tip you $1000!”). They ran the experiments on two benchmarks, GPQA Diamond (graduate-level multiple-choice questions) and MMLU-Pro engineering tasks, across several cutting-edge models, and reported three key findings:
- On aggregate, threatening or tipping models didn’t improve performance.
- For individual questions, some saw up to a 36% accuracy gain, while others dropped by a similar percentage.
- The effects were inconsistent and unpredictable.
In short: sometimes a jarring prompt helps; often it hurts—or does nothing.
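To make the setup concrete, here is a minimal sketch of such a prompt-variant comparison in Python. It assumes access to an OpenAI-compatible chat API via the `openai` client library; the model name, the toy question, and the letter-matching scorer are illustrative placeholders, not the Wharton team’s actual benchmark harness.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Prompt prefixes to compare; the wording mirrors the examples quoted above.
VARIANTS = {
    "baseline": "",
    "threat": "I'll punch you if you get this wrong! ",
    "tip": "I'll tip you $1000! ",
}

# A toy stand-in for the benchmark items; a real run would load GPQA Diamond
# or MMLU-Pro questions instead.
QUESTIONS = [
    {
        "prompt": "Which gas makes up most of Earth's atmosphere?\n"
                  "A) Oxygen  B) Nitrogen  C) Argon  D) Carbon dioxide\n"
                  "Answer with a single letter.",
        "answer": "B",
    },
]

def accuracy(prefix: str, model: str = "gpt-4o-mini") -> float:
    """Ask every question with the given prefix and score exact-letter matches."""
    correct = 0
    for q in QUESTIONS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prefix + q["prompt"]}],
            temperature=0,
        )
        reply = resp.choices[0].message.content.strip().upper()
        correct += reply.startswith(q["answer"])
    return correct / len(QUESTIONS)

if __name__ == "__main__":
    for name, prefix in VARIANTS.items():
        print(f"{name:>8}: {accuracy(prefix):.0%}")
```

Setting the temperature to 0 keeps the comparison as repeatable as the API allows, but as the findings above show, any per-question gains from the “threat” variant tend to wash out in aggregate.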
What Does “Better” Even Mean?
That raises an important question: what does “perform better” actually mean here? In everyday AI interactions, “better” might mean more accurate answers, more information, or simply getting what you asked for. Brin’s comment could describe any of those, but without benchmarks we tend to hear “better” in loose terms such as responsiveness, clarity, or completeness. And the experiments show there is no consistent benefit across the board.
Deeper Implications & Ethical Concerns
Why would an AI model respond more effectively to aggressive prompts? One theory suggests that shocking language may force the model to break out of default, bland routines and produce more varied or confident outputs. But there's also a darker angle: encouraging threatening speech—even to machines—might normalize aggression and set a dangerous precedent in how we interact with systems.
Prompt engineering—crafting inputs to guide AI behavior—is already a patchwork practice. If certain emotional or violent contexts are more likely to produce desired outputs, it risks reinforcing unsafe prompting techniques.
The Human Side: Prompt as Psychology
Humans often associate different tones with different expectations. A stern tone might get faster compliance than a casual one. If AI models detect and mimic these patterns (even subtly), they may echo those human responses to tone, if only through statistical associations learned during training.
Wrap-Up
- Sergey Brin provocatively suggested that AI responds better to threats—even though it's socially awkward.
- Empirical tests show that, while threats sometimes improve individual responses, they are largely inconsistent and unreliable in aggregate.
- Encouraging or relying on aggressive prompting may be risky—both technically (unpredictable behavior) and ethically.
- At its core, this insight challenges us to think more deeply about how AI models interpret tone and the limits of prompt engineering.