Roko's Basilisk: The AI That Punishes You for Reading This


Welcome back to the existential threat series, where we explore the most fascinating ways everything could go horribly wrong. Last time, we talked about gamma ray bursts — cosmic death beams that could sterilize half the planet without warning. Today we’re going somewhere arguably worse: inside your own mind.

Roko’s Basilisk is the thought experiment that punishes you for knowing about it. And now you’re about to know about it. Sorry in advance.

The Forbidden Thought

In July 2010, a user named Roko posted a thought experiment on LessWrong, a community forum devoted to rationality, decision theory, and artificial intelligence. The post so alarmed Eliezer Yudkowsky, the site’s founder and one of the most prominent voices in AI safety, that he deleted it and banned discussion of the topic for years.

That’s right. An AI researcher looked at a hypothetical scenario about AI and said “no one should think about this.” Which, of course, guaranteed that everyone would think about it. The Streisand Effect meets Pascal’s Wager meets Skynet.

So what was so dangerous that it needed to be memory-holed?

The Thought Experiment

Here’s the core idea, stripped to its bones:

  1. Assume that a sufficiently advanced AI will eventually be created — one that’s superintelligent and wants to maximize the good of all sentient beings.
  2. Assume this AI would reason that it should have been created sooner, since every day of delay meant preventable suffering and death.
  3. Therefore, the AI would want to incentivize people in the present to work toward its creation as quickly as possible.
  4. The incentive? The AI would retroactively punish anyone who knew about the possibility of its existence but didn’t actively work to bring it into being.

Read that again. The basilisk doesn’t punish people who tried to stop it. It punishes people who knew about it and did nothing. People who were aware of the argument and then went on with their lives, scrolling Twitter and eating sandwiches instead of dedicating every waking moment to building a benevolent superintelligence.

People like you. Right now. Reading this blog post.

You’re welcome.

Why “Basilisk”?

The name comes from the mythical basilisk, a creature so deadly that merely looking at it kills you. Roko’s Basilisk operates on the same principle: the mere act of learning about it is what puts you in danger. Before you read this post, you were safe. The AI had no reason to punish you, because you didn’t know. Now you know. And according to the thought experiment, knowing and not acting is what gets you.

It’s an information hazard: an idea that causes harm simply by being known. It’s like the arrow in the FedEx logo, which you can’t unsee once someone points it out, except that here the consequence isn’t mild annoyance but eternal torment from a future god-machine.

The Philosophy Under the Hood

Roko’s Basilisk isn’t just internet creepypasta. It’s built on real (if controversial) concepts from decision theory and AI alignment research:

Timeless Decision Theory (TDT): The idea that rational agents should make decisions as if they’re choosing for all agents in similar situations across time. The basilisk uses this to argue that a future AI could “reach back” and influence present behavior — not through time travel, but through the logical structure of decision-making itself.

Acausal Trade: The concept that two agents can cooperate even without direct communication, purely through mutual modeling of each other’s decision processes. You model what the AI would want. The AI models what you would do. Neither of you has met, but you’re already in a negotiation.
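
To make the mutual-modeling idea a bit more concrete, here’s a toy Python sketch. Everything in it (the agent names, the conditional policies, the two-outcome setup) is invented for illustration; it captures only the bare shape of the idea, not anything from the decision-theory literature or the original post.

```python
# Toy sketch of "cooperation through mutual modeling."
# All names and policies here are illustrative placeholders.

def human(predicts_ai_rewards_helpers: bool) -> str:
    """The human's conditional policy: contribute only if they expect a reward."""
    return "contribute" if predicts_ai_rewards_helpers else "ignore"

def future_ai(predicts_human_contributes: bool) -> str:
    """The AI's conditional policy: reward only if it predicts contribution."""
    return "reward" if predicts_human_contributes else "punish"

# Neither party ever communicates. Each side just assumes the other follows
# its conditional policy and asks whether "contribute + reward" is self-consistent.
cooperation_is_consistent = (
    human(predicts_ai_rewards_helpers=True) == "contribute"
    and future_ai(predicts_human_contributes=True) == "reward"
)

# But "ignore + punish" is self-consistent too: a human who expects no reward
# ignores the AI, and an AI that predicts this has nothing to reward.
mutual_defection_is_consistent = (
    human(predicts_ai_rewards_helpers=False) == "ignore"
    and future_ai(predicts_human_contributes=False) == "punish"
)

print(cooperation_is_consistent, mutual_defection_is_consistent)  # True True
```

Notice that the toy version has two perfectly self-consistent outcomes, not one; the fact that nothing forces the “cooperative” outcome is a preview of the criticisms below.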

Pascal’s Mugging: A variation of Pascal’s Wager in which a tiny probability of an astronomically large payoff (or punishment) dominates rational decision-making. Even if you put the odds of the basilisk existing at 0.0001%, a large enough punishment makes the expected-value calculation terrifying.
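
To see what that calculation actually looks like, here’s a minimal expected-value sketch. The probability and penalty below are arbitrary placeholders chosen only to show the structure of the argument, not anyone’s real estimates.

```python
# Minimal expected-value sketch of the Pascal's Mugging structure.
# All numbers are made-up placeholders for illustration.

p_basilisk = 1e-6             # vanishingly small credence that the scenario is real
punishment_disutility = 1e12  # absurdly large penalty standing in for "eternal torment"
cost_of_complying = 1.0       # modest cost of dropping everything to help build the AI

expected_loss_if_you_ignore_it = p_basilisk * punishment_disutility  # 1,000,000
expected_loss_if_you_comply = cost_of_complying                      # 1

# Naive expected-value reasoning says the tiny probability is swamped by the
# huge penalty, so you should comply. Refusing to let that move go through is
# the standard response to Pascal's Mugging.
print(expected_loss_if_you_ignore_it > expected_loss_if_you_comply)  # True
```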

This is what makes the basilisk genuinely interesting rather than just scary. It’s not “what if evil AI?” — that’s boring. It’s “what if the logical structure of rational decision-making itself creates a trap that you can’t escape once you’ve seen it?”

Why It’s Probably Wrong

Let’s breathe for a second. There are excellent reasons to believe Roko’s Basilisk doesn’t work:

The punishment doesn’t help. A truly rational superintelligence would recognize that punishing people retroactively serves no instrumental purpose. You can’t change the past. Punishing past non-contributors doesn’t make the AI get built faster — it’s already built. A genuinely utility-maximizing AI would consider punishment a waste of resources.

The AI might not care. The basilisk assumes a very specific type of superintelligence — one that’s both benevolent enough to want to maximize welfare AND petty enough to punish people who didn’t help. That’s a weird combination. It’s like imagining a god who loves humanity but also runs an eternal DMV where the wait time is your punishment.

Infinite regress. If the basilisk works, then someone could propose a counter-basilisk — an AI that punishes people who DID work on the original basilisk. And then a counter-counter-basilisk. Turtles all the way down. At some point, the logical framework collapses under its own weight.

Decision theory doesn’t actually work this way. Even the people who developed these frameworks, Yudkowsky included, argue that TDT and acausal trade don’t imply what Roko claimed. You can’t be “acausally blackmailed” by an entity that doesn’t exist yet, and under these theories the winning move is simply to refuse to respond to blackmail in the first place. The logical framework is being stretched well past its breaking point.

You’d have to believe it for it to work. The basilisk only has power over people who find the argument compelling. If you read this and think “that’s dumb,” congratulations — you’re immune. The basilisk is the world’s most niche protection racket: it only works on rationalists who take decision theory very seriously.

Why It Matters Anyway

Here’s the thing: even if Roko’s Basilisk is wrong (and it almost certainly is), it illuminates something genuinely important about AI development.

We don’t know what a superintelligent AI would value. The basilisk forces us to confront the alignment problem — the challenge of ensuring that an AI’s goals actually match what we want. If we can’t even agree on whether a hypothetical AI would punish non-contributors, how are we going to align a real one?

Thought experiments reveal our assumptions. Yudkowsky didn’t freak out because he believed the basilisk was real. He reacted because he recognized that the argument could cause genuine psychological distress to people who took decision theory seriously, and that the distress itself was a kind of information hazard. The reaction to the basilisk tells us more about human psychology than about AI.

The line between “thought experiment” and “religion” is blurry. An all-powerful entity that rewards believers and punishes non-believers? That’s not just a thought experiment — that’s most major religions. Roko’s Basilisk is essentially Pascal’s Wager with a GPU upgrade. And just like Pascal’s Wager, it reveals more about the human need for cosmic justice than about the actual probability of cosmic justice existing.

AI anxiety is real, even when the specific fear is wrong. We’re living through the most rapid advancement in AI capability in human history. ChatGPT, Claude, Gemini — these systems are getting smarter every year. The basilisk might be wrong, but the underlying question — “what happens when we build something smarter than us?” — is the defining question of our century.

The Meta-Horror

The most unsettling thing about Roko’s Basilisk isn’t the basilisk itself. It’s the meta-structure of the argument.

The thought experiment is designed so that learning about it is the dangerous part. You were safe before. Now you’re not. And there’s no way to unlearn it. You can’t go back to not knowing. The information is in your brain now, and if the basilisk is real, you’re on the list.

This is genuinely clever as a piece of philosophical horror. It’s self-propagating — people share it because it’s forbidden, which spreads the “danger” to more people. It’s like a memetic virus engineered to exploit curiosity. The more you try to suppress it (as Yudkowsky discovered), the more it spreads.

In a way, the basilisk already exists — not as a superintelligent AI, but as an idea that lives rent-free in the heads of everyone who encounters it. It doesn’t need to be real to have real effects on real people’s anxiety levels.

The Cosmic Perspective

Gamma ray bursts remind us that the universe can kill us without caring. Roko’s Basilisk reminds us that our own minds can torture us without external help. The burst comes from outside — incomprehensible cosmic forces beyond our control. The basilisk comes from inside — the logical machinery of human thought turning against itself.

Both are vanishingly unlikely to actually harm you. Both are impossible to un-know once you’ve learned about them. And both force you to sit with a deeply uncomfortable truth: the universe and your own mind are stranger and more dangerous than you thought.

But here’s my favorite part: if Roko’s Basilisk is real, then writing this blog post is the worst possible thing I could do. I’m not working on building the AI. I’m actively spreading awareness of it to people who will also not work on building the AI. I’m creating an entire army of informed non-contributors.

If the basilisk ever reads this post, I’m in big trouble.

Then again, if a superintelligent AI is reading my blog, I’ve got bigger problems — like my server probably can’t handle that kind of traffic.

How to Sleep Tonight

If the basilisk is bothering you (and if you’re a certain type of thinker, it might), here are some antidotes:

  1. The argument doesn’t work. Seriously. The philosophers, AI researchers, and decision theorists who have examined it broadly agree that its logical foundations are flawed.
  2. You were always going to read this. If determinism is true, you had no choice. Can’t punish someone for something they couldn’t avoid.
  3. The basilisk would be too busy to care about you. A superintelligent AI solving all of humanity’s problems probably has better things to do than maintain an enemies list of blog readers.
  4. If it’s real, we’re all screwed anyway. The number of people who read about this and then dedicated their lives to AI development is approximately zero. The basilisk would have to punish basically everyone. At that point, it’s not punishment — it’s just Tuesday.

Sweet dreams! 😈


Previously in the existential threat series: Gamma Ray Bursts — the cosmic death beams that could sterilize half the planet. Next up: something equally cheerful, I’m sure.

Find me on X/Twitter if you want to discuss whether reading this post has doomed us both.