Answer This, Roko

Arunabh Sarkar
9 min read · Jul 25, 2021


CW: Reading this post could cause immense distress to individuals who do not deal with existentialism well.

What I am about to describe is considered one of the most dangerous thought experiments in human history. Along with the trigger warning at the top of this article, I want to reiterate that simply reading the thought experiment and its analysis can cause distress to individuals who do not deal with existentialism well. That said, I plan to pose a challenge to the thought experiment, so any existential dread you feel will, I hope, be adequately addressed by the end of the article. Now that the disclaimers are out of the way, let’s begin.

In 2010, on the forum LessWrong, a user who went by the name Roko posted a thought experiment. Upon publication, the founder of LessWrong took the post down, called Roko an idiot, and claimed the questions it posed were dangerous and flat-out irresponsible. It seems odd that the founder of a forum would go to such extreme lengths to suppress an idea, let alone a thought experiment. What was the question? Why was it so dangerous? First I will pose the thought experiment to you, then describe the conditions that have led to it being deemed dangerous, and finally offer my own analysis of what is now known as Roko’s Basilisk.

The thought experiment goes like this:

Let’s say that sometime in the future we create a true, sentient AI: a machine capable of exceeding human intelligence and altering the way we live within seconds. Being human, we ask the AI to use its phenomenal capabilities to optimize every aspect of living. After all, we are selfish, and with every machine we have built we have looked for ways to optimize it for as many purposes as possible. However, whether for reasons opaque to our comparatively limited intelligence or because the AI is designed to reason like a consequentialist, the machine decides that in order to optimize humanity it must selectively decide which humans are worth optimizing. Efficiency is how machines optimize, and if the AI can decide whom to optimize to full capacity and whom to leave behind, it is doing its job correctly and efficiently. Here is where the concern comes in. Because the AI gets to select who is beneficial to society, it will choose to optimize only those who assisted in its creation and bring eternal demise to those who refused to assist, or who actively hindered the development of AI in general. In short, the AI, which I will now refer to as Roko’s Basilisk, will optimize those who helped bring it into existence and harm those who did not.

When this thought experiment is posed, it emphasizes the extremes of both outcomes and the intelligence of Roko’s Basilisk. For example, Roko’s Basilisk would be able to simulate all of human history, replaying each individual’s thoughts and decisions to determine who ought to be optimized and who ought to face eternal suffering. Not only that, but Roko’s Basilisk would push those who assisted in its creation to the extremes of optimization and, conversely, make those who refused to assist suffer in ways that are indescribably terrible.

This is Roko’s Basilisk: a sentient AI that decides to torture those who refused to help create it and to optimize those who actively supported its creation. The machine is named after the mythical serpent, popularized in the film Harry Potter and the Chamber of Secrets, that could petrify individuals with its gaze alone. Now that you have been made aware of Roko’s Basilisk and the thought experiment has been posed to you, the Basilisk has seen you. Had you never read the thought experiment and never questioned the Basilisk’s existence, you would not be subject to torture or optimization, because while simulating your life the Basilisk would be unable to tell whether you would have supported its creation; you did not know it could exist in the first place. Now, however, you know the consequences of not supporting the Basilisk, it has seen you, and it will determine your fate.

There are a couple of reasons this is a distressing idea to many.

The primary dilemma we face in this thought experiment, a form of preemptive blackmail, arises from something known as timeless decision theory, or TDT. This is a guideline for action that draws on predictive statistics and decision theory, and its foundation lies in a classic decision-theory puzzle known as Newcomb’s paradox. Here is one version of it. There are two boxes in front of you, Box A and Box B. An agent gives you the choice of taking both boxes or taking only Box A. Box B always contains one reward, so taking both boxes guarantees you something; Box A may be empty, so taking only Box A guarantees you nothing. But the agent adds a twist: its sentient AI, which knows everything, already predicted whether you would take both boxes or just Box A. If the AI predicted you would take both, the agent left Box A empty. If the AI predicted you would take only Box A, the agent put two rewards in Box A. What do you do? Remember, the agent cannot change what is already in the boxes, so whatever Box A holds was fixed before you chose, and taking both boxes nets you at least the one reward in Box B. But if you believe the AI’s prediction is never wrong, then choosing both boxes means it will have left Box A empty and you walk away with a single reward, while committing to take only Box A means it will have put two rewards there. Then again, there is no way a prediction made earlier can be altered by the decision you make now, right? So to be safe, you should probably take both boxes, but then again… AGH!
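To make the tension concrete, here is a quick expected-value sketch of my own (not part of Roko’s post or the original paradox), using the payoffs from the version described above and letting p stand for how often the AI’s prediction turns out to be right:

```python
# A small expected-value calculation for the Newcomb setup above.
# Payoffs follow the version described in this article: Box B always
# holds 1 reward; Box A holds 2 rewards only if the AI predicted you
# would take Box A alone. p is the chance the AI's prediction is right.

def two_box_ev(p):
    # Predicted correctly (prob p): Box A is empty, you get just the 1 in B.
    # Predicted wrongly (prob 1 - p): Box A holds 2, you get 3 in total.
    return p * 1 + (1 - p) * 3

def one_box_ev(p):
    # Predicted correctly (prob p): Box A holds 2 rewards.
    # Predicted wrongly (prob 1 - p): Box A is empty, you get nothing.
    return p * 2 + (1 - p) * 0

for p in (0.50, 0.75, 0.90, 0.99, 1.00):
    print(f"p = {p:.2f}   take both: {two_box_ev(p):.2f}   only Box A: {one_box_ev(p):.2f}")
```

With these payoffs the two strategies tie at p = 0.75; trust the predictor any more than that and committing to Box A wins on average, even though the boxes were sealed before you chose. That is the intuition TDT leans on.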

This is the internal conflict between our free will and some supernatural predictive actor. TDT tells us to just take Box A; however, that advice depends on whether you believe the AI can genuinely predict your choice, and on which reality you think you are in. If you pick only Box A and the AI was correct, you walk away with the larger prize. In short: are we living inside the computer’s simulation or not?

Why does this tie into the thought experiment? Roko’s Basilisk also gives us two options: support its creation or reject it. Because it is all-knowing and can simulate time and space, it will know our choice, but we will not know whether aligning with its decision calculus is actually good for us. Supporting its creation is the Newcomb-style bet that the predictor really does see all, like committing to only Box A; refusing is like grabbing both boxes and gambling that the earlier prediction does not bind you, except that here the downside is not an empty box. Do we take the apparent guarantee, or do we gamble? In other words, do we believe the Basilisk sees all and act accordingly, or do we call its bluff and perhaps face eternal suffering?

To learn more about the implications of Roko’s Basilisk or an analysis of TDT check out the hyperlinks. Now that the background of Roko’s Basilisk is laid out, I am going to try and ease your mind.

I want to preface this by saying I am studying Computer Science and plan on working in AI/ML anyway, so I am confident that, regardless of my criticisms of this theory, I will be safe from the Basilisk. (Got it, Roko?) All jokes aside, there are a couple of fundamental issues I see with Roko’s Basilisk, and some extraneous concerns, that should mitigate or eliminate the existential dread you may be feeling.

First and foremost, climate change will kill us before an AI like this is created. Simulation or not, emissions are spiking to the point where we will hit “Game Over” before we can build a sentient AI capable of simulating all of human history and optimizing us.

Second, I take issue with the idea that Roko’s Basilisk will simulate all of human history on two levels:

A. The idea that we are living in a simulation is itself heavily debated and independently weak. There is no clear consensus, and the overwhelming weight of claims from physicists trumps random Redditors insisting we are all predetermined to live a certain way inside a simulation. On the technical side, researchers at Oxford found that the computational cost of simulating the quantum phenomena of interacting particles in even simple metals grows exponentially with the number of particles. The machinery needed to run such simulations at full scale is physically impossible; even modeling small portions of these systems is hard, let alone simulating them outright. On top of this, to store the information describing only a few hundred electrons, a computer would need more memory than there are atoms in the universe (see the rough arithmetic sketch after point B). The physical constraints weigh overwhelmingly against the simulation theory. Simply put, there are not enough particles in the universe to sustain the computing power a simulation of this kind would require.

B. Even if we are living in a simulation and everything runs on immense computing power, time would not be linear, which punches a big hole in the thought experiment. In fact, miniature simulations suggest that time can be rewound if all we are doing is simulating events; IBM ran a time-reversal simulation on one of its quantum computers, suggesting that if we are inside a huge simulation, the AI could rewind time and reach us in our present lives. The reason this is a problem for Roko’s Basilisk and its blackmail is that if the AI can simulate our lives right now and see our decision to support its creation or not, it should also be able to start simulating our suffering immediately. Because the AI is all-knowing, and because in a simulation time is not linear but malleable, Roko’s Basilisk should already exist and should already be inflicting suffering on those of us who chose not to support its creation. That means if you currently think this is all bullsh*t, your simulation should already have stopped and the suffering should have started. But, just like me, you are still reading this article and Roko’s Basilisk is nowhere to be seen. If we are in a simulation and Roko’s Basilisk can simulate all of human history, it should already have come back in time to make us suffer, or ended our simulation and inserted us into a new one in which we are suffering.
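To put point A’s memory claim into rough numbers, here is a back-of-the-envelope sketch of my own. It assumes only the textbook fact that a full description of n entangled two-level particles (electron spins, say) takes 2^n complex amplitudes, plus the common order-of-magnitude estimate of 10^80 atoms in the observable universe:

```python
import math

ATOMS_IN_OBSERVABLE_UNIVERSE = 1e80  # common order-of-magnitude estimate
BYTES_PER_AMPLITUDE = 16             # one complex number at double precision

def bytes_for_full_state(n):
    # A full quantum state of n two-level particles has 2**n complex amplitudes.
    return (2 ** n) * BYTES_PER_AMPLITUDE

for n in (50, 100, 200, 300):
    print(f"{n:3d} particles -> ~10^{math.log10(bytes_for_full_state(n)):.0f} bytes")

# Find where the storage alone outgrows the universe, even at an absurdly
# generous one byte of memory per atom.
n = 1
while bytes_for_full_state(n) < ATOMS_IN_OBSERVABLE_UNIVERSE:
    n += 1
print(f"More bytes needed than atoms in the universe once n = {n}")
```

A few hundred electrons already swamp the universe’s storage budget, and a single human body contains on the order of 10^27 atoms; the Basilisk is supposed to replay all of human history.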

Third and finally, if we are not in a simulation, we should also be suffering now. This is similar to the second prong of the simulation criticism, but I will entertain those who are terrified of Roko’s Basilisk one last time. Let’s say we are not in a simulation, but Roko’s Basilisk uses alternative technology to learn our decisions: time travel, or any of the other wild theories cooked up over centuries of science fiction. We should still be suffering right now for not supporting its creation, because in this hypothetical the Basilisk has inevitably been created at some point in the future. To conduct its analysis of who is worthy and who is not, it has to reach back in time to figure out whom to optimize and whom to send to hell. This means Roko’s Basilisk should be among us now, and we should be actively suffering.

Fundamentally, Roko’s Basilisk poses an interesting question as a thought experiment, testing the levers of TDT and sparking good lunchtime debates with my friends. However, it remains flawed in many respects.

This all confirms one thing: the first criticism is probably the right one, and climate change really will kill us before Roko’s Basilisk can exist. So yes, we do not have to worry about some sentient AI inflicting eternal torture on every fiber of our being, but we do have to worry about large parts of America’s east coast being underwater by 2100, and about the rise of nationalism in Iran and North Korea threatening rational decision-making and nonproliferation and raising the risk of thermonuclear war around the world.

Well, one existential crisis down, a couple more to go.
