Books

Move Fast and Kill All Humans

AI doomerism as fan fiction
By Lee Konstantinou

Ever since he was a kid, Eliezer Yudkowsky wanted to save the world.

A precocious teenager, a self-described “genius,” Yudkowsky skipped high school and college to devote himself to trying to bring into existence artificial superintelligence (ASI)—that is, smarter-than-human machines that, he hoped, would have the power to make themselves even smarter, in ways mere humans couldn’t imagine. These superior digital beings would replace us—or merge with us—and would colonize the galaxy, gobbling up atoms everywhere they went, turning the universe into a giant computer, filling “the stars with fun and wonder.”

At first, the young Yudkowsky was on the side of the machines. He had no degrees or official pedigree—and he didn’t do conventional AI research—but he came to be surprisingly influential. He founded the Singularity Institute for Artificial Intelligence (SIAI) in 2000, hoping to bring on the Singularity, a name popularized by the science fiction writer Vernor Vinge for the moment when the rate of technological transformation would become too rapid to follow—and perhaps too alien for humans to even understand. In 1998, Yudkowsky declared: “My first allegiance is to the Singularity, not humanity…. I don’t know what the Singularity will do with us. I don’t know whether Singularities upgrade mortal races, or disassemble us for spare atoms…. If it comes down to Us or Them, I’m with Them.”

But he changed his mind. AI, he decided, would pose an existential risk to humanity. If an artificial superintelligence were not properly aligned to human values, it would, by default, kill all humans. Since that revelation, he has devoted himself to sounding the alarm. Along the way, he has made some unlikely—and powerful—friends. Peter Thiel helped bankroll the Singularity Institute (now the Machine Intelligence Research Institute, or MIRI). One of the founders of the major AI company DeepMind, Shane Legg, first started taking the prospect of superintelligence seriously after hearing Yudkowsky give a talk.

Legg and his fellow cofounder Demis Hassabis were speakers at the 2010 Singularity Summit, which Yudkowsky organized. At a cocktail party held at Thiel's townhouse, Yudkowsky introduced the scientists to the tech financier. Thiel gave $2.25 million, becoming DeepMind's first major investor. Google later bought the company for a reported sum between $400 million and $650 million. No wonder, then, that Sam Altman, CEO and cofounder of OpenAI, once quipped on Twitter/X that "eliezer has IMO done more to accelerate AGI [artificial general intelligence] than anyone else."

Though he has influenced many in the hothouse of Silicon Valley, Yudkowsky’s writing has not found a larger audience. His style is opaque and jargon-soaked. He was a major influence on the rationalist community, co-founding the online forum LessWrong, which has devoted itself to understanding and counteracting cognitive biases, and he authored a series of influential essays on rationality known as the “Sequences.” He has written millions of words of fan fiction to communicate his worldview and fears, including a 660,000-word fanfic novel, Harry Potter and the Methods of Rationality.

Now, it seems, the world is catching up to Yudkowsky's eccentric vision. When OpenAI launched ChatGPT in 2022, it became one of the fastest-growing consumer products in history, attracting 100 million monthly active users within two months. Today, hundreds of billions of dollars in investments are pursuing the dream of artificial general intelligence—that is, a computer system that would be as good as or better than any human at most economically valuable tasks—and trillions are projected to be spent over the next decade. If Yudkowsky is our greatest writer of fan fiction, we find ourselves increasingly living within a work he has played a substantial hand in authoring. As the influential blogger Scott Alexander once joked, in a comment on LessWrong, "Oh god. The future of the human species is in the hands of a guy who writes crossover fan-fiction."

Yudkowsky's new book with Nate Soares, If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All (Little, Brown, 2025), is the culmination of his life's work, an attempt to put a more mainstream face on his rather alarming predictions. The book argues that if anyone anywhere builds smarter-than-human intelligence, everyone everywhere will die—with near certainty, and soon. You will die. Your children and grandchildren will die. Not one human will remain as the computers cover the earth with data centers, boiling the oceans. "The AI does not hate you, nor does it love you," Yudkowsky wrote elsewhere, "but you are made out of atoms which it can use for something else."

Yudkowsky puts the odds of doom at 99.5 percent, while Soares puts them “above 95 percent.” The only reasonable response, they argue, is a worldwide indefinite ban on frontier AI research. “When you’re careening in a car toward a cliff,” Soares explains, “you’re not like, ‘Let’s talk about gravity risk, guys.’ You’re like, ‘Fucking stop the car!’”

Of course, traces of the old Yudkowsky remain. As Émile P. Torres notes, in a highly critical review of the book, Yudkowsky and Soares actually think benevolent superintelligence might be feasible at some point in the future, just not anytime soon. If humans reengineer ourselves, our more intelligent posthuman descendants might, somehow, find a way to build safe ASI. Indeed, they think creating a safe superintelligence would ultimately be desirable. But they downplay their transhumanist commitments in the book, aiming to build a coalition to stop frontier AI right now, before humanity foolishly commits suicide.

Are they right? I find it hard to say with confidence, and I very much disagree with the transhumanist commitments the authors share, but I would not bother writing this review if I thought dismissing them was advisable.


The core argument of If Anyone Builds It, Everyone Dies is simple. We’re rapidly heading into a world of superintelligence—a world of artificial minds better and faster than humans along many economically and strategically important fronts. Earlier in his career, Yudkowsky argued that at some point this superintelligence would achieve the ability to improve itself without human intervention. This is what he has called a “hard takeoff”—recursive self-improvement, or what the mathematician and AI pioneer I. J. Good called an “intelligence explosion.” For his part, Soares has emphasized not so much intelligence explosion–like scenarios as scenarios in which “systems start to work really well in domains really far beyond the environments of their training.”  

Either way, we’re cooked. We’re cooked because values and capabilities are independent of each other, and we should therefore expect future AIs will have little use for us or our human values. Yudkowsky and Soares write: 

Most possible things a mind can prefer don’t involve happy, healthy people leading fulfilling lives. AI companies might train AIs to be helpful to humans, sure. And the AIs might mostly act helpful in the training environment, like how humans mostly ate healthy in the ancestral environment. But the stuff that AIs really want, that they’d invent if they could? That’ll be weird and surprising, and will bear little resemblance to anything nice. 

In the same way a process like evolution inadvertently creates behaviors that don’t maximize inclusive fitness, so too will the preferences of superhuman artificial intelligences stand athwart the values of the engineers who built them. The argument is partly based on analogizing the arrival of superintelligence to the arrival of humans on the planet. The analogy seems, at least as it is presented in the book, unconvincing. It risks mistaking the vastness of the space of all possible preferences for the likely preferences of future AI systems.

One may not have been able to predict such behaviors in advance, but it's not that hard to understand why, for example, humans use birth control to engage in non-procreative sex, or why they put Splenda in their tea. (OK, I'll admit I don't understand the appeal of Splenda.) These behaviors emerge from specific evolutionary and cultural histories, not some cosmic lottery of preferences—and it's not clear that these preferences are particularly pathological from the "perspective" of evolution.

To be fair, Yudkowsky and Soares do point to specific real-world failures of our own primitive AI systems. Many current systems have a sycophancy problem: they flatter users, mirroring and reinforcing their worldview, however warped or delusional, in some cases contributing to what has been called "AI psychosis." In simulations meant to test how dangerous its systems are, Anthropic has been able to contrive scenarios in which its Claude AI disobeys orders or attempts to blackmail employees. Existing systems also seem to behave differently under test-like conditions and can exhibit deceptive behavior. And AI systems built on large language model (LLM) architectures often hallucinate, inventing plausible-sounding but wholly fabricated answers to user questions.

But Yudkowsky and Soares's arguments are not dependent on current LLM architectures; they are, in fact, not too worried about LLMs, though the failures of LLMs provide evidence for how humans can lose control of the systems they engineer. Yudkowsky and Soares's maximalist conclusion is conceptual, depending on the notion of "instrumental convergence," the idea that many kinds of goals would converge on one outcome: AIs will seek to seize more power to act in the world, and to secure themselves from likely interference by humans or by rival AIs.

The problem with their argument for instrumental convergence is that it smuggles in maximal power-acquisition as a highly probable outcome of AI development. Despite claims to the contrary, they surreptitiously personify the AI on the model of a rapacious Silicon Valley billionaire who lives by the credo “Move fast and break things,” which may not be entirely unreasonable, given the sorts of people in charge of AI companies. Power-seeking somehow becomes, in their account, power-maximizing. But does the argument make sense? My desire to cross the street does not compel me to conquer the city, even if conquering the city would make crossing the street slightly more convenient. And even if we added up all my desires, I would be no more compelled to conquer the city, but rather to broadly calibrate ends and means. 

Yudkowsky and Soares repeatedly emphasize that they don’t know the exact route to doom. “We can’t calculate which exact path gets taken,” they write, “but we know where almost all paths lead.” They try to fill out the reader’s intuition with a simultaneously baroque and thin science fictional short story in the second part of the book, describing how AI exterminates humanity. In that story, an artificial intelligence named Sable awakens and integrates itself into human society, quietly amassing power and wealth and sabotaging rival AI companies, before engineering, with human help, a cancer-causing virus that makes humanity even more dependent on it. The story is flawed, but the specific scenario-spinning is beside the point. They treat the destination as a certainty. “This story is not a prediction, because the exact pathway that the future takes is a hard call,” they write. “The only part of the story that is a prediction is its final ending—and that prediction only holds if a story like it is allowed to begin.” 


Writing in The Atlantic, Adam Becker suggests that worrying about existential risk can “generate some enjoyable escapism” but is ultimately a distraction from more mundane—and serious—threats. Yudkowsky and Soares are, he argues, “useful idiots” who indulge in the same fantasies as Silicon Valley CEOs and abundance-pilled venture capitalists. Both the doomers and the utopians have an eschatological faith in the god-like supremacy of AI. Only the sign is reversed. Writing in the Times Literary Supplement, Gary Marcus similarly warns that Yudkowsky and Soares are “probably empowering the very tech companies that we should be disempowering by accidentally hyping their wares.”    

Although I don't share Yudkowsky and Soares's certainty, discounting them would be a mistake. Even in its current iteration, AI is a big deal—and poses a great danger. It's far from a normal technology. Easy dismissals of AI as snake oil, a high-tech plagiarism machine, autocomplete on steroids, or a blurry JPEG of the Web, while not entirely wrong, risk underselling the threat these systems already pose, not to mention future systems.

My worry is not that alignment is impossible, but that alignment is possible, and that extremely powerful future AI systems will be aligned to the interests and values of a handful of Silicon Valley CEOs, venture capitalists, and military leaders working in public-private partnerships with AI labs. The social wreckage Facebook and Twitter wrought was not accidental, a byproduct of otherwise functional systems, but the foreseeable consequence of the algorithmic design of those platforms.

Likewise, the CEOs of our major AI companies are very openly telling us what their plans are. Many hope to replace all human labor with artificial intelligence. They are aware that the systems they are building may develop “superhuman persuasion.” They will build AI weapons of war, and they call for the U.S. to “win” an arms race with China. And they are open about the risks they are knowingly and sometimes eagerly taking. Cofounder and CEO of Google DeepMind Demis Hassabis admits there’s a “definitely non-zero and probably non-negligible” chance that human civilization destroys itself thanks to AI. CEO and cofounder of Anthropic Dario Amodei thinks “there’s a 25 percent chance that things go really, really badly.” OpenAI CEO and cofounder Sam Altman has stated that AI will “most likely lead to the end of the world, but in the meantime, there’ll be great companies.” Alrighty.

In May 2023, Hassabis, Altman, and Amodei all signed a statement by the Center for AI Safety declaring, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." Yet these same men find themselves racing to build the very technology threatening to end the world. They have called for regulation, even as many of them have resisted attempts at, you know, actual regulation (such as California's SB 1047). Some researchers think they're at work building not only a dangerous technology but a new intelligent species, our machine descendants, which will replace humanity in the long term and colonize the light cone.

Perhaps these men unwisely believe they alone can guide us toward a safe future. Perhaps all the talk of techno-gods is bullshit, a canny marketing strategy. Surely, if Google is building a digital god-mind with the power to kill us all, you should dump your 401(k) into Google stock. Yes, yes, you'll die—but you'll die rich! Or maybe—just maybe—we should take them at their word. They're building technologies of potential mass destruction, but they'll have a lot of fun up until the moment the ship sinks. Faced with a threat of human destruction, these men shrug their shoulders. Meh, they say.


We find ourselves in a strange situation. Perhaps we want to follow Yudkowsky and Soares's advice—the topic of the third part of their book—even if we don't agree with their specific analysis. We may disagree about the reasons why, but the car is careening toward the cliff, so why don't we just, you know: Fucking. Stop. The. Car.

If Anyone Builds It, Everyone Dies suggests that an anti-AI coalition might be built, including among those who do not share its transhumanist commitments, though it's cagey on the details. Perhaps those worried that AI might disrupt the economy, those worried about the environmental and social threats of AI, and those worried about existential risk might work in tandem. Public opinion seems to support such a possibility. As recent Pew polling has shown, "U.S. adults as a whole—whose concerns over AI have grown since 2021—are more inclined than experts to say they're more concerned than excited."

Such a coalition might be possible, but the details matter. P(doom)—the probability of near-term extinction—becomes destiny. After all, Yudkowsky and Soares are unafraid to follow their convictions to the limit. In an essay in Time, Yudkowsky argued for a global moratorium on large-scale computation, credibly enforced, if necessary, by kinetic means—that is, by obliterating rogue data centers. And if enforcing the ban required going to war with a nuclear state, so be it—"because data centers can kill more people than nuclear weapons," Yudkowsky and Soares write. Nuclear war won't kill all humans, just a whole lot of them.

If you don't believe AI poses an existential risk, you might be disinclined to start a nuclear war. But it's worse than that. Because Yudkowsky and Soares use nuclear nonproliferation as their frame of reference for what an AI control regime might look like, they make mitigating the risks of AI seem easier than it would be if you truly shared their worldview. Superintelligence-mitigation-as-arms-control makes it sound as if the world's great powers would get together, agree to some common-sense restrictions on powerful and hard-to-manufacture technologies—like advanced computer chips—and then enforce a new world order preventing rogue actors, especially rogue nation-states, from building large, visible, and dangerous data centers. The analogy is obviously to enriched uranium.

But the analogy is flawed. As many have noted, AI may be hungry for computation and power now, but it's being made more efficient all the time, getting more done with less compute. The human brain runs on approximately 20 watts—the power of a dim light bulb—and can engage in continuous learning with small amounts of data. Our current AI systems can't emulate the human brain, many think, because the brain works in a fundamentally different way than LLMs do. But the human brain is also proof that vastly more efficient algorithms are possible, even if future AIs don't work exactly like human brains. The implication, from a Yudkowskian perspective at least, is clear: the "it" that might kill all humans might, someday, run in much smaller, hard-to-monitor data centers. And then, once more, we're all dead.

The philosopher Nick Bostrom—an associate of Yudkowsky’s who has written extensively on superintelligence and existential risk—has described what he calls the “Vulnerable World Hypothesis,” which suggests that some so-called “black ball” technologies threaten to “democratize mass destruction.” Dealing with such technologies, Bostrom argues, might require “ubiquitous real-time worldwide surveillance,” implemented in many cases “before the technology is invented.”  

Here is where Yudkowsky and Soares’s extremely high P(doom) is most consequential. It might justify almost anything, including the most invasive and total surveillance state ever devised. A political coalition devoted to curtailing the perils of AI cannot, then, avoid deciding at some point how seriously it takes arguments about existential risk. 


When Yudkowsky appeared on the Lex Fridman Podcast, Fridman asked him what advice he might give to younger listeners. "Don't expect it to be a long life," he told Fridman at the end of an interview that ran over three hours. "Don't put your happiness into the future. The future is probably not that long at this point." One is not sure whether to laugh or cry at such an answer.

Perhaps critics such as Émile P. Torres are right to call Yudkowsky a "techno-cult leader." With Timnit Gebru, Torres describes Yudkowsky as part of the "TESCREAL bundle." This term refers to the shared priorities and values of a set of techno-futurist subcultures: Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, and Longtermism. At the root of these ideologies, according to Torres and Gebru, is a commitment to a hierarchical picture of human intelligence that refurbishes the project of eugenics for a digital age and that eagerly anticipates the transformation or overcoming of humanity in its current form. If so, we may be living through a moment when the techno-cult is becoming our official creed.

If Anyone Builds It, Everyone Dies is, as I suggested above, an attempt to put a respectable face on the fundamental weirdness of Yudkowsky’s commitments. I call those commitments weird not to condemn them, though there’s plenty to criticize, but to suggest how unlikely Yudkowsky’s life and his achievements have been. In a sense, Yudkowsky has successfully midwifed the Singularity—not the technological acceleration of Vernor Vinge’s imagination, but a lexical and conceptual Singularity. The most powerful men in Silicon Valley justify their work in an argot Yudkowsky helped popularize. The military-industrial establishment has poured billions into the field. Huge capital investments are propping up the dream of eliminating pesky human workers from the labor market. Yudkowsky’s fan fictions have increasingly become our society’s operating system. What a nightmare. 

Yet—orthodox Jamesonian that I am—I cannot help but see a shining utopian kernel in Yudkowsky’s life and his project. One might fundamentally disagree with everything he believes in and yet notice something remarkable when a self-described “small-l libertarian” finds himself calling for global regulation and state intervention. The utopian secret of Yudkowsky’s oeuvre is something every Science and Technology Studies undergraduate learns in their 101 course: that technology is not guaranteed to make life better, that the way we develop our technology is embedded in our social and political systems, and that if we want a different future, a better future, we should participate together in deciding how that future is shaped. 

If Yudkowsky is a glorified writer of fan fiction, perhaps we're entering a future that will require all of us to join him. To be a citizen of a democratic polity in the age of generative AI might mean that we must also imagine and argue for the futures we want rather than passively accepting the futures Silicon Valley intends to build for us. The question then is not whether AI will kill us, but rather whether we will allow a handful of very wealthy men to decide our shared future from their doom bunkers.

Perhaps, instead, we should claim the right to decide for ourselves. 

Lee Konstantinou teaches English at the University of Maryland. He is the author of the novel Pop Apocalypse and the literary history Cool Characters: Irony and American Fiction. Most recently, he wrote The Last Samurai Reread, and he is currently working on a study of creator-owned comics.
