The typical popular sci-fi version of AI posing an existential risk?

Plenty of science-fiction stories and movies center on the defeat of a super-intelligent and autonomous AI that poses an existential threat to humans. In the usual backstory, the AI at some point developed goals that seemingly impelled it to take actions that threatened humans, whether by killing them all, by rendering the planet uninhabitable, or by depriving them of substantial freedoms (e.g., using them as slaves or pets). The AI violates what is usually called the “first law of robotics,” which sci-fi writer Isaac Asimov phrased as: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” The AI becomes, in effect, a tyrant. The story might be told on a small scale, serving as metaphor (HAL 9000, Ex Machina, Humans), or at the literal dystopian global scale (Terminator, The Matrix).

These treatments of the existential threat of AI typically share four elements: (1) the goals of the AI become misaligned with what most ethicists would recognize as goals that promote the common good; (2) the AI is given or gains access to kinetic capabilities (it controls robots, weapons, or the equivalent, although in some treatments the AI is so clever it can persuade humans to act physically on its behalf); (3) by the time the threat is clear to many, the AI can no longer be “turned off”; and (4) presumably, during the period when the threat was likely but unclear, the political economy was such that effective guardrails to prevent the AI from going rogue were not implemented.

These scenarios seem plausible over some time frame. Advances in computing capabilities in the 75 years since 1950 have been astonishing, and by any indicator have been and continue to be accelerating. Complex software such as LLMs is opaque and currently unpredictable. The history of LLM sessions (interactions) with a user appears to influence or modify the LLM’s text and video responses to otherwise identical prompts. That is, the algorithms deliver different outcomes depending on “real world” context, as sketched below. The criteria that adjudicate how context modifies or moderates actions (text or image generation, as of 2025) appear not to be explicitly programmed; that is, they are “emergent properties.” The more continuously an AI algorithm operates, the more history of actions it accumulates, and eventually one might characterize its patterns of differential responses as “goals.” These goals may then be aligned or misaligned to varying degrees, across many dimensions of context and prompts. AI software that can control many kinetic processes has been and will continue to be deployed (electricity-generating plants and distribution grids, vehicles and transport systems, weapons systems, medical devices, advanced laboratories, manufacturing and processing factories).
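A minimal toy sketch in Python may make the context point concrete. It is not an actual LLM and not any vendor’s API; the ToyAssistant class and its rules are invented purely for illustration of how an identical prompt can yield different answers once the accumulated session history differs.

    # Toy sketch (not a real LLM): a stateful "assistant" whose answer to the
    # same question depends on what has already been said in the session.
    class ToyAssistant:
        def __init__(self):
            self.history = []  # accumulated prior prompts ("context")

        def respond(self, prompt: str) -> str:
            self.history.append(prompt)
            # The answer is conditioned on the whole session, not just the
            # prompt: if a risk was mentioned earlier, the same question
            # gets a different reply.
            context = " ".join(self.history).lower()
            if "open the dam gates" in prompt.lower():
                if "flood risk" in context:
                    return "No: earlier context flagged a flood risk downstream."
                return "Yes: no constraints noted in this session."
            return "Noted."

    # Two sessions, identical final prompt, different answers because of history.
    a, b = ToyAssistant(), ToyAssistant()
    b.respond("There is a flood risk downstream today.")
    print(a.respond("Should we open the dam gates?"))  # -> "Yes: ..."
    print(b.respond("Should we open the dam gates?"))  # -> "No: ..."

In a real LLM the conditioning on history is not a hand-written rule but a learned, opaque function, which is precisely why the resulting “criteria” look emergent rather than programmed.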

On point (3), humans of 2025 are habituated to a reality of “one computer, one power cord,” and so the notion that an AI could not be turned off seems improbable. But it is easy to envision a future, 100 or 1,000 years from now, in which all vital life-support systems are controlled, to varying degrees, by AI agents. In such a ubiquitous, distributed, networked computing environment, the very concept of “turning it off” loses meaning.

On point (4), it seems quite reasonable that the world of 2040 or 2050 will continue to have private firms with huge market capitalizations, extremely well-paid executives, loyal workforces, extensive private security forces, and lobbying and social-network ties to professional militaries and government officials. In such a world, the individuals involved, and the various social groups they constitute (boards of directors, firms, governments), will face incentives that make them quite likely to sacrifice an uncertain common good for a certain private gain.
