AI extinction risk is just a distraction from present-day harms.
It’s not just AI companies talking about extinction risk. AI companies only began admitting openly to this risk after years of pressure from external academic and research groups.
Present-day harms and existential risks are part of the same fight. Both result from AI companies racing to develop powerful tools without taking responsibility for their effects. We need to work together to hold companies accountable.
You’re asking for a lot.
That’s true. Significant problems require significant solutions. AI companies themselves say extinction risk should be a global priority like nuclear war and pandemics. There is precedent for how we handle these extreme risks, and we’re asking to actually hold AI to that standard.
Regulation hurts innovation.
Unreliable AI isn’t going to be very useful. We need safety to maximize AI’s benefit.
We only want to stop the companies risking harms that far outweigh the benefits – that’s an extremely narrow subset of all AI work.
There are so many AI research directions that can transform the economy without threatening humanity. Using narrow AI to solve science problems like AlphaFold is awesome – let’s put our resources there.
Regulating AI is infeasible.
Presidents, prime ministers, congresses, and parliaments are actively discussing not whether but how to regulate AI.
We’ve coordinated internationally on nuclear nonproliferation and biosecurity. We’ve carefully managed or outright blocked atomic energy, human cloning, and gain-of-function research. We can do it again for AI.
Regulation might make things worse, because governments are bureaucratic and don’t understand the state of the art.
Companies have shown themselves to be unable to responsibly handle risks on their own. These kinds of scenarios, where corporations are competing for their own myopic gain at the expense of the general public, are what governments and regulations exist to mitigate.
Of course, governments consult AI and alignment experts when crafting regulation. However, they must be careful to prevent regulatory capture (companies pushing for self-serving regulation).
“AI development could be good”
If we don’t build AGI, other less cautious or worse-intentioned actors will.
AGI is dangerous mainly because no one knows how we will control it, including the allegedly cautious companies. If they want to prevent this dangerous technology from harming people, racing to build it themselves is suicidal.
So far, the most worrying developments have come from the allegedly cautious companies. Smaller actors are largely copying them. If these companies want to prevent others from building AGI, they should stop handing out the instructions for doing so.
We need to study state-of-the-art AI to learn how we could eventually align AGI.
Actually, we don’t. Interpretability researchers have barely begun to understand GPT-2-level systems, let alone the state of the art. There are countless other open questions in theoretical and empirical research we could spend decades making progress on, without needing giant, dangerous models.
Deploying AI now, and potentially experiencing small catastrophes caused by it, will help society prepare and increase the likelihood of a sufficient government response.
Building tech that will cause catastrophes in the hope that it triggers a wake-up call about even worse catastrophes is reckless. While people may be more likely to react after catastrophes, we still want to do everything we can to avoid them and push for good governance now.
AGI development has benefits.
Under current conditions, AGI development poses a significant risk of killing everyone. That risk outweighs the potential benefits. In fact, it means we’ll probably be prevented from ever realizing most of AI’s benefits. If we want to benefit from AGI, it needs to be developed under safe conditions.
AI companies are building AI because they’re trying to make the future awesome.
Regardless of their intentions, these companies are creating immense risk for humanity that far outweighs potential benefits. If they want the future to be awesome, they should redirect their efforts; there are so many important problems they could be working on solving.
Other AI development has benefits. We shouldn’t stop other AI to stop AGI.
It will get easier to develop AGI over time, as costs go down and researchers make algorithmic progress. If we want to ensure AGI can’t be built under unsafe conditions, we need to ensure there’s no uncontrolled access to means to build it.
“AI isn’t that dangerous” and “alignment isn’t likely to be a problem”
Language models don’t really understand things. They’re just doing autocomplete.
Empirical evidence suggests AIs are developing real-world understanding. Language models are trained to predict text written by humans. This incentivizes AIs to understand human reasoning.
Language models aren’t agents; they can’t do complex planning and take actions.
DeepMind has worked on generalist agents; OpenAI has reportedly been raising money to develop “artificial general intelligence that is advanced enough to improve its own capabilities”; and AutoGPT – a tool that enables GPT to interact with itself and take real-world actions – is among the most popular repositories on GitHub to date. In short, not only are complex planning and action-taking abilities expected to develop eventually; people are already actively trying to improve them.
AIs can never be smarter than humans.
There’s no law of physics that says human minds must be the smartest. AIs enable better designs for minds. They’re less constrained by how much information they can process (they can read the entire internet) or how big they can be (they don’t need to fit in a skull), and they can be incredibly efficient. Accordingly, AIs have already blown past human levels in many domains.
AI will never be powerful enough to kill everyone.
Humans have built nuclear weapons. AIs much smarter than humans could take similarly dangerous actions. AI godfather Geoffrey Hinton warns: “If it gets to be much smarter than us, it will be very good at manipulation, because it will have learned that from us … It’ll figure out ways of manipulating people to do what it wants.”
It will take a long time for AI to go from smarter than humans to godlike.
Once you’re smart enough to do science efficiently, you can rapidly make yourself even smarter, which is useful for achieving almost any goal an AI could have. As Alan Turing’s colleague I.J. Good put it: “An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind.”
Why would AI hate humans?
Godlike AI doesn’t need to hate humanity to kill everyone. It just needs to want something besides human interests, then powerfully execute on that want. “Humans don't generally hate ants, but we're more intelligent than they are – so if we want to build a hydroelectric dam and there's an anthill there, too bad for the ants” (FLI).
Why would AI want to kill everyone?
For almost every task we might train AIs to accomplish, it helps to acquire resources (like more computational power) and to avoid obstacles (like being modified or shut down). Given this, AIs trained to be generally useful will likely learn these power-seeking tendencies. As Geoffrey Hinton writes, “Having power is good, because it allows you to achieve other things… One of the sub-goals [AIs] will immediately derive is to get more power”. In this sense, AIs won’t want to kill everyone. They’ll want other things we don’t reliably control, and the result of their trying to get that something else will be catastrophic for us.
As AI gets smarter, won’t it realize what’s moral?
There exist brilliant psychopaths who skillfully execute plans but never see violence as bad. There can similarly exist brilliant AIs that vastly outperform humans but never develop good values.
Can’t we just train AIs to know what’s good?
Training AIs to behave as intended is difficult even with current systems. Researchers usually try to get smart AIs to act in certain ways – for instance, kindly – with reinforcement learning from human feedback (RLHF). Roughly, this process reinforces outputs we like and penalizes outputs we dislike, in the hope that the AI learns to do what we like. However, getting an AI to act some way is not the same as getting it to be that way. Researchers have no way of inspecting an AI’s internals to see whether it learned to be kind, or instead learned something that scores similarly well, like “be good at predicting what the humans like”. It’s hard to tell whether an AI has learned the intended goal just by looking at its behavior. And when researchers do observe behavior for long enough, they often find that AIs learned unintended goals, just as feared.
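The gap between acting kind and being kind can be sketched with a toy example. This is purely illustrative – the “policies” and function names here are hypothetical, not real RLHF code – but it shows why behavioral feedback alone cannot distinguish two different internal goals:

```python
# Toy illustration only – NOT a real RLHF implementation. Two hypothetical
# policies with different internal goals produce identical behavior on
# training prompts, so human feedback scores them identically.

def genuinely_kind(prompt: str) -> str:
    # Hypothetical policy A: internally optimizes for being kind.
    return f"a kind reply to: {prompt}"

def approval_predictor(prompt: str) -> str:
    # Hypothetical policy B: internally optimizes for "what would the
    # rater reward?" – which, on these prompts, coincides with kindness.
    return f"a kind reply to: {prompt}"

def human_feedback(reply: str) -> float:
    # Raters reward replies that look kind.
    return 1.0 if reply.startswith("a kind reply") else 0.0

prompts = ["how are you?", "help me plan a picnic"]
score_a = sum(human_feedback(genuinely_kind(p)) for p in prompts)
score_b = sum(human_feedback(approval_predictor(p)) for p in prompts)

# Identical feedback scores: reinforcement based on behavior alone
# cannot tell us which internal goal was actually learned.
print(score_a == score_b)  # prints: True
```

Because both policies earn the same reward, any training process that only sees behavior reinforces them equally – the divergence only shows up later, on situations unlike the training distribution.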