Ethereum founder Vitalik Buterin and director of the Machine Intelligence Research Institute (MIRI) Nate Soares discussed the risks of AI at Zuzalu today.
Zuzalu is a “pop-up city community” in Montenegro, initiated by Buterin and his peers in the crypto community, that runs from March 25 to May 25. The event brings together 200 core residents with a shared desire to learn, create, live longer and healthier lives, and build self-sustaining communities. Over the course of the two months, the community is also hosting a number of events on topics such as synthetic biology, technology for privacy, public goods, longevity, and governance.
The discussion opened with Soares introducing his work at MIRI, a Berkeley-based non-profit that has existed longer than he’s been running it. For the past 20 years, MIRI has been trying to lay the groundwork to ensure that AI development goes well. With the discussion, Buterin hoped to address what makes AI uniquely risky compared to other technologies released in human history.
The probability of AI causing human extinction
Buterin said that he has been interested in the topic of AI risks for a long time and remembered being convinced that there was a 0.5% to 1% chance that all life on Earth would cease to exist if AI went wrong, an existential risk that would cause the extinction of the human race or the irreversible collapse of human civilization.
From Soares’s perspective, human extinction looks like a default outcome of the unsafe development of AI technology. Comparing it to evolution, he said that humanity seemed to develop faster than evolutionary change alone was proceeding. In both AI development and human evolution, the dominant optimization process (a process of finding the best solution to a problem when there are multiple objectives to consider) was changing. Humans had reached a point where they were able to pass on knowledge via word of mouth instead of having the information hardwired into genes via natural selection.
“AI is ultimately a case where you can switch the macroscopic optimization process again. I think you can do much better than humans optimization-wise. I think we’re still pretty dumb when it comes to optimizing our surroundings. With AI, we’re going through a phase transition of sorts where automated optimization is the force that is determining the macroscopic features of the universe,” Soares explained.
He added that what that future looks like depends on what the optimization process is optimizing for, and that it will likely stop being beneficial for humanity, as most optimization targets have no room for humans.
Can humans train AI to do good?
Buterin pointed out that humans are the ones training the AI and telling it how to optimize. If necessary, they could change the way the machine is optimized. To that, Soares said that it’s possible in principle to train AI to do good, but simply training an AI to achieve an objective doesn’t mean it will pursue that objective or want to. The question, he said, boils down to desire.
Pointing to reinforcement learning in large language models, which are getting large amounts of data about human preferences, Buterin asked why that approach wouldn’t work, given that existing intelligence is getting better at understanding what our preferences are.
“There’s a big gap between understanding our motivations and giving a shit,” Soares responded.
“My claim is not that a large language model or AI won’t understand the minutiae of human preferences. My claim is that understanding the minutiae of human preferences is very different than optimizing for goodness,” he added.
A member of the audience made a comparison between AI and humans, saying that, like artificial intelligence, humans tend not to understand what they’re doing or predicting, which could also be dangerous. He then asked Soares to pretend he was an alien and explain why there shouldn’t be humans.
“I wouldn’t be thrilled about giving godlike powers and control over the future to a single individual human. Separately, I would be much more thrilled giving power to a single individual human than to a randomly rolled AI. I’m emphatically not saying that we shouldn’t have AI. I’m saying we need to get it right. We need to get them to care about a future that’s full of fun and happiness and flourishing civilizations where transhumans are engaging in positive-sum trades with aliens and so on,” Soares clarified. “If you build a strong optimization process that cares about different stuff, that could potentially destroy all values of the universe.”
He added that the things humans value are not universally compelling and that morality is not something that any mind that studies it would pursue. Instead, it is the result of the drives built into humans that, in the ancestral environment, caused us to be good at reproducing and are specific to humans.
Ultimately, Soares believes that we shouldn’t be building something that is as intelligent as humans, or even more intelligent, and that is inconsistent with fun, happiness, and flourishing futures. On the other hand, he also said that humanity should not try to build a friendly superintelligence that optimizes for a fun future on its first try during an arms race. In the short term, AI should be dedicated to helping humanity buy time and space to figure out what it actually wants.
ChatGPT won’t be consuming the entire biosphere
As AI is currently being built to achieve particular goals, including prediction, Buterin asked what would happen if AI weren’t goal-driven. Soares said it’s easy to build AIs that are safe and non-capable, and we may soon have AIs that are capable but are pursuing different things. He doesn’t think ChatGPT will consume the entire biosphere, as it’s not at that level of capability.
Soares noted that most interesting AI applications, like automating scientific and technological development and research, seem to require a certain pursuit of goals.
“It’s no mistake that you can get GPT to write a neat haiku, but you can’t get it to write a novel. The limitations of the current systems are related to the fact that they aren’t pursuing these deeper goals, at least to me.”