A pause on AI R&D is not enough; instead, redirect it toward non-adversarial approaches

Preston Estep (Founder, Executive Director, and Chief Scientist); Ranjan Ahuja (Director of Communications); Brian M. Delaney (Chief Philosopher); Alex Hoekstra (Co-founder and Director of Community)

Boston Globe Op-Ed, May 15, 2023

The recent release of ChatGPT has brought long-simmering debates over the risks and benefits of artificial intelligence into public view. The ability of so-called large language model systems to answer complicated questions in conversational dialogue has helped non-experts grasp the power of AI to accomplish a wide range of routine tasks, from drafting business letters and school essays to writing computer code or even composing poetry, all within seconds.

But the growing capabilities of AI have triggered dire warnings from the science and tech communities.

The nonprofit Future of Life Institute (FLI) recently posted an open letter titled “Pause Giant AI Experiments,” arguing that “Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.” If a voluntary pause cannot be enacted quickly, the more than 27,000 signatories, including Elon Musk, Apple co-founder Steve Wozniak, and many other tech and science leaders, recommend that governments institute a moratorium.

In a Time magazine essay, AI researcher Eliezer Yudkowsky goes further, calling for an immediate and indefinite shutdown of large-scale AI because, he says, “smarter-than-human intelligence” will probably eliminate all of humankind. Yudkowsky concludes his essay by calling for the bombing of data centers that do not agree to the shutdown. (He allows possible exemptions, but only if the AI is not trained on text from the Internet.)

We are life scientists who run the Mind First Foundation, a nonprofit focused on “mindware,” the intersection of human intelligence and AI. Two of us wrote a 2014 essay on mindware that was included in a book of prizewinning essays addressing the question “How should humanity steer the future?” (This book was a catalyst for the founding of FLI.) We agree with FLI and Yudkowsky that current large-scale AI research and development is headed in a very worrisome direction.

But we also believe that attempting to pause development is not the right answer. For one thing, even if all major players agreed to such a pause (unlikely), their private conduct would be difficult to verify. The huge rewards of progressively better AI products will entice private companies to continue full-bore in the ongoing commercial AI races, to say nothing of the literal arms races of military projects. A pause by the well-intentioned would amount to unilateral disarmament, allowing the noncompliant to make unchecked advances.

Additionally, a pause in AI research would mean a delay in the development of the myriad technologies that could substantially improve human life. We hear endless rationalizations for why humanity is losing battles against climate change, mental illness, homelessness, fatal drug overdoses, mass shootings, and other serious problems. AIs capable of assisting us in making substantially better decisions on such complex issues are not on the immediate horizon, but every moment we pause the advancement of AI is one in which these problems continue unabated. Some, such as climate change, grow increasingly intractable, potentially reaching irreversible tipping points.

So rather than just pause development, we propose a different approach: large-scale AI research and development should urgently be redirected away from the adversarial training that has been used to produce the most powerful AIs to date, and primarily toward collaborative and cooperative approaches.

What is “adversarial training”? It is a type of training in which a learning computer competes against an adversary (possibly itself) to achieve a goal, such as identifying a complex shape or pattern, or playing and winning a game. These simple activities seem innocuous, and adversarial training can efficiently achieve simple, short-term goals. But adversarial competition is also the primary driver of Darwinian evolution, and when it drives adaptive change in biology, it produces fundamentally adversarial mindsets. The most powerful learning AI designs are modeled on the architecture of animal brains. When such an AI is challenged in an adversarial fashion with problems of real-world complexity, we think it is likely to develop behaviors typical of animal minds, including aggression, territoriality, and self-preservation.

ChatGPT and similar projects are trained in a different but equally adversarial fashion: they are designed to emulate communications between Darwinian-selected role models, namely humans. It is therefore unsurprising that they display human-like aggression and biases. This is also a good reason to disallow training AI on online communications, as Yudkowsky proposes.

Instead of having AI emulate people’s combative online communications or undergo other adversarial training, consider a proposal from our 2014 essay: AI R&D should focus on those qualities that contribute to human sustainability and that were not easily favored or achieved by natural selection (because of the overwhelming strength of adversarial selection). However, even cooperation can be adversarial and competitive, as in multi-agency warfare and gaming, in which one alliance wins at the expense of others. So an ideal AI would be not only non-adversarial and cooperative but also synergistic (mutually beneficial) with the best traits of humanity.

Is such an AI achievable by non-adversarial means? Might we even take an adversarial being and redirect it, making it a cooperative and synergistic — maybe even loyal and loving — companion? We — or rather our ancestors — have already done this, turning the wolf into the dog, humanity’s best friend. They achieved this gradually over many generations by selecting for the traits they desired, and against those they wanted to minimize.

Unlike dogs, self-improving AI might swiftly surpass our abilities, so we need to consider very carefully how to govern its relationship with us. One set of guidelines is articulated in the Asilomar AI Principles, which the FLI letter endorses. Those principles seek to guide the development of AI to “align with human values” and “accomplish human-chosen objectives.” Such statements sound reasonable, but unfortunately they don’t prohibit even the worst human behaviors. Horrific practices, such as slavery, genocide, and warfare, have occurred on massive scales throughout human history.

Because such atrocities often seem distant in geography or time, it is easy to absolve ourselves; but they were committed by ordinary people in different circumstances. This should make us reconsider whether an ideal AI should be unconditionally subservient to humanity, or to humanity’s conflicting ethics and values. An ideal AI would develop, as independently as possible, more sensible and sustainable ethics and values that benefit both humanity and AI. Is this realistic?

We don’t know the capabilities of future AI, but even today’s advanced AI is capable of learning about our world and universe — and about us — from simple principles (tabula rasa), without human guidance or emulation.

It is critically important that humans and advanced AI achieve alignment of ethics, values, and goals. We think this should be accomplished partly by modifying ourselves and our conduct to synergize with a more thoughtful and less biased potential superintelligence (and having AI help us do that), rather than by attempting to enslave it. To accomplish this lofty goal, we can begin by directing AI R&D away from the purely adversarial, toward approaches that are more likely to result in a sustainable and synergistic relationship between humanity and AI.
