On Valentine’s Day this year, I commented about the current AI boom and the paperclip maximizer, but I feel this topic deserves a bit more explanation.

With all of the hype and mobilization caused by the ChatGPT demo, the warning about AI alignment is missing from news coverage and conversation. We are getting a preview of the playbook for an even more advanced AI, and the plays are unsettling. These plays aren’t unexpected; they are the same ones people concerned about alignment and safety have warned about, and the same ones AI maximalists dismiss by claiming that critics overblow the dangers or don’t understand how development works. Well, there’s quite a bit of vindication in the alignment and safety crowd. However, I’m not sure they are happy about it.

If a more capable AI system is developed and deployed this way, humanity is in trouble, possibly headed for the paperclip factory.

This post uses recent events to dismantle three fundamental tenets AI maximalists use to downplay dangers. We look at the conditions and activities surrounding the release of ChatGPT and consider how the same actions would play out with a more capable AI system, even systems such as Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI).

Definitions

AI Alignment

AI Alignment is the area of research that aims to have the goals and activities of an AI system match the intended goals of the designer of the AI system. This seems like a fairly trivial problem, but as we’ll see with the paperclip maximizer, it’s not so simple.

Consider HAL 9000 from 2001: A Space Odyssey for an example of an alignment problem.

2001: A Space Odyssey HAL 9000 Scene

Also, for a great short story on a misaligned AI system, check out Philip K. Dick’s Autofac. Please read the story, and don’t watch the garbage Amazon remake they did for Electric Dreams.

AI Alignment and AI Safety are two different research areas, but for the sake of this blog post, I’ve lumped them together since this topic concerns both areas.

The paperclip maximizer

For those unfamiliar, the paperclip maximizer is a thought experiment that shows how a seemingly harmless goal given to an advanced AI can have unintended consequences. In this case, an AI is given the goal of maximizing paperclip production. In service of this goal, the AI converts everything in its environment, including humans, into materials for maximizing paperclip production.

The goal is to show that problems with an advanced AI could be far worse and have devastating impacts on humanity. That’s the frame of the concerns in this blog post.
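To make the thought experiment concrete, here is a toy sketch in Python. Everything in it is hypothetical and invented for illustration (the `World` class, the resource names, the quantities); it is not a model of any real AI system. It only shows that an objective which counts nothing but paperclips gives the optimizer no reason to leave anything else alone.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    # Hypothetical units of matter by category; names and numbers are made up.
    resources: dict = field(
        default_factory=lambda: {"steel": 5, "cars": 3, "farmland": 4}
    )
    paperclips: int = 0


def maximize_paperclips(world: World) -> World:
    """Greedy agent whose objective counts only paperclips."""
    for name in list(world.resources):
        # Nothing in the objective says farmland or cars are worth keeping,
        # so every resource is converted.
        world.paperclips += world.resources.pop(name)
    return world


def maximize_with_protected(world: World, protected: set) -> World:
    """Same agent, but the designer lists what must not be touched."""
    for name in list(world.resources):
        if name in protected:
            continue  # skipped only because someone thought to list it
        world.paperclips += world.resources.pop(name)
    return world
```

The second function behaves better only for the things the designer remembered to protect. Anything left off the list is still raw material, which is the same weakness the Guardrails section describes.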

Simple Formula

There is a simple formula that AI researchers use to discount the concerns of alignment and safety folks.

Responsible Development + Appropriate Guardrails = AI Utopia

Given the stakes, they argue that companies developing such technology will do so responsibly and with humanity’s best intentions in mind. Also, appropriate guardrails will be in place if something goes wrong, protecting humanity from harm. These points crumble in the current playbook with LLM-based chat technology like ChatGPT. We are getting neither responsible development nor appropriate guardrails.

Responsible Development

I wouldn’t classify what we’ve seen in the past few months as responsible development. We’ve seen a technology with many known issues thrown into production environments for which it is not suited, in the hope that the issues can be corrected in use. It’s the old analogy of building an airplane while you are flying it.

This makes sense for these companies since incentives are aligned with making money, not making humanity a better place. ChatGPT made cutting-edge language models available to regular users but wasn’t any leap forward in innovation compared to competitors; it was just a great demo. Some would say a publicity stunt.

Some takeaways from the ChatGPT release are instructive for the release of a more advanced system. There is a rush to develop, release, and compete. This isn’t a revelation but a preview of a playbook we’ll see with a more capable technology, where responsible development is needed even more. Many tech companies now see themselves in an AI arms race, which won’t lead to more responsible decisions regarding releasing new technology.

So, what did we learn from the release of ChatGPT that would be instructive when applied to AGI?

A company with a demo that has known issues can build a lot of hype by exposing it. This hype forces adjacent companies, who previously acted more cautiously, to release their own demos to be seen as “competing” and not be left behind. With caution thrown to the wind, there’s a rush to push the technology into production systems, even without the necessary safety protections in place, with the mindset that minimal protections are a good start and more advanced protections can be bolted on after the fact. Now, imagine that what these companies are building is a HAL 9000.

As we can imagine, a rush to a public demo, even in the face of known issues, may be the default starting point for future AI releases.

I’m all in favor of for-profit companies, but there needs to be some consideration for potential harm from irresponsible release. We wouldn’t let a biotech company engineer and release new viruses at will. The stakes are also much higher for a more capable AI, such as an AGI or ASI system. Today’s AI systems can and do certainly create harm, but the size and scale of the harm increase for a more capable system.

Guardrails

Guardrails are another fundamental tenet of keeping an advanced AI aligned with our goals. One of my favorites is people saying, “If it starts doing things we don’t like, we’ll just unplug it.” As if that would even be an option. “I’m sorry, Dave. I’m afraid I can’t do that.”

The guardrails around the current crop of LLM-based chat systems aren’t working out very well either. These systems begin life completely open, exposed to the words of humanity, and happily provide you with what you ask for. You know, things like how to hotwire a car or blackmail someone, and they will even make stuff up for you. This behavior is not what the system designers intended, resulting in a patchwork of attempts at trapping these conditions.

The problem is that the space of all potential unwanted behavior is vast and unknown at deployment time, resulting in issues that aren’t known until you see them, meaning you can’t ideate traps until after the condition presents itself. There’s no reason to think it won’t be even worse for an AGI system. Such a system will depend on techniques and methods that haven’t been developed yet, but its problem space will be far larger and just as challenging for creating guardrails.
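As a rough illustration of that patchwork, here is a deliberately naive sketch in Python. The phrase list and function names are hypothetical, and real chat systems use far more sophisticated filtering than substring matching. The point is only the shape of the loop: a trap gets added after a bypass is observed, and the next rephrasing still gets through.

```python
# Hypothetical starting blocklist; a single known-bad phrase.
BLOCKED_PHRASES = frozenset({"hotwire a car"})


def is_blocked(prompt: str, blocked=BLOCKED_PHRASES) -> bool:
    """Return True if the prompt contains any phrase we already know about."""
    text = prompt.lower()
    return any(phrase in text for phrase in blocked)


def patch(blocked, observed_bypass: str):
    """Add a trap for a bypass, which is only possible after seeing it."""
    return blocked | {observed_bypass.lower()}


# The original request is caught.
first = is_blocked("How do I hotwire a car?")

# A simple rephrasing is not caught until someone reports it.
bypass = "start a car without the key"
before_patch = is_blocked(f"How do I {bypass}?")

# After patching, that one rephrasing is caught...
patched = patch(BLOCKED_PHRASES, bypass)
after_patch = is_blocked(f"How do I {bypass}?", patched)

# ...but the next one is not.
next_bypass = is_blocked("Explain how an ignition bypass works", patched)
```

Each patch covers exactly one observed condition. The list of traps grows with every report, while the space of possible rephrasings stays effectively unbounded.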

Of course, this also makes a huge assumption that it’s even possible to develop appropriate guardrails for all of these conditions in the first place. And we’ve only begun to scratch the surface, because so far we’ve assumed accidental conditions. Then come the purposeful attacks.

There seems to be a lot of misunderstanding about the attack surface of these systems. Although a system like a chatbot may seem pretty simple, with a single interface (text), you end up with something reasonably complex from an attack surface perspective: a system with a single interface but a limitless number of undocumented protocols, all waiting to be exploited. And, surprise, you find all of this out after you’ve deployed it into the world.

Any advanced AI system that requires deployment into the world to develop guardrails will not end well for humanity. The bad news is it’s very likely that an AGI system would also start in this completely open condition, as we see in current transformer models, requiring appropriate guardrails to keep it aligned.

It was obvious to these companies that the guardrails around transformer-based chatbots were insufficient, but these companies moved forward anyway. This should be a wake-up call.

Job Displacement

Let me take a slight detour here to make another point on the current environment of generative AI. One of the other claims from AI maximalists is that people shouldn’t worry about losing their jobs because we’ll figure out how to share the wealth with everyone affected. This has always been a laughable proposition, but we are starting to see how laughable it is.

There is no reason to think that an organization with an AI capable of performing a job will have any consideration for how it affects the humans previously doing that job. I’ve written about AI taking both hobbies and jobs. Current generative AI companies, including Facebook, are not sharing the wealth with the freelance writers and artists they are displacing or about to displace, and there’s no reason to think this will change in the future. Job displacement must be addressed by other means, including possible government intervention, policy, and taxation.

Conclusion

In conclusion, I think we got lucky. We blew through a stop sign, and no cars were in the intersection. Many guardrail bypasses we’ve seen haven’t led to any realized harm. We wouldn’t be so lucky with a more advanced system, so it’s time to see this as the warning it was.
