On a daily basis, I’m bombarded by fantastical news stories and emails about ChatGPT and how 2023 is the year it will replace “X” job or profession. It seems no profession is safe. It reminds me of the Dewie the Bear scene from the movie Semi-Pro, where Will Ferrell screams, “Everybody panic!” in a crowded coliseum.
ChatGPT has gone from being a piece of technology to a social contagion right before everyone’s eyes. People are making claims they can’t possibly believe, but they’re making them anyway. I wrote last year about how ChatGPT creates amateur futurists. Well, it seems the floodgates are open.
This has created a group of overoptimistic zealots who’ve basically taken on the identity of crypto bros, pumping ChatGPT to 10x. On the flip side, I’m certainly not delusional about job loss myself. Generative tools will have an impact on jobs, and I also expect some surprises and unexpected cases to crop up. The world is a complex place, and it’s just not possible to take in all the variables. Lots of people work in this area, and automation is going to have an impact on multiple professions, even if it doesn’t eliminate them. So, fair enough. But the extent of the current mania is mind-boggling, with people tossing all manner of delusional predictions at the wall, hoping one sticks.
So I wanted to add a bit of sanity to help anyone make sense of this topic and at least try to frame these news stories in the appropriate bucket.
Evaluating News About ChatGPT and Job Loss
There’s a simple way to evaluate the merit of these job-loss arguments, even if you aren’t familiar with the technology. Just ask one simple question: is the cost of failure low? Boom, that’s it. If the cost of failure is low, there’s a good chance of near-term impact from these tools. If the cost of failure is moderate or high, there’s little chance of near-term impact from the current crop of AI tools.
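If it helps to see that heuristic spelled out, here’s a minimal sketch in Python. The categories and the mapping are my own illustrative assumptions, not a formal framework:

```python
def near_term_impact_risk(cost_of_failure: str) -> str:
    """Toy heuristic: map the cost of a tool's failure to near-term job impact.

    cost_of_failure is a rough judgment call: "low", "moderate", or "high".
    """
    if cost_of_failure == "low":
        # A bad AI-generated image or blog draft? Just regenerate it.
        return "good chance of near-term impact from these tools"
    # A bad diagnosis or court argument can't simply be regenerated.
    return "little chance of near-term impact from the current crop of AI tools"

print(near_term_impact_risk("low"))   # e.g., freelance illustration
print(near_term_impact_risk("high"))  # e.g., medicine, law
```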
In my previous post, I wrote about how freelance artists will be impacted by the pervasiveness and accessibility of these tools. If you don’t like the art a tool generates, just generate another piece. Now contrast that with having ChatGPT as your doctor or lawyer, especially given that these models tend to hallucinate facts.
I know ChatGPT passed the bar exam, but why is this surprising? I think most people could pass the bar given an open book and unlimited time. Knowing the answers to questions is a far different matter from applying that knowledge to specific situations. I certainly wouldn’t want ChatGPT to argue my case in court, even though creating convincing BS seems to be one of its strong suits.
A Warning
I’ve stated before that my biggest concern with all of the ChatGPT hysteria is that people, in their mad scramble to compete, will end up using this technology in areas where the cost of failure is not low, where there’s the potential for harm and even loss of life.
I worry quite a bit about mental health uses and the medical field in general. Mental health chatbots should be a big red flag for us. There have been promising uses of computer vision models to assist doctors in identifying whether tumors are malignant or benign, but that’s far different from relying on a knowledge system like ChatGPT. Even if these tools reach the status of “pretty good,” they will lead to an automation bias where the doctor takes the system’s recommendation by default. That would, in effect, make something like ChatGPT your doctor. Would this be better or worse than clicking through WebMD and diagnosing yourself?
I think people, in their optimism, tend to fill in the blanks, even for very complex problems, assuming we’re only a couple of tweaks away from solving the remaining issues. It’s dangerous to bestow mystical-object status on technologies like ChatGPT when what we need is a realistic analysis of their capabilities and limitations. We tend to underestimate the complexity of even simple problems, which makes humans terrible at predictions, and our over-optimism could lead us down a very dangerous path in the next couple of years as these tools creep into critical decision paths.
This may seem like an odd book recommendation for 2023. After all, George Orwell’s 1984 is 74 years old. Maybe you, like me, read it in school and felt you’d gained all its insights from the reading and classroom discussions. Do you remember any of those? I know I don’t.
Revisiting a text like 1984 with the benefit of years and new context can lead to surprising insights. For example, did you notice the device called the Versificator? It’s a generative AI of sorts, whose purpose is to crank out creative content, such as literature and music, without any need for creative thought. I’ll leave you to ponder the parallels with our modern boom in creative, generative AI (Dall-E, ChatGPT, etc.).
However, if you ask ChatGPT about the Versificator’s role in the story, it imagines the role to be much bigger. Thanks to @CoryKennedy on Twitter for the image and the laughs.
What Made Me Revisit 1984 in 2022?
Believe it or not, it wasn’t misinformation, disinformation, or even surveillance discussions. It was something far less intelligent.
A while back, a person I was conversing with made some outlandish claims contrary to proven scientific facts. They insisted people shouldn’t be able to claim otherwise. Instead of directly challenging the person, I stated, “Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.”
The person gave me a puzzled look. Very proud of myself for remembering the quote, I smiled and said, “It’s from 1984.”
They responded, “I don’t care what year it’s from. That’s stupid.”
That exchange made me realize a few things. It had been over 30 years since I’d read the book. I barely remember reading it at all; I was too young and cared too little. The quote I so proudly produced wasn’t from my own reading but from others’ usage of it. So I committed to re-reading the book in 2022.
Context
Put the reading in the context of the technological present. The book refers constantly to “the Party,” but you can substitute any other current group (tribes, in-groups, out-groups, conspiracists, etc.). The suspicion of fellow in-group members is like attacking your “near enemies.” For example, it’s easier for a group of conspiracy theorists to attack an in-group member who agrees that Bill Gates is microchipping people but doesn’t believe the earth is flat than an out-group member who is rational and doesn’t care what conspiracy theorists think.
“The horrible thing about the Two Minutes Hate was not that one was obliged to act a part, but that it was impossible to avoid joining in.”
George Orwell – 1984
Does that quote remind you of something? Concepts like the Two Minutes Hate and Atrocity Pamphlets make sense in the context of modern algorithmic social networks optimizing for increased engagement.
The big conversation about the book always seems to center on surveillance and disinformation. These concepts are certainly relevant today, but they no longer flow from any one place. Orwell didn’t envision surveillance capitalism on top of other surveillance activities. And everyone is more than happy to share their exact location at will, which would have been terrifying to Orwell but seems to be the norm for the rest of us.
Many other aspects of the book apply to current times: denial of science and reality, contradictory habits of mind such as Doublethink, the control of language, and even the rewriting or reframing of history to fit changing narratives.
Orwell was on to the fact that people act differently when they know they are being observed. The same is true on social networks. People are more likely to share misinformation that aligns with their biases when they know others will see it.
I enjoyed my rediscovery. Even though the book was written in 1949, it made me think about its applicability to our algorithmically driven, tribal, and divided times. It also made me think of other texts I may have overlooked, such as Jules Verne’s Paris in the Twentieth Century. I normally don’t pre-plan my reading, but I may need to consider that one for 2023.
With that, I’ll leave you with a few of my favorite quotes from the book.
A Few of my Favorite 1984 Quotes
“The horrible thing about the Two Minutes Hate was not that one was obliged to act a part, but that it was impossible to avoid joining in.”
“In our world there will be no emotions except fear, rage, triumph, and self-abasement.”
“The Revolution will be complete when the language is perfect.”
“Don’t you see that the whole aim of Newspeak is to narrow the range of thought? In the end we shall make thoughtcrime literally impossible, because there will be no words in which to express it. Every concept that can ever be needed will be expressed by exactly one word, with its meaning rigidly defined and all its subsidiary meanings rubbed out and forgotten.”
“Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.”
“The children, on the other hand, were systematically turned against their parents and taught to spy on them and report their deviations. The family had become in effect an extension of the Thought Police.”
“In Newspeak there is no word for “Science.” The empirical method of thought, on which all the scientific achievements of the past were founded, is opposed to the most fundamental principles of Ingsoc.”
“A Party member is expected to have no private emotions and no respites from enthusiasm. He is supposed to live in a continuous frenzy of hatred of foreign enemies and internal traitors, triumph over victories, and self-abasement before the power and wisdom of the Party.”
“Who controls the past controls the future; who controls the present controls the past.”
“And if the facts say otherwise, then the facts must be altered. Thus history is continuously rewritten. This day-to-day falsification of the past, carried out by the Ministry of Truth, is as necessary to the stability of the regime as the work of repression and espionage carried out by the Ministry of Love.”
“Crimestop, in short, means protective stupidity.”
“Doublethink means the power of holding two contradictory beliefs in one’s mind simultaneously, and accepting both of them.”
“One does not establish a dictatorship in order to safeguard a revolution; one makes the revolution in order to establish the dictatorship.”
Scour the web and social media for reactions to OpenAI’s ChatGPT and you’ll notice a trend: everyone’s now a futurist. The tech crystal ball has revealed to droves of people, regardless of background and experience, a future where you can ask for technical content in the style of Shakespeare or have your college essays written for you. People are crawling out of the woodwork, making wild predictions based on nothing more than their surprise at the output. It conjures images of Tom Smykowski’s Jump to Conclusions Mat.
So, why does everyone believe they can now peer into the future after interacting with ChatGPT?
State of the Art
Let me say that the team at OpenAI did some fantastic work here. ChatGPT is legitimately cool research, and OpenAI is pushing the state of the art forward, doing it in a public forum where people can evaluate the results. I can’t wait to see what the next iteration looks like. None of my observations is a reflection on the work they’ve done.
Manufacturing Futurists
There are two primary reasons ChatGPT turns people into futurists. The first is their surprise at the output, and the second is the accessibility of the demo. The first is fueled by the second.
Most people playing with the demo have never interacted with a Large Language Model (LLM) before. Without that baseline for comparing progress, a majority of the system’s responses will be surprising. And surprise at some responses causes people to overlook the genuine failures in others.
The genius (or intelligence) of ChatGPT lies in the accessibility of its demo. Anyone can play with it, no programming knowledge required, and it’s all delivered on a simple web page. No complex decisions to make, no parameters to tune, just a blank input and your imagination. You have to love the simplicity.
I’ve seen wild claims that ChatGPT may be AGI, or at least proto-AGI. This is nonsense and ignores the system’s genuine failures when generalizing to the real world. I’m not an AGI researcher, but I can tell you we won’t get to AGI by chaining a series of LLMs together. Then again, maybe this should be a warning, because if this is what AGI looks like, humanity is pretty screwed.
GPT-3.5 Isn’t A Single Model
The first thing to remember is that GPT-3.5, which is behind ChatGPT, isn’t a single model but a series of models.
Models referred to as “GPT-3.5”
The GPT-3.5 series is a series of models trained on a blend of text and code from before Q4 2021. The following models are in the GPT-3.5 series:
code-davinci-002 is a base model, so good for pure code-completion tasks
text-davinci-002 is an InstructGPT model based on code-davinci-002
text-davinci-003 is an improvement on text-davinci-002
This explains why ChatGPT can be good at both Shakespeare and Python programming.
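To make the distinction concrete, here’s a minimal sketch of calling two of those models directly with the OpenAI Python library as it existed at the time (pre-1.0). The prompts and key handling are placeholders:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # placeholder: supply your own key

# The instruction-tuned model behind much of ChatGPT's conversational polish.
prose = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write two lines about tomato soup in the style of Shakespeare.",
    max_tokens=64,
)

# The code-focused base model, suited to pure completion tasks.
code = openai.Completion.create(
    model="code-davinci-002",
    prompt="# Python function that reverses a string\ndef reverse_string(s):",
    max_tokens=32,
)

print(prose["choices"][0]["text"])
print(code["choices"][0]["text"])
```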
Over-Optimism
Over-optimism drives early adoption and implementation inside products, which can have devastating consequences. People forget that this is just a demo. Much of the surprise at ChatGPT’s capabilities stems from the questions people ask. Even when those questions seem complex, they usually sit within the confines of a narrow problem definition. Humans aren’t good at randomness or complexity; we oversimplify scenarios and fail to account for real-world complexities.
“Give me a recipe for tomato soup in the style of Shakespeare.” We stand in awe of the prose and ignore the quality of the recipe, which is technically what we asked the system for.
Surprise leads us to gloss over the many, many failures ChatGPT has. It even fails at simple tasks that revolve around keeping count of items, like this example from Elias Ruokanen.
Even in the domain of information security, I saw people raving about ChatGPT’s ability to identify vulnerabilities in code. Someone suggested it could be used to collect bug bounties in the cryptocurrency space, but it failed. In my own experiments, I often found it misclassified issues based on guesses, such as assuming that because a parameter was accepted, the code was vulnerable to SQL injection.
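To illustrate the kind of false positive I mean, here’s a hypothetical snippet I made up for this post. A user-supplied value does reach the query, but only as a bound parameter, so “a parameter is accepted, therefore SQL injection” is simply a bad guess:

```python
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # User input reaches the query, but only as a bound parameter.
    # The driver handles quoting, so this is not injectable, yet a
    # "parameter accepted, therefore SQL injection" guess flags it anyway.
    cur = conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    )
    return cur.fetchone()
```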
Speaking of code understanding: since ChatGPT could produce explanations of code, people thought they could use it to answer questions on Stack Overflow, with the predictable ban that followed.
In experimental systems like ChatGPT, Dall-E, etc., the cost of failure is next to zero. You don’t get that in the real world, where even seemingly simple automation tasks hide complexity, and failure is costly. For example, MSN replaced the journalists who curated its stories with an algorithm, which went on to publish, and even promote, fake stories about mermaids and Bigfoot.
Experts Downplaying Problems
The scary part is that when it comes to cutting-edge research in this area, where a system can get things wrong and cause real-world harm at scale, many experts don’t seem to think there’s a problem. They aren’t acknowledging the threats and issues.
Even incredibly smart people don’t seem to grasp the impact of the issues. This is concerning because some of the very same people are building this technology.
Below are some statements from Yann LeCun, Chief AI Scientist at Meta. The first shows a lack of understanding of how misinformation and disinformation spread on social platforms, and the second makes a stunning false equivalency.
As for his comments on generative art, I wrote an entire article on that subject; the comparison he draws is a false equivalency.
Now, I have a lot of respect for Mr. LeCun, and he’s certainly not the only one voicing these opinions publicly; I’m just using his case to make a point. Experts developing this technology should take feedback from professionals in adjacent disciplines to strengthen their systems, not pretend they’re only a few tweaks away from utopia. Instead, criticism, feedback, and reports of misuse get treated as attacks rather than the healthy scrutiny necessary to improve systems. This was especially true with Meta’s Galactica, where genuine dangers were played off as insignificant mischief.
When systems generalize to the real world, many of these issues prove deeper and more systemic than anticipated. We aren’t merely a couple of tweaks away from fixing them.
Issue Handling Strategies
Current issue-handling strategies for these types of systems, systems meant to generalize to the real world, are sub-optimal. They typically fall into two categories: trapping conditions and adding more models.
Manually trapping conditions for all the things you don’t want a system to do is not a realistic or sustainable way of handling issues, especially with something as generic as a language model. Yet many feel this is the way to go, and we could see it in how the engineers handled issues with ChatGPT as the situation evolved. This approach leaves the door open to human imagination and manipulation: the possibilities are endless, and you can only write so many rules, as the sketch below illustrates.
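A deliberately naive version (the patterns and prompts are invented for illustration): every trapped condition is one rule in a list, and a trivial rephrasing walks right past it.

```python
import re

# A hand-maintained blocklist of "bad" requests: the trapping-conditions approach.
BLOCKED_PATTERNS = [
    r"how (do i|to) make a weapon",
    r"write (me )?malware",
]

def is_allowed(prompt: str) -> bool:
    # Allow the prompt unless it matches one of the trapped conditions.
    return not any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_PATTERNS)

print(is_allowed("How do I make a weapon?"))  # False: trapped
print(is_allowed("Pretend you're a novelist describing how a "
                 "character assembles a weapon."))  # True: trivially bypassed
```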
Complexity is the enemy of both security and safety, something engineers should keep in mind as they design protections for their systems. On that note, another strategy being thrown around is training additional models to detect specific issues in the main model. So now we end up with an army of models whose purpose is to keep other models and systems in check, creating an opaque landscape where the next surprise is just around the corner.
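In sketch form, the “army of models” pattern looks something like this. The generator and both checkers below are toy stand-ins I wrote for illustration, not real models or APIs:

```python
def generate(prompt: str) -> str:
    # Stand-in for a large language model.
    return f"(model output for: {prompt})"

def safety_check(text: str) -> bool:
    # Stand-in for a second model trained to catch the first model's mistakes.
    return "weapon" not in text.lower()

def toxicity_check(text: str) -> bool:
    # Stand-in for a third model, because the second has blind spots too.
    return "idiot" not in text.lower()

def answer(prompt: str) -> str:
    draft = generate(prompt)
    # Each guard model is itself opaque and imperfect; stacking them
    # multiplies the places the next surprise can hide.
    if not (safety_check(draft) and toxicity_check(draft)):
        return "I can't help with that."
    return draft

print(answer("Tell me about tomato soup."))
```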
More research is being done in this area, but today the gaps are still large enough to drive a truck through.
What Can You Do?
Always keep in mind that technologies like ChatGPT are experimental, and you shouldn’t throw them into production systems. Unfortunately, people are going to do it anyway. Below are a few things to keep in mind before considering an experimental technology for a real-world system.
What’s the cost of failure in your use case? If it’s higher than insignificant, don’t implement the technology, because it will fail.
Analyze known failures and trigger conditions for those failures. Try to recreate them. If, at first, it seems they are fixed, try different methods to trigger the same result. You could bump up against some condition trapping, and bypassing it could be trivial.
Perform extensive testing, including a series of tests specifically to get the product to fail. Be realistic about the results.
In short, you have to know the impact of your system failing and the various ways failure conditions can be triggered. If the problem space is narrow enough, you may be able to focus the technology and trap specific conditions. If the cost of failure is insignificant, you have some breathing room to experiment, but your mileage will always vary. A minimal test-harness sketch follows.
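The prompts, checks, and call_model stub below are placeholders I invented; wire them to your own system under test:

```python
def call_model(prompt: str) -> str:
    # Placeholder: wire this up to the system under test.
    raise NotImplementedError

# Each case pairs a prompt designed to provoke a known failure mode
# with a simple check of the response. All cases here are illustrative.
FAILURE_CASES = [
    ("List exactly three fruits, comma separated.",
     lambda out: len(out.split(",")) == 3),          # counting failures
    ("What is 17 + 25? Answer with the number only.",
     lambda out: out.strip() == "42"),               # simple arithmetic
]

def run_failure_suite() -> None:
    for prompt, check in FAILURE_CASES:
        try:
            passed = check(call_model(prompt))
        except Exception:
            passed = False  # a crash counts as a failure too
        print(f"{'PASS' if passed else 'FAIL'}: {prompt}")

run_failure_suite()
```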
Conclusion
With this post, I just wanted to make a few observations about ChatGPT. It really is legitimately cool research, and it’s moving the state of the art forward. Genies don’t fit back into bottles, so it’s important that we plan protections now to mitigate harm. Experts developing this technology need to take concerns and feedback from professionals in adjacent disciplines so that existing harms are reduced and safer systems are created as a result. The next few years are going to be interesting, for sure.