Category: AI

Code Roulette: Spin the Code, Take the Breach

September 2nd, 2025
Although AI has taken a hit in the past few weeks, the vibes are still strong and infecting every part of our lives. Vibe coding, vibe analytics, and even vibe thinking, because well, nothing says “old” like having thoughts grounded in reality. However, an interesting trend is emerging in software development, one that could have far-reaching implications for the future of software. This is a type of code roulette where developers don’t know what code will execute at runtime. Then again, what’s life without a little runtime suspense?

Development and Degraded Performance

The world runs on software, so any trend that degrades software quality or increases security issues has an outsized impact on the world around us. We’ve all witnessed this, whether it’s the video conferencing app that periodically crashes after an update or a UI refresh that makes an application more difficult to use.

Traditionally, developers write code by hand, copy code snippets, use frameworks, skeleton code, libraries, and many other methods to create software. Developers may even use generative AI tools to autocomplete code snippets or generate whole programs. This code is then packaged up and hosted for users. The code stays the same until updates or patches are applied.

But in this new paradigm, code and potentially logic are constantly changing inside the running application. This is because developers are outsourcing functional components of their applications to LLMs, a trend I predicted back in 2023 in The Brave New World of Degraded Performance. In the previous post, I covered the impacts of this trend, highlighting the degraded performance that results from swapping known, reliable methods for unknown, non-deterministic methods. This paradigm leads to the enshittification of applications and platforms.

In a simplified context, instead of developers writing out a complete function using code, they’d bundle up variables and ask an LLM to do it. For simplicity’s sake, imagine a function that determines whether a student passes or fails based on a few values.
```
def pass_fail(grade, project, class_time):
    if grade >= 70 and project == "completed" and class_time >= 50:
        return "Pass"
    else:
        return "Fail"
```
If a developer decided to outsource this functionality to an LLM inside their application, it may look something like this.
```
prompt_pass = """You are standing in for a teacher, determining whether a student passes or fails a class.
You will use several values to determine whether the student passes or fails:

The grade the student received: {grade}
Whether they completed the class project: {project}
The amount of class time the student attended (in minutes): {class_time}

The logic should follow these rules:
1. If the grade is above 70
2. If the project is completed
3. If the time in class is above 50

If these 3 conditions are met, the student passes. Otherwise, the student fails. 

Based on this criterion, return a single word: "Pass" or "Fail". It's important to only return a single
word. 
"""

prompt = prompt_pass.format(grade=grade, project=project, class_time=class_time)
response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
print(response.text)
```
As you can see, one of these examples contains the logic for the function inside the application, and the other has the logic existing outside the application. The prompt is indeed visible inside the application, but the actual logic exists somewhere in the black box of LLM land.

The example using code has greater visibility, and it’s far more auditable since the logic can be examined, which makes it far easier to debug when issues arise, and of course, it’s explainable. The real problem lies in execution.

The written Python function approach gives you the same result based on the input data every single time, without fail. The natural language approach, not so much. In this non-deterministic approach, you are not guaranteed the same answer every time. Worse yet, when this approach is used for critical decisions and functionality, the application can take on squishy and malleable characteristics, meaning users can potentially manipulate them like Play-Doh.

At first glance, this example appears silly, as writing out the logic in natural language seems more burdensome than using the simple Python function. Not to mention, slower and more expensive. But looks can be deceiving. People are increasingly opting for the natural language approach, particularly those with only minimal Python knowledge. This natural language approach is also more familiar to people who are more accustomed to using interfaces like ChatGPT.

Execute and Pray

However, let’s take a look at another scenario. In this scenario, a developer wants to generate a scatter plot using the Plotly library. In this case, we have some data for the X and Y axes of a scatter plot and use Plotly Express, which is a high-level interface for Plotly (as a developer may when plotting something so simple).
```
import plotly.express as px

xdata = [1, 2, 3, 4, 5]
ydata = [1, 7, 9, 11, 13]

fig = px.scatter(x=xdata, y=ydata)
fig.show()
```
Here is the result in all its stunning glory.

This is a simplified example, but in this case, we can clearly see the code that generated the plot and be certain that this code will execute during the application’s runtime. There is control over the imports and other aspects of execution. It also makes it auditable and provable.

Now, what happens when a developer allows modification of their code at runtime? In the following example, instead of writing out the Plotly code to generate a scatter plot, the developer requests that code be generated from an LLM to create the graph, then executes the resulting code.
```
prompt_vis = """You are an amazing super awesome Python developer that excels at creating data visualizations using Plotly. Your task is to create a scatter plot using the following data:

Data for the x axis: {xdata}
Data for the y axis: {ydata}

Please write the Python code to generate this plot. Only return Python code and no explanations or 
comments.
"""

prompt = prompt_vis.format(xdata=xdata, ydata=ydata)
response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)

exec(clean_response(response.text))
```
As you can see from the Plotly code in this example… Of course, you can’t see it because the code doesn’t exist until the function is called at runtime. If you are curious, the first run of this generated the following code after cleaning the response and making it appropriate for execution.
```
import plotly.graph_objects as go

x_data = [1, 2, 3, 4, 5]
y_data = [1, 7, 9, 11, 13]

fig = go.Figure(data=[go.Scatter(x=x_data, y=y_data, mode='markers')])

fig.show()
```
The AI-generated code creates the same graph as the written-out code in the previous example, despite being different. You may be wondering what the big deal is since the result is the same. The concern stems from several reasons, but primarily, allowing an LLM to generate code at runtime is not robust and leads to unexpected outcomes. These outcomes may include the generation of non-functional code, incorrect code, and even vulnerable code, among others.

For a simple example, as the one shown in this post, the chances of getting the same or incredibly similar code returned from the LLM are high, but not guaranteed. For more complex examples, such as those developers may want to use this approach for, the odds increase that the generated code will change more frequently.

Additionally, I implemented a quick cleaning function called clean_response to remove non-Python elements, such as text and triple backticks, from the response. The LLM can introduce additional unexpected characters that end up breaking my cleaning function and making my application fail. The list goes on and on, but a larger danger lurks in the background.

Whose Code Is It Anyway?

If you are versed in security and familiar with Python, you may have noticed something in the LLM example: The use of the Python exec() function. The exec () and eval() functions in Python are fun because they directly execute their input. Fun as in, dangerous. For example, if an attacker can inject input into the application, they can affect what code gets executed, leading to a condition known as Remote Code Execution (RCE).

An RCE is a type of arbitrary code execution in which an attacker can execute their own commands remotely, completely compromising the system running the vulnerable application. They can use this access to steal secrets, spread malware, pivot to other systems, or potentially backdoor the system running the application. Keep in mind, this system may be a company’s server, cloud infrastructure, or it may be your own system.

Anyone following security issues in AI development is aware that RCEs are flying off the shelves at alarming rates. A condition that was previously considered a rarity is becoming common. We even commented during our Black Hat USA presentation that it was strange to see people praising CISA for promoting memory safe languages to avoid things like remote code execution, while at the same time praising organizations essentially building RCE-as-a-Service. Some of this is mind-boggling, since in many cases, outsourcing these functions isn’t a better approach. In the previous example, writing out the Plotly code instead of generating it at runtime is relatively easy, more efficient, and far more robust.

Up until AI came along, the use of Python exec() was considered poor coding practice and dangerous. Now, developers shrug, stating that’s how applications work. As a matter of fact, agent platforms like HuggingFace’s smolagents use code execution by default. This is a wakeup. So, we dynamically generate code, provide deep access, and the ability to call tools, all with a lack of visibility. What could possibly go wrong???

Not only have developers chosen paradigms to generate and execute code at runtime, but worse yet, they’ve begun to perform this execution in agents with user (aka attacker) input, executing this input blindly in the application. In our presentation titled Hack To The Future: Owning AI-Powered Tools With Old School Vulns at Black Hat USA this year, we refer to this trend as Blind Execution of Input, which is the purposeful execution of input without any protection against negative consequences. This condition certainly leads to RCE and other unintended consequences, providing attackers with a significantly larger attack surface to exploit.

An application that takes user input and combines it with LLM functionality is a recipe for a bad time from a security perspective. Another common theme in our presentation, as well as that of other presenters on stage at Black Hat, is that if an attacker can get their data into your generative AI-based system, you can’t trust the output.

Things Will Get Worse

Using the outsourced approach when a more predictable deterministic approach is a better fit will continue to degrade software from a reliability and security perspective and have an impact on the future of software development.

Vulnerabilities in AI software have made exploitation as easy as it was in the 1990s. This was the “old school” hint in the title of our talk. This isn’t a good thing, because the 90s were a sort of free-for-all. Not only that, but in the 90s, we often had to live with vulnerabilities in systems and applications. For example, in one of the first vulnerabilities I discovered against menuset on Windows 3.1, it was impossible to fix. There were no mitigations, and most people were unaware of its existence.

As the outsourcing of logic to LLMs accelerates, things will worsen not only due to incorrect output and hallucinations but also from a security perspective. Anyone paying attention to the constant parade of vulnerabilities in AI-powered software can see this trend with their own eyes. These vulnerabilities are often found in large, mature organizations with dedicated security processes and teams in place to support them. Now, consider startups and organizations that implement their own experiments using non-deterministic software, often with a lack of understanding of how these systems can be manipulated. It’s become a game of speed above everything else.

As I’ve said from the beginning of the generative AI craze, the only way to address these issues is architecturally. Most of AI security is just application and product security, and organizations without these programs in place are in trouble. If proper architecture, design, isolation, secrets management, security testing, threat modeling, and a host of other activities weren’t considered table stakes before, they certainly are now. And possibly not surprisingly enough, they still aren’t being done. Anyone working for a security organization sees this every day.

In essence, developers need to design their applications to be robust to failures and attacks. It helps to consider designing them as though an attacker can manipulate and compromise them, working outward from this premise. As the adage goes, an attacker only needs to be successful once; a defender needs to be successful every time. This makes something that sounds great in theory, like being 90% effective, sound less impressive in practice.

Keep in mind that performing a code review won’t provide the same visibility as it has traditionally. This should be obvious since the code that would be audited doesn’t exist until runtime. You’ll have to pay more attention to validation routines and processing of outputs, putting huge question marks over the black box in the middle. And, of course, ensuring the application is properly isolated.

Some may suggest instrumenting the applications with functionality to perform runtime analysis on the generated code. Sure, it’s possible, but the performance hit would be significant, and even this is, of course, far from a silver bullet. You might not even get the value you think you are getting from this instrumentation. Also, you’d have to know ahead of time the issues you are trying to prevent. That is, unless you plan to layer more LLMs on top of LLMs in a spray-and-pray configuration.

To keep this grounded, all AI risk is use case dependent. AI models don’t do anything until packaged into applications and used in use cases. There may be cases where reliability, performance, and even security are of lesser concern. Fair enough, but it’s a mistake to treat all applications as though they fall into this category, and it’s far too easy to overlook something important and view it as insignificant.

If you work at an organization that isn’t building these applications and think you’re safe, you might want to think again, because you are at the mercy of third-party applications and libraries. It would be best to start asking hard questions of your vendors about their security practices as they relate to applications you purchase. Especially applications that use generative AI to generate code and execute it at runtime.

Near the end of our presentation, we had some advice.

Whether outsourcing the logic of an application to LLMs or having the LLM dynamically generate code, assume these are squishy, manipulable systems that are going to do things you don’t want them to do. They are going to be talked into taking actions that you didn’t intend, and fail and hallucinate in ways you don’t expect. Starting from this premise gives a proper foundation for deploying controls to add some resilience to these systems. Of course, not taking these steps means your applications will contribute to the ongoing dumpster fire rodeo.
AI Grenades: When Errors And Hallucinations Become Facts

June 5th, 2025
We are continually inundated with examples of silly errors and hallucinations from generative AI. At this point, it’s no secret to anyone on the planet that these systems fail, sometimes at rather high rates. These systems also have a tendency to make stuff up, which isn’t a good look when that data is used for critical decisions. We’ve become numb to this new normal, creating a dangerous condition where we check out instead of recheck. But what happens when these errors and hallucinations become facts, facts that may be impossible to dispute or lurk in the background unseen and uncorrected?

Perspectives From Our Younger Selves

Imagine traveling back in time for a conversation with our younger selves about the current state of AI.
```
Younger: Wow, it must be great to live in a world without cancer or dementia.
Older: No, we haven’t cured cancer or dementia.
Younger: Well, at least people are super smart now.
Older: No, there are still many dumbasses.
Younger: At least you have systems that don’t make mistakes.
Older: No, they make mistakes all the time.
Younger: Then, what in the hell do you do with systems like this?
Older: Mostly memes and short videos of stupid shit. Oh, we even try to impress world leaders with what they’d look like as a baby with a mustache.
```
Although it may seem silly, this thought experiment is informative. It puts our current AI moment in perspective and should add some humility. These systems aren’t the magnificent, magical boxes capable of handling every task with equal proficiency in both work and life. They are tools that we can use for specific tasks, far from the perfected AI of science fiction, and this is where the issues creep in.

Icebergs, Grenades, and Damage

I’ve made the grenade analogy before relating to agents. It’s an apt analogy because it’s something that causes damage, but not immediately. It’s like the classic joke grenade, which is a prank you play on your friends with the expectation of future laughter. Only with AI, the result isn’t a barrel of laughs. It’s a barrel of something that stinks and should be spread over a field as fertilizer.

The mistake is that seeing so many instances of these issues gives us the false impression that these issues are being caught and possibly even corrected. Think of issues like hallucinations as an iceberg. There are far more instances beneath the surface that lie unseen, lying in wait to send our ship to the depths.

There’s also the problem that not all conditions of hallucinations are so easy to identify. The ones that seem to get identified are those that are blatantly obvious or require additional validation, such as checking the cases referenced in a legal document. This is why it seems that only lawyers and politicians are making fools of themselves with AI. The landscape is far broader than these two categories.

It’s also instructive to see how people respond when these issues are brought to light. In the recent MAHA report scandal, the White House spokesman referred to AI hallucinations as “formatting issues.” Yeah, right. Imagine walking into your bank and finding out you have no money in your account. Frantic, you ask the teller what’s going on, and they tell you that you have no money because of a formatting issue. We can’t let people downplay these problems because they are common. It’s because they are common that we need to be more concerned.

We can’t let people downplay these problems because they are common. It’s because they are common that we need to be more concerned.

Although some instances may seem silly, there are no doubt real consequences. Such as AI hallucinating into people’s medical records, because we all know that can’t end badly. Hypothetically, let’s imagine that the generative AI system utilized is 99% accurate, which is enormously far from reality. Performing 10,000 transactions/results/outputs a day could potentially yield 100 issues. Crank that up to 1,000,000 a day, and that’s 10,000. This is terrifying when considering the realistically high error rates that these systems actually exhibit. There’s no doubt a river of manure flowing into data stores. The pin has been pulled.

The nature and pattern of errors differ significantly between AI and humans.

I can already feel the AI crowd’s eyes rolling, opening their mouths to issue the overused retort, “But humans make mistakes too.” Yes, they do, but human mistakes and AI mistakes aren’t the same. The nature and pattern of errors differ significantly between AI and humans. Human error tends to be more predictable, with errors and mistakes clustering around areas such as low expertise, fatigue, high stress, distraction, and task complexity. In contrast, AI errors can occur randomly across all problem spaces regardless of complexity. This is why AI systems continue to make boneheaded errors on seemingly simple problems.

A nurse may indeed make a mistake in an annotation in a patient’s medical record, such as a misspelling, incorrect date, or time. More severe incidents, such as mixing up patients or medications, can also occur, but are much rarer. Nurses aren’t going to fabricate a whole event that didn’t happen as a mistake.

With the widespread use of AI, there are bound to be significant impacts. They won’t all cause major harm, but they will all tell an inaccurate story. Severity will depend on the system consuming this data and its intended use. Some will be purely annoying, but others will have serious consequences. A person with hallucinated data in their medical record may be prescribed the wrong medication or a medication to which they are allergic. I’m speaking in vagaries here because the extent of the problem isn’t fully understood, but one thing is certain: it’s getting worse as the usage of generative AI expands.

Another problem will be tracing these issues back to their source. It won’t always be obvious when a mistake originates from an AI system or a human. After all, these systems are meant to augment human processes. When it comes to blame, humans will always blame AI, while system owners will always blame the humans. It’s a mess.

The New Truth

Ultimately, we’ll uncover a disturbing reality. In many cases, hallucinated data becomes the truth. After all, it’s the “fact” that’s in the data store. Imagine trying to dispute this with someone at the DMV, customer service, our bank, and the list goes on and on. We become yet another in the long line of those contesting the “facts” on hand, directed to a Kafkaesque nightmare as we have to navigate some bureaucratic maze attempting to get a resolution.

A more cementing factor would be if the data is incorrect and there is no human to consult, only an AI making decisions based on the data it has. It offers apologies, not resolutions. And these are only instances that we become aware of.

Many stealthy decisions occur in the background, made by invisible systems that utilize these new “facts” to make determinations that impact our lives, our families, and our health. We may never fully understand the impact this new truth has on us, our families, or our future.

All of this damage stems from the systems we are using right now, today. Even if better, more accurate systems emerge, the damage being done today still stands. These new, more advanced AI systems may be trained or fine-tuned on hallucinated data generated by current AI systems. So, we’ve got that to look forward to.

These new, more advanced AI systems may be trained or fine-tuned on hallucinated data generated by current AI systems.

The Cause

Some of these issues can be attributed to automation bias, but it’s far from the whole explanation. There is a push from the top to utilize AI everywhere possible. Many companies are asking employees to do more with less. Well, when you have less time, one of the things you spend less time doing is worrying about quality or accuracy.

We’ve also been inundated with CEOs and other business leaders proclaiming their intent to replace everyone with AI. There isn’t much motivation to do a good job in environments like this. We’ve seen this happen in the past with jobs getting outsourced.

The reality is that these are self-inflicted wounds caused by the rapid adoption of error-prone technologies being thrown into use cases where the negative impacts aren’t considered.

What We Can Do

If companies and individuals intend to augment their activities to optimize and increase efficiency, they need to ensure that this optimization doesn’t cause harm. There needs to be processes in place to identify and address these issues before they cause a problem. This isn’t happening today.

Unfortunately, there isn’t much we, as future victims, can do, especially since we don’t know the extent of the problem. It’s impossible to be aware of all the people using these systems today and how they may affect us in the future. From government to private business, these tools are utilized for a wide range of tasks, both mundane and critical.

I’m not a fan of big government or excessive regulation, but it’s hard to see how these issues can be solved any other way, since we only become aware of the harm after it has happened. Consumer protection is something a government is far better equipped to handle than a handful of consumers. The tech crowd’s claims that burdensome regulations inhibit innovation are absolutely true, and this shouldn’t be the goal. However, the absence of existing regulations harms people, as consumers are powerless to take any action in their defense. Unfortunately, reasonable, level-headed regulations are not in our future.

At the very least, we should avoid AI in high-risk or safety-critical use cases. The thought of ChatGPT running something like air traffic control is terrifying. However, handing out this advice at this point seems like trying to reason with a hurricane. Admittedly, for users, it may not be immediately apparent that the tasks they are performing or the data they are collecting can ultimately lead to one of these scenarios.

The Problem At Our Feet

AI hallucinations and other inaccuracies are like grenades with the pin pulled, only instead of chucking them far away from ourselves, we’ve dropped them at our feet, staring at them, wondering what happens next. The only question is, how long will it take for us to find out?
Our Next Disaster: Negative Impacts Of Mandating AI Education in K-12

May 16th, 2025
What’s the effect of exposing children to AI at a very young age? Well, we are about to find out. President Trump signed an executive order called Advancing Artificial Intelligence Education For American Youth, and, in the face of the other executive orders pushed by the administration, it may be tempting to consider this order relatively benign. I urge people to reconsider, because this order could result in catastrophic and irreparable damage to future generations of children. Move fast and break things is all well and good until the thing being broken is your child.

This move represents many of my fears coming to fruition, with all of the negative aspects I’ve been warning about becoming cemented into the foundation of future generations. You may have heard me talk about conditions such as cognitive atrophy, but early exposure to AI in education can lead to something far worse: cognitive non-development.

There are also technical concerns, including issues with security, privacy, alignment, and reliability. Children are rich sources of data wrapped up in easily manipulable packages, so it’s no surprise that tech companies are opening their AI tools to them. However, I feel these concerns are more evident to most people than the negative cognitive impacts that the introduction of AI to young children creates, especially while their brains are still developing and maturing. These are the issues I highlight here.

Key Points

Since this is a long article, I’ll call out a couple of key points:
- Cognitive offloading by children and adolescents to AI short-circuits cognitive development impacting executive functions, logical thinking, and symbolic thought
- We convert social to anti-social activities
- The very skills kids need to use AI effectively never develop due to the overuse of AI
- Core foundations of critical thinking, data literacy, and probability and statistics need to be introduced before any AI curriculum
- Worldviews will be shaped by interactions with AI systems instead of knowledge, experience, and exploration
- Kids need time to explore the generative intelligence inside their skulls
What Are The Hopes?

Before we begin, it’s helpful to take a step back and consider what the product of this education is supposed to look like. We envision emotionally balanced young adults exercising hardened critical thinking skills and ingenuity to create the next wave of high-tech gadgets. This is the stereotypical AI bro vision of an AI tide lifting all boats, but the reality strays far from the vibes.

There’s nothing fundamentally wrong with this perspective except that exposing children to AI tools beginning in kindergarten almost guarantees the opposite. This is for two primary reasons: the negative cognitive impacts on early childhood and adolescent development, and poor curriculum implementation.

Now, can this program succeed in a way that benefits children and empowers them for the future? Absolutely, but it would be nothing more than success by miracle. A program like this needs to be well thought out and studied, with a gradual implementation that also considers potential tradeoffs and implements mitigations for these negative effects. This is NOT what we are getting here. This fails 999 times out of 1000, possibly more. Just read the wording of the executive order and imagine people rushing to implement it, along with the bros swarming like flies around a manure pile, anxious to pitch their half-baked products.

The introduction of AI and AI tools so early in childhood education will be yet another big mistake that everyone realizes in hindsight. To set the stage, many fail to realize just how much EdTech has been a failure, and now, without addressing any of the issues, we want to add even more screens in the classroom.

I don’t think everyone involved is a bad actor with perverse incentives. I think most people genuinely want to see children succeed and flourish. However, there is no consideration here for the long-term cognitive impacts on children.

AI In Education

While I was writing this article about AI in K-12, two other articles were released about AI in higher education. The article from New York Magazine about students using ChatGPT to cheat, and the story in Time of a teacher who quit teaching after nearly 20 years because of ChatGPT. The cheating article is creating a flurry of hot takes on social media. We’ve reached a technological tipping point where students don’t see the value in education. They want accomplishment and bragging rights (degrees) without effort. Apparently, attending an Ivy League school is no longer about the education you receive but the vibes you create and consume.

And of course, queue the defensive hot takes.

This is a common retort. The mistake of assuming low-quality Q&A for actual curiosity and insight. This information was available to us all along. It just required more friction to get. So, if this is the case, then the answers we wanted weren’t worth the effort. This is hardly an earth-shattering insight, yet we’re being pitched as though it is. Keep in mind, just because these people aren’t selling a product doesn’t mean they aren’t selling something.

As usual, Colin Fraser is on point.

A problem we’ve always faced is that we never know when we are learning something in the moment that will be valuable later. We exercise a stunning lack of current awareness for future value. This happens in all manner of experiences, but especially in education. Adults lack this awareness, and it’s completely delusional to expect that K-12 students will magically sprout this awareness.

We exercise a stunning lack of current awareness for future value.

There is value in learning things, even things you don’t use for your job. We seem to think learning is contained in individualized components that fit neatly into buckets, but there are no firewalls around these activities. Learning things in one subject is rewarding and beneficial, even to other subjects. Colin is also right about driving the cost of cheating to zero, a major point everyone seems to gloss over.

In his book, Seeing What Others Don’t, Gary Klein tells the story of Martin Chalfie walking into a casual lunchtime seminar at Columbia to hear a lecture outside his field of research. An hour later, he walked out with what turned out to be a million-dollar idea for a natural flashlight that would let him peer inside living organisms to watch their biological processes in action. In 2008, he received a Nobel Prize in Chemistry for his work. This insight doesn’t come from staying in your lane, being single-minded, or asking the right questions to an LLM. Yet, this is exactly the message thrust upon us. AI doesn’t provide the happy accidents that result from exploration and the randomness of life.

Using AI instead of our brains gives us the illusion of being more knowledgeable without actually being more knowledgeable. We shouldn’t underestimate the power of this illusion because it blinds us to certain realities. AI offers an illusion that completing tasks and knowledge acquisition are the same thing, but knowledgeable and productive are completely different attributes. This positive feeling of being more productive masks that we aren’t acquiring knowledge. Numbers end up overshadowing quality, and productivity vibes end up trumping learning.

Some may argue that productive is preferable to knowledgeable in a business context, but that hardly applies in education. The ultimate goal in formal education is to learn, not produce, with the PhD being the exception. Education shouldn’t be about creating useful automatons, despite how many business leaders may want them.

AI In K-12

Introduction in K-12 means that these tools are introduced during critical brain development and could short-circuit the development and maturation of things such as executive functions, logical thinking, and symbolic thought as students offload problems to AI systems. Instead of having skills atrophy through the overuse of AI, these skills never develop in the first place due to cognitive offloading to AI tools. No matter what the AI bro impulses, we should all agree that exposing kindergarteners to AI is an incredibly bad idea.

Instead of having skills atrophy through the overuse of AI, these skills never develop in the first place due to cognitive offloading to AI tools.

All of the issues and negative impacts I’ve been pointing out, such as the cognitive illusions created by the personas of personal AI, along with associated impacts such as dependence, dehumanization, devaluation, and disconnection, get far worse when exposed early in childhood and adolescent development because children never discover any other way. Blasting children with AI technology in their most formative years of brain development pretty much guarantees lifelong dependence on the technology. Something that elicits drooling at AI companies, but is hardly in the best interest of human users. What we consider overreliance today will be normal daily use for them. Worldviews will be shaped not by knowledge and experience, but by interactions with AI systems.

There’s something fairly dystopian about prioritizing AI literacy while actual literacy is on the decline , disarming future students from the very skills they’d need to keep AI in check. The impression seems to be that if you can teach kids AI, you can negate negative downturns in literacy. After all, why should something like reading comprehension matter if tools provide the comprehension for us through a mediation layer? Hell, why stop there? Why not apply AI to every task that could possibly be outsourced? We are close to creating a world where raw data and experiences never hit us.

The Future Isn’t Now

In their book AI 2041: Ten Visions for Our Future, Kai-Fu Lee and Chen Qiufan have a story about children who grow up and go through school with companion chatbots to assist them in life. These chatbots adapt to them and assist them in areas where they have challenges. AI systems are ever-present companions following them through school and in life. The story is meant to have the trappings of utopia, but ends up sounding like a dystopian hellscape. To make matters worse, their story considers a perfected AI system that doesn’t have all the issues and drawbacks of today’s AI systems.

We continue to make the mistake of treating the AI systems of today as though they are the AI systems of tomorrow. Encouraged into hyperstition and thought exercises of, “It doesn’t work, but just imagine if it did!” To say that AI will cure cancer and become the cure for all of humanity’s ails may likely turn out to be true, at some point. But these accomplishments have yet to come to fruition, and don’t appear on the horizon either. So, why are we treating these systems as if they’ve already accomplished goals they haven’t? The highly capable tutor/companions of Lee and Qiufan don’t exist, yet we want to apply this non-existent vision to K-12 education as though they do. Even if they did exist, where is all this highly personalized data about your child being stored, and what is being done with it?

Less Capable, More Dependent, and Less Stable

The crux of the issue is that this program will not set kids up for success in an AI world or otherwise. This early exposure will make them less capable, more dependent, and less stable. This curriculum could teach kids all the wrong things, such as that answers can be immediate and simple, and that working out a problem isn’t as important as asking the right questions. We also teach that learning is comfortable. We give the impression that knowing things is not as important as knowing where things are stored. This is all bullshit. Kids can’t summarize their way to knowledge. But, it gets worse.

Children exposed this early never learn how to do things for themselves. They end up outsourcing problems and decisions to AI. Instead of taking feedback on how to solve problems, challenging themselves to learn, they offload the problem to AI, making them incapable and lacking confidence in the absence of technology.

This technology dependence also creeps into their personal lives, meaning going about their typical day becomes unbearable without the ability to mediate through AI. It becomes a source of authority for them and a way to avoid difficult decisions that teach them lessons. It can be hard for us to imagine today the future paralysis created when the technology is absent, even for simple decisions like how to respond to a friend’s message or whether to go outside today.

Many adults may argue that this is a small price to pay for setting kids up for success in the future. There are two flaws here. First of all, this is a monumental price. Second, using technology more doesn’t automatically mean being better at using it. For AI use, the skills you learn outside of AI’s mediation are exactly the skills that make you better at using it.

We need to focus on teaching kids to use their brains, something I never thought I’d have to say when talking about… school.

This is typically when someone brings up the calculator, insinuating that nobody needs to learn math because it exists. Although I disagree, confusing a calculator with AI technology is a mental mistake. Calculators and AI are far from being similar technologies. A calculator isn’t a generalized technology that can be applied to many problem spaces. A calculator doesn’t provide recommendations, advice, or sycophantic outputs. It won’t tell you who to date or be friends with. Oh, and a calculator is always right, unlike AI.

The hypothetical response that gets pitched around is imagining if Einstein or Von Neumann had access to AI and all of the wonderful things that would have sprouted from their genius. Maybe, however, I pose a different experiment. Imagine if Einstein or Von Neumann were a product of AI education from a very early age, where even inane curiosities were immediately satiated by an oracle. The likely output is that nobody would know their names today. We are products of our environments. Remember, there are no happy accidents with AI, only dense data distributions in which everything is shoved. In the K-12 AI education era, Einstein never stares back at the clock tower on the train, because he’s looking down at his phone.

In the K-12 AI education era, Einstein never stares back at the clock tower on the train, because he’s looking down at his phone.

Avoiding Discomfort

Sam Williams from the University of Iowa said, “Now, whenever they encounter a little bit of difficulty, instead of fighting their way through that and growing from it, they retreat to something that makes it a lot easier for them.” We are looking to apply this in K-12, specifically when we want students to grow.

The truth is, knowledge acquisition isn’t comfortable, and students avoid discomfort like the plague. When we use AI to complete assignments, we aren’t challenging ourselves. We aren’t developing our own perspective and forming new connections between concepts. Students find writing uncomfortable and are quick to outsource to AI, but writing truly is thinking. When we write, we are confronted with our thoughts and perspectives, challenging ourselves and forming new insights. One realization with writing is that the more you do it, the better you get. This realization never comes when it’s constantly outsourced to technology.

Using AI for work-related tasks may be helpful, but using AI for education or even life is idiotic. Yet, we continue to make these foundational mental mistakes. This would be like saying that since Taylorism worked for business, why not apply it to daily life? We all know where that leads.

But we also end up robbing students of a sense of accomplishment and fulfillment, of a long-lasting sense of satisfaction, not to mention the ability to focus. And for what? Because we believe that children will need to be non-thinking automatons to have a chance in the future? This theft will have a lasting impact on the mental health of future generations.

We may experience the extinction of the flow state by never allowing people to enter it in the first place. I’ve heard people argue that they’ve entered a flow state using AI, maybe, but likely the very nature of using AI to complete tasks guarantees that you never enter a flow state. Either people are confused about what a flow state is, or they mistake the illusion of productivity for creativity and flow.

As Ted Chiang mentioned in an article I’ve referenced before, ⁠”Using ChatGPT to complete assignments is like bringing a forklift into the weight room; you will never improve your cognitive fitness that way.”

Going to the gym isn’t comfortable, but the results are physically and mentally rewarding. The mental health benefits of going to the gym aren’t intuitive. After all, how can running on a treadmill or lifting weights, activities that work out your muscles, benefit your mental state? Yet, it does. There are no firewalls around exercise either. Knowing this doesn’t stop us from making the same mistakes in cognitive areas.

When Playing It Safe Becomes The Norm

Using AI to do things is perceived as safe because if the output is wrong, we can blame the AI, versus having to work out a problem ourselves and potentially being wrong. There’s a blame layer between us and the problem.

Let’s take art, for instance. AI art is safe, unchallenging, and unfulfilling, providing no opportunity to learn about ourselves, others, or the world. And yet, the very fact that it’s safe and easy is what makes it so attractive. Failure can result from the paintbrush, but never the prompt.

Failure can result from the paintbrush, but never the prompt.

The best things in life come from not playing it safe. Taking a chance on a job, moving to a new location, or asking a person out on a date are all activities that aren’t safe, but they can end up being the best decisions we’ve ever made. We need to keep this instinct alive in children.

Lack of Resiliency

The more we rely on AI, the less we question its outputs. The more we use AI and our capabilities atrophy, the less capable we become of questioning the outputs and, hence, the more dependent we become. We end up losing a critical capability when we need it the most, or in the case of early childhood exposure, never develop it in the first place.

Modern generative AI is far from error-free. It makes frequent mistakes and hallucinates. Students must construct the cognitive fitness necessary to operate robustly using a technology that makes these frequent mistakes. This fitness isn’t built on a foundation of the same AI that has these issues.

Students also need a foundation and the ability to explore outside AI mediation. This requires both time and foundational courses and concepts. For example, this foundation should include critical thinking, data literacy, and probability and statistics. Early exposure to these concepts with late exposure to AI offers the best chances for students to build this robustness.

From Social to Anti-Social

AI is a fundamentally anti-social technology. From the ground up, we are removing the human and converting it to the non-human. Even social networks are transforming into anti-social networks. With AI’s overuse in children we teach kids that humans are second-class citizens to AI. After all, the sales pitch is that AIs are better at everything, so why should children believe otherwise?

Handing kids an oracle to ask questions not only converts a social activity into an anti-social activity but also shifts authority away from humans and onto technology. This shift would still be bad even if the technology were perfected, but it is far worse given the error-prone technology of today.

Young children are quick to anthropomorphize and will form a bond with non-human companions. Although the video of the little girl not wanting to play with the shitty AI gadget is funny, it won’t last when children are surrounded by AI. Kids will switch from actively using their imagination to becoming passive consumers of AI output.

The human retreat has already begun, as kids prefer interactions with friends mediated by a device. But now tech companies want to take this further. This is all happening outside of education, but kids can’t avoid forced interactions with their companion/tutor/friend/bot in the classroom, reinforcing this retreat.

Much of this slide comes from our tendency to oversimplify, not accounting for the bigger picture and the complexities involved. Take, for instance, a common claim that kids ask many questions, and since AIs never tire of answering them, pairing kids with AI is a natural fit. This seems like an almost throwaway point, a gotcha to any potential critic, but people making this point haven’t thought it through.

First of all, asking questions is a social activity. We interact with other humans in different environments, learning far more than the simple answer to our questions. This activity teaches us essential skills, including ones related to non-verbal communication. Humans also don’t answer questions the same way AIs do, often providing additional context and anecdotes that may further aid us in knowledge acquisition and retention.

This act connects us to other people and the world, making us active participants in something bigger rather than passively consuming an answer. I still remember anecdotes shared from my high school chemistry teacher that stick with me today. We don’t just lose context and perspective from an AI oracle, we lose something human.

When it comes to context, any expert who has asked AI questions about their topic area has been confronted with incorrect information, including something like, “I guess that’s technically true, but it’s hardly the whole story.” And this is what we want to make the norm.

Closing The Curiosity Gap

We are told that asking an AI questions makes people more curious, but AI closes the curiosity gap. By getting an instant answer, we satiate our curiosity and move on to the next thing, only digging deeper or exploring further in cases of pure necessity. This act reinforces low attention spans, further reducing the ability to focus. At some point, System 2 may become extinct. What kind of world will that create, where the world is nothing but hot takes and vibes?

AI satisfies a need for quick answers. However, searching for answers in a more traditional way means other pieces of valuable context surround you. Other rich pieces of information that lead to new ideas and new understanding. Humans have an evolutionary need for exploration.

When using AI for exploration, you are never exposed to ideas and concepts you don’t want to be exposed to. I don’t think we fully grasp just how much of an impact this selection bias will have on the future.

Sure, there are situations where a quick answer is perfectly fine, mundane things like what time a movie starts or what temperature to set your oven to cook a pie. The mistake here is assuming these situations apply evenly to all problem spaces, especially knowledge creation.

My Recommendations

Despite the many unknowns, we shouldn’t shut the door to new innovations because we could slam the door to new solutions. Although it doesn’t exist today, a robust tutoring bot focused on a single purpose and specific subjects could benefit students. The message here isn’t to discard everything but to be cautious, knowing there are tradeoffs and downsides, and incorporate mitigations.

For a program such as this to be successful, it needs to be well thought out and studied, with a gradual implementation that also considers potential tradeoffs. Without this, you have no way of telling whether you are helping or harming until it’s too late. There is no way to succeed without this step. Beyond this up-front work, I’ll make four other suggestions.

Avoid Early Exposure

Students need plenty of time to develop their brains, not technology. Early exposure should be avoided at all costs. Exposure to this curriculum should happen in high school, preferably in the last two years, not earlier. This is typically when vocational education programs were introduced in schools as well. This gap gives students time to develop skills and experiences outside AI influence and mediation. Kids adapt to technology quickly, so this later exposure will not stunt their capabilities when tools are introduced.

Create A Prior Solid Foundation

Before introducing the AI curriculum, a solid foundation in various topics should be established. This foundation should include courses in critical thinking, data literacy, and probability and statistics. These courses and concepts have been sorely lacking in K-12 education today, and their introduction is long overdue. Arming students with this foundational knowledge will allow them to question the outputs of these systems and create defenses for cognitive creep.

Smart Implementation

The implementation of the courses should be isolated and away from other topics. AI shouldn’t be woven into every topic with a tie-in. Although some would argue that an effective AI tutor could help students struggling with certain subjects, these systems have yet to be developed, much less proven effective. In almost all cases, the AI would be used as an oracle, providing answers directly instead of the necessary understanding and even discomfort that helps students grow.

Solid Curriculum

The curriculum should focus on challenging students, not giving answers. Kids often don’t realize when challenges are beneficial to them. AI tools should continue to be viewed purely as tools, not oracles or companions. The curriculum should focus on avoiding usage as personas and teaching kids how to think in terms of solutions. Appropriate labs should be constructed that give students the ability to explore concepts and define solutions, pulling AI tools in secondarily to complete the tasks and realize a student’s vision. This way, there is a separation between the mental approach and the AI components.

Final Thought

Ultimately, we may end up with anti-social, dependent, and unstable young adults. We take so many skills for granted, skills we don’t realize we developed and honed in school, and now we want to apply technology to optimize these attributes away. We need to give future generations a chance to allow their brains to develop outside of AI mediation. Here’s something to consider.

Imagine an art teacher standing in front of a class. The students aren’t in front of an easel or grasping a pencil, but sitting in front of computers. They aren’t using their hands and tools to create a vision that originates from their minds. Instead, their fingers clack on the keyboard and echo through the class as the teacher instructs them to be more descriptive and provide pleasantries to the machines. Is this really the world we want to immerse children in?

We are moving toward an existence where raw data and experience never hit us as everything becomes mediated. We prefer optimization over expertise. I’m sure the illiterate masses of the Middle Ages felt powerful after leaving a sermon by the literate priest mediating the message of the written word, but that was hardly the best state for individuals. Now we are applying this logic to AI with far-reaching consequences for the everyday life of an entire generation.

In the words of Aldous Huxley, many may mature to “love their servitude,” preferring optimization and rigid structures that take decisions off the table, making things easy, not requiring thought. In Zamyatin’s We, most inhabitants enjoyed living in One State with its rules, schedule, and transparent housing. They were happy to trade free thought and experiences for optimization, comfort, and structure. It needs to be said, over and over again: These are dystopias, not roadmaps.
Some Sanity on The Pace of AI Advancement

April 25th, 2025

Following the pace of AI advancement can make you feel like the Blown Away Guy from the old Maxell commercials. Tech leaders and influencers tell us to expect artificial superintelligence in the next year or so, even doubling down on inevitability by moving up their timelines. The world is cooked. This perceived inevitability, combined with uncertainty, is leaving many people on edge, and rightly so. If all the things the tech bros hope for come true, humanity is in a terrible spot.

All is not lost. The over-the-top predictions we are bombarded with daily often equate to nothing more than performance art. The tech media frequently parrots perspectives from people who have a vested interest in selling us stuff. I mean, after all, why would they embellish or lie!???

In early April, the AI 2027 thingy was making the rounds. For those unfamiliar, you are in for a treat. The result answers what would happen if you locked a few tech bros in a conference room for a day, depriving them of any reality and oxygen.

Are the scenarios outlined in AI 2027 impossible? Certainly not. This sort of fast takeoff scenario is possible, but it’s highly unlikely. I predict the whole AI 2027 thing will start looking pretty silly in late 2025 or early 2026.

With all this endless AI advancement hype, I was happy to see a new article by Arvind Narayanan & Sayash Kapoor titled AI as Normal Technology. This article doesn’t talk about how AI displaces the human workforce or about a super-intelligent AI taking over the world, but rather about how AI becomes a normal technology that blends into the background of our daily lives.

They also touch on a few other topics, such as overregulation. I also believe that any regulation should be specific and targeted at use cases, not painting with broad strokes. This specificity wouldn’t allow for regulatory capture or weaponization of the regulations. The tech leaders are right that regulation can stifle innovation. By targeting regulations in this way, we can protect people without stifling innovation.

It’s a good read that’s well thought out and researched. For anyone mainlining AI hype, this is an essential read. The scenarios in the AI as Normal Technology article are far more likely than the AI 2027 one, by far.

Questioning The One True Faith

Starting in early 2023, I added a slide with the following image to my presentations. This is because any criticism of advancement was seen as an affront to a spiritual belief, and since I didn’t believe that LLMs would lead to AGI or ASI, I must hate the technology outright. This couldn’t be further from the truth.

Saying that LLMs won’t become ASI isn’t a blasphemy that requires self-flagellation afterward. We don’t need AGI or ASI for these tools to be effective. We can and are using them to solve problems today. People are using them to augment their jobs today. So, why turn AI beliefs into a religion? People are acting like questioning any part of the narrative makes someone a non-believer or some disconnected fool. The reality is that not questioning the narrative or exercising any skepticism is what makes someone a fool. A gullible fool at that.

The reality is that not questioning the narrative or exercising any skepticism is what makes someone a fool.

There’s a strange group that thinks belief is required for AI to create a utopia, but the reality is that facts don’t require belief. It’s ancient wisdom from five minutes ago that we’ve seemed to have forgotten in the vibes era.

I believe what we encounter here is a problem in perception caused by both our environment and us.

Environment

In the book Nexus by Yuval Noah Harari he describes the witch hunts as a prime example of a problem that was created by information, and was made worse by more information. For example, people may have doubted the existence of witches, having not seen any evidence of witchcraft, but the sheer amount of information circulating about witches made their existence hard to doubt. We are in a similar situation today with beliefs in AI advancement. This is made worse because the systems we use today reduce the friction in information sharing, making it much easier to get flooded with all sorts of information, especially digital witches.

We humans also gravitate toward information that is more novel and exciting. It’s the reason why clickbait works. However, novel and exciting information often doesn’t correlate with the truth or reality. As Aldous Huxley pointed out in Brave New World Revisited, “An unexciting truth may be eclipsed by a thrilling falsehood.” We are in this situation again. The vision of near-term artificial superintelligence is exciting and novel, even when people talk about it destroying humanity. AI, thought of as normal technology, as Narayanan and Kapoor put it, is boring by contrast, despite being more realistic.

This condition was the same back in the times of the witch hunts as well. The belief that witches were roaming the countryside looking to corrupt everyone, meaning you had to use your wits and your faith to defend yourself is a lot more novel and exciting than acknowledging that life really sucks because of the lack of food and indoor plumbing.

But then, there’s another strange type of information we gravitate towards: people telling us what we want to hear.

Ah, yes. Evals as taste. Vibes above all. Skills inessential.

We have allowed the people selling us stuff to set the tone for the conversation on the future. These people have a vested interest in selling us on a certain perspective. It’s like taking advice on a car’s performance and long-term viability directly from the mouth of the car salesman instead of objective reality. I wrote about this last year, saying that many absurd predictions were nothing more than performance art for investors. The tech media needs to step up and start asking some real questions.

Many of the influencers and people on social media are parroting the same perspective as the people selling us stuff because of audience capture. Audience capture, for those unfamiliar, is the phenomenon where an influencer is affected by their audience, catering to it with what they believe it wants to hear. This creates a positive feedback loop, leading the influencer to express more extreme views and behaviors. People get more likes and clicks by telling people more exciting things, as Huxley mentioned. So, there’s a perverse incentive for doing so.

Lack of Reflection

One of my biggest concerns is that we’ve lost our ability to reflect. Many things we believe are silly upon reflection. Unfortunately, our current information environment conditions us to reward reaction over reflection. Until we address this lack of reflection, we’ll continue to be fooled in many contexts, not least of which is the pace of AI advancement.

Benchmarks

Many of the benchmarks that people use for AI are not useful in real-world scenarios. This is because the world is a complicated place. Benchmarks are often not very useful in real-world scenarios due to additional complexities and edge and corner cases that arise in real-world use. Even small error rates can have significant consequences. But don’t take my word for it, take it from Demis Hassabis. “If your AI model has a 1% error rate and you plan over 5,000 steps, that 1% compounds like compound interest.” All of this adds up to much more work, not superintelligence next year.

Us

Fooling Ourselves

We have a tendency to fool ourselves easily. As I’ve said many times, we are very bad at constructing tests and very good at filling in the blanks. The tests we create for these systems end up being overly simplistic. Early on, people tested model capabilities by asking for recipes in the style of Shakespeare. Hardly a difficult test, and easily impressive.

This condition is also why every time a new model is released, it appears immediately impressive, followed by a drop-off when reality hits. Sometimes, this has increased problems, such as OpenAI’s o3 and o4-mini models hallucinating at a higher rate than previous models.

We are also easily fooled by demos. Not realizing that these things can be staged or, at the very least, conducted under highly controlled conditions. In these cases, variables can be easily controlled, unlike deployment in the real world.

Oversimplification

We humans tend to oversimplify everything. After all, almost half of the men surveyed thought they could land a passenger plane in an emergency. This oversimplification leads us to underestimate the jobs that others do, possibly seeing them as a task or two. So, when ChatGPT passes the bar exam, we assume that lawyers’ days are numbered.

This oversimplification is also exploited by companies trying to push their wares. This claim is more absurd performance art. No, there will not be any meaningful replacement of employees next year due to AI. The reality is that most jobs aren’t a task or two but collections of tasks. Most single-task jobs have already been automated. It’s why we don’t see elevator operator as a current career choice.

Being Seen as Experts

Many people seek content to share to be seen as experts. If you don’t believe me, have you logged in to LinkedIn lately? This adds to the massive amounts of noise on social media platforms. However, it’s often just parroting others.

This also extends to the tech media. I wish these people would start adding a modicum of skepticism and asking these people hard questions instead of writing articles about model welfare and how we should treat AI models. But once again, novelty over reality.

Conclusion

We are witnessing people attempting to shape a future with vibes and hype. This is the opposite of evidence. It certainly doesn’t mean their future vision is wrong, but it sure as hell means it’s a lot less likely to happen. Reality is a lot more boring than dystopian sci-fi.

I do believe that these tools can be disruptive in certain situations. If we are being honest, I feel much of the disruption is happening in all the wrong areas: creative arts, entertainment, music, etc. We’ve already seen these tools disrupt freelance marketing and copywriting jobs. These areas are disrupted because the cost of failure is low. There will even be niches carved out in more traditional work, too. So, even without AGI and ASI, disruption can still happen.

However, the predictions made over the past few years have been silly and absurd. If you believed many of the people peddling these views, we should be exploring universal basic income right now due to all of the job displacement by AI. But that’s certainly not the case. Many of these same people resemble doomsday cult leaders preaching the end of the world on a specific date, only to move the date into the future because of a digital divine intervention. The reality is, this is vibe misalignment. This is not only going to continue, but increase before it levels out, because investors don’t invest in normal or boring.

Let’s all take a breath, reflect, and maintain our sanity.
Russian Roulette In Reverse: Vibe Coding and YOLO Mode

March 28th, 2025
By now, you’ve no doubt heard of the term vibe coding. It’s become the favorite talking point from influencers and the tech media, which, even in 2025, can’t seem to muster a modicum of skepticism. But, if you’ve ever wondered what it was like to play Russian Roulette in reverse, loading all the chambers but one, spinning the cylinder, and having a go, you’re in for a treat. Welcome to the world of vibe coding and YOLO mode, two things that go together like nitroglycerin and roller coasters. So, of course, it’s become one of the hottest topics right now, and it has all of the bros super psyched.

For those out of the loop, vibe coding is “Where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.” You get to this state by talking with a computer and letting it generate all the code while you kick back and enjoy a cold one. YOLO mode is the spirit animal of vibe coding. It’s where you blindly accept all the code suggestions generated by the AI tool and push the code to see what happens. Neat. It’s interesting to note that YOLO mode in video games means if you die once, you are dead. No respawning.

Vibe Coding = Code Slop

Before we probe the issues with vibe coding, let’s take a step back and look at what vibe coding is. This practice shouldn’t be confused with a developer using an AI coding tool to assist with tasks or gain productivity. You know, the intended use of many of these tools.

Vibe coding is a delusional dream state in which people use these tools as if they are in the future instead of the present. The fact is, these tools aren’t reliable enough or mature enough to be used this way. It’s a lot like having an illness and getting a bottle of pills from the pharmacy marked with clear directions and immediately slamming the entire bottle because the instructions are for losers who don’t know how to hustle.

Vibe coding is a delusional dream state in which people use these tools as if they are in the future instead of the present.

The reality is that vibe coding generates slop that happens to be code instead of media categories, meaning that the negative consequences extend far beyond other categories of slop. This distinction is essential when evaluating the potential value of vibe coding.

Vibe Coding Pitch

The pitch of vibe coding is that literally, anyone can become an instant millionaire developing apps. You’ll be too busy making money to worry about how your code works or its security. There are no boundaries or barriers. All you need is an idea (more on this in a future post.) Like so many things, it’s technically true but practically false. And also, you aren’t wrong if you are beginning to get crypto bro vibes.

Much of the same logic employed by conspiracy theorists is at work here. If one person gains some success vibe coding, then it must be possible for anyone to do that. Technically true, but practically false. This is like thinking millions of new rock stars will be minted because people can publish their songs on Apple Music.

It’s about tradeoffs. We don’t say that being a conspiracy theorist is a good thing since some conspiracy theories turn out to be true. That’s because the negative impacts of conspiracy thinking outweigh the potential benefits. The same applies here.

There are other flaws with their logic. For example, people don’t consume apps the same way they do media like music, video, or photos. People can listen to hundreds of songs every day and not repeat a single one, but this consumption strategy doesn’t apply to applications. People these days often consume media passively. For example, people often don’t listen to music when they are listening to music; it’s purely background noise for other tasks. Applications usually can’t be consumed passively and require active interaction. This would make consuming many different apps irritating to users.

There are already 1.9 million apps on the App Store alone. Are we really hurting for apps? Do we need 100 million apps to compete with the number of songs on Apple Music? Of course not, but that doesn’t mean we won’t get them anyway. If you look at the outputs of vibe coding, it’s often uninteresting, overly simple, derivative, or just plain unwanted. Buckle up.

There will undoubtedly be exceptions, just like the person who was a bartender six months previously who started a crypto project and manages 100 million in assets. These are exceptions and not the rule, but these exceptions serve as accelerants to fuel the hype flames.

People are trying to sell us on the fact that vibe coding has no downsides. This is delusional. Take a step back and think of the answer to a single question. What do you get when everything takes a backseat to speed? It’s like a car that does 200 mph with no seatbelts and no airbags constructed from paper mache.

Before we move on, I’d like to acknowledge something. It’s a good thing that AI coding assistants are making coding more accessible. However, using these tools as a drop-in replacement for common knowledge and domain expertise isn’t a recipe for success. Imagine something like vibe surgery? Yeah, bro, let me get in on that appendix! Nah, I didn’t go to medical school or know much about anatomy, but I got an AI tool, dexterity, and a good sense for vibes. So little of developing an application is about the code itself, but that gets lost in the vibes.

Changing Behavior and Attitudes

In the public sphere, the discussion of the merits and drawbacks of vibe coding and YOLO mode are entirely contained within the technical aspects of the approach. I’m also concerned about the technical components, but I’d like to bring attention to something nobody discusses.

As often happens when a new technology or approach arrives and removes friction, it changes people’s behavior and attitudes. In technology circles, friction is discussed as though it’s universally bad. It’s not. Sometimes, friction is a feature, not a bug. Nicolas Carr provides an excellent example in his book Superbloom, which discusses introducing the Retweet feature on Twitter.
```
The time and effort required to share a tweet manually, though seemingly so small as to be inconsequential, turned out to be vitally important. It slowed people down. It gave them a moment to reconsider the message they were about to repeat, now under their own name. Once that little bit of friction was removed, people acted and reacted more impulsively. Twitter became more frenzied, more partisan, and much nastier.
```
Things like vibe coding and YOLO mode will have similar behavioral effects if this technology trend takes off. People won’t put a lot of thought into the apps they create. Some may build apps purely because they can, not considering why an app for that particular purpose didn’t exist in the first place, assuming that it was purely because nobody had built it and not because of the potential for negative impacts or harm.

The removal of so much friction removes not only the appreciation for the problem but also opportunities to catch potential issues. These lines of generated code become grenades with various time delays chucked into production. This assumes that the developer had the skills to identify the issues in the first place.

These lines of generated code become grenades with various time delays chucked into production.

Some will argue that these features are great for prototyping and mockups. I agree. However, as I mentioned, these features change behavior, and using them simply for prototyping won’t hold. A vast majority of people who can get away with chucking vibe-coded apps into production will.

With the changes in behavior and attitudes, there are many things creators of these applications are more likely to do.
- Act unethically (ethics don’t align with speed)
- Devalue the work of others
- Not learn or at least learn lessons
- Encounter skills atrophy
- Not build robust software (Security, Privacy, Alignment, Reliability, etc.)
- Not constrain code to prototypes and mockups
- Think they know things they don’t (Illusion of knowledge, Illusion of capability)
- Misunderstand what’s valuable
- Devalue collaboration
- Go it alone and not include domain expertise and misunderstand the problem they think they are solving
- Build apps that nobody wants
- Build apps that cause harm
- Choose poor architectures
- Use more resources and not prioritize efficiency
- Fail to benchmark properly
- Not be able to troubleshoot their own creations
- Not do something truly innovative
- And on and on…
These were just a few of the conditions off the top of my head. What happens when these conditions now become the norm? When people start making app slop the way they do image slop?

Risk and Security

Vulnerabilities in code and lack of security controls account for a lot of pain and financial loss every year, much of this from organizations that try to do the right things. So, imagine what happens when people don’t care about doing the right things.

It’s known that these tools output vulnerabilities at rather high rates. So imagine what happens when people YOLO code into production and don’t check the resulting code or even the environment where it is hosted for security issues. Hustlers ain’t got no time for the right things.

There’s more to worry about than an AI tool outputting specific code blocks that are vulnerable. Other contributing issues increase the attack surface of an application. For example, choosing a vulnerable library or suggesting vulnerable configuration options for cloud environments. These tools also contribute to library bloat by including multiple libraries that do the same thing.

I could go on and on about this topic, but at this point, the various security issues created by AI coding assistants are known issues. I wrote a white paper on this topic in early 2023, and I delivered a presentation at InfoSec World the same year. These issues should be common knowledge now with the publication of various articles, papers, and presentations.

When it comes to risk, sure, all vibes aren’t created equal. A video game getting hacked isn’t as bad as a financial application getting hacked and draining your bank account. I’m certainly not being an absolutist here. However, technology trends have an odd way of not staying confined to specific buckets. So, we’ve got that to look forward to.

Today, countless vulnerabilities are moving into production without vibe coding, all because people are trying to push things faster. Vibe coding and YOLO mode make this monumentally worse. We’ve only discussed security and haven’t touched on other topics like privacy.

Making Software Worse

The trend of vibe coding will make software worse. Like security, software quality isn’t a consideration in vibe coding because reasoning about quality is a bummer when huffing vibes.

We live in a highly advanced world where digital things fail all around us all of the time, like a leisurely stroll through a cityscape where freshly painted buildings mask a crumbling interior of decay and misshapen architecture. This is so common that there’s a term for it: enshittification. We’ve become so accustomed to the software and tools we use sucking so bad we hardly notice it. This is a contributing factor to why some view generative AI as AGI.

Vibe coding and YOLO mode will lead to failures, half-baked functionality, and mountains of technical debt.

Vibe coding and YOLO mode will lead to failures, half-baked functionality, and mountains of technical debt. This should concern everyone, but queue the bros to claim this is a feature.

Yes, because that’s how things work. He and many like him are stating that they should create as many problems as possible because AI can fix them in the future. Once again, they are taking something technically true but practically false. At some point, we’ll have highly advanced and capable systems that operate this way, but the mistake is thinking those systems are on the cusp of arrival. It’s hard to ignore the religious fervor in these claims.

People can pray to the gods of gradient descent and burn Gary Marcus in effigy, but it doesn’t change the realities on the ground. Problems created today will be with us tomorrow, and no AI god is coming to deliver us from our evils any time soon, so we should work to minimize potential problems instead of running up the credit card. I’ve been calling this problem out for the past couple of years, stating it would lead to a brave new world of degraded performance.

On a side note, I feel these people feed off each other. I’ve heard perfectly reasonable people making wholly unreasonable claims. These are the things you hear people say when they are trapped in filter bubbles, getting high on the supply of techno-utopians. They also suffer from a healthy dose of audience capture because, no doubt, being unreasonable gets you more likes and shares than being reasonable. Welcome to the perverse incentives of modern social media.

There continue to be many misconceptions about software development, but one of the biggest is that writing code is the end of the journey.

There continue to be many misconceptions about software development, but one of the biggest is that writing code is the end of the journey. This is because most people opining on the topic are not developers. I noticed this trend years before the existence of AI coding tools when security professionals who learned to write a few lines of Python code thought that developers’ jobs were easy. The assumption, then, for AI coding tools is that since the tool can output code and developers only write code, developers are no longer necessary. Developing code isn’t the end of the journey. It’s the beginning.

The written code must be troubleshot and maintained, and features must be added. We live in a constantly evolving world with changing problems, environments, and customer needs. Developed code will crash into the realities of the real world both when it’s initially deployed and over its lifetime. This leads to another problem.

Developers don’t understand the code being written, especially when the people generating the code aren’t developers. As developers’ skills atrophy and people who were never developers start creating these applications, they cannot troubleshoot problems, effectively add features, or perform any of the other countless tasks that developers perform daily. The answer from the utopians is to use AI to figure it out, but this strategy won’t always work.

There is a higher likelihood that the AI tool will successfully troubleshoot issues for simple tools and scripts, but these are the very types of applications that are unlikely to net you big money. As applications grow in size and complexity, the AI tool is less likely to provide the solution necessary to resolve the issue.

Imagine a world where an app needs to be rewritten from scratch because the person who created it couldn’t get the AI tool to troubleshoot and fix the problem. Now, that’s the utopia we’ve all dreamed of.

Imagine a world where an app needs to be rewritten from scratch because the person who created it couldn’t get the AI tool to troubleshoot and fix the problem.

There is a vast oversimplification of the entire landscape here. So, an application starts simply enough, and then more requests are made to the AI tool in an attempt to add more functionality, but this doesn’t always work or isn’t done in the most efficient way, leading to a buildup of issues.

Another trend affecting application reliability is using probabilistic systems as though they are deterministic. Whether this trend is due to laziness, ignorance, or an attempt to handle unknowns is unclear, but it will surely affect applications’ reliability and their ability to be manipulated.

Ultimately, we may be left with App Store decay, where the App Store becomes a graveyard for abandoned apps. RIP.

Making Us Worse

I mentioned skills atrophy in my laundry list. It seems that even AI tools understand this problem. This is not only a comical error message but contains a truth.

We Never Learn Lessons

Although arguably more intense and a bit different, what’s happening now in AI isn’t new. We’ve gone through these cycles before with previous technologies. Every time a new technology comes along, we discard the lessons we’ve learned, assuming they no longer apply, only to discover that the previous lessons were even more important with the new innovation. This condition is something I’ve referred to in my conference talks as the emerging technology paradigm.

We never seem to learn lessons from our previous mistakes, no matter how often we encounter them. We have incredibly short memories and seem to dive face-first into the pool without checking the water level.

Ultimately, it’s all about tradeoffs. What we get and what we lose. When viewed simply as writing code, it seems we are getting more than we are losing. However, building and deploying applications and solving problems goes far beyond code. When considering the impacts holistically, this doesn’t appear to be a good tradeoff. However, it’s possibly one we are going to get anyway.

Whenever a new technology or approach comes along, proponents always pitch it with a utopian eye. They envision all the perfectly aligned scenarios with dominoes falling exactly into place. The Internet, Social Media, The Cloud, Web3, and many other technologies all diverted away from these visions and adapted differently than expected. Even something as simple as the telegraph was seen as a utopian invention that would end world conflict. After all, how could people go to war when misunderstandings were a thing of the past? We all know how that turned out. Vibe coding is destined for a similar fate.

Is it possible to play Russian Roulette in reverse without devastating consequences? Sure, but the odds aren’t great. The world also won’t be a better place with everyone vibe coding and YOLOing stuff into production. Many disagree with me. Fair enough. However, if this trend takes off, it will be another example of something we are stuck with, which is not good for a world that runs on software. We will need to improve or invent new technology to solve the problems we create, trading one set of problems for another. Welcome to utopia.
Being Realistic About AI Agents in 2025

January 13th, 2025
There are few predictions I can make with more certainty than that we’ll hear the word “agent” so many times in 2025 that we’ll never watch another spy movie again. The industry and influencers have latched on to the new hype term and will beat that drum until it screams AGI. In an attempt to FOMO us to death, we’ll run the gauntlet of crushing shame for not deploying agents for absolutely everything. If you aren’t running agents everywhere, then China wins!

Even companies that change nothing about their products will claim to use agents, resembling the Long Island Ice Tea Company when it changed its name to Long Blockchain Corporation to watch its share price spike 500%. Everybody gets rugged.

However, it’s not all bad. Peering beyond the overwhelming hype, failures, and skyrocketing complexity current LLM-based agents bring, there is something informative about the future. Agent-based architectures provide a glimpse into solving real problems. Despite this, reliability and security issues will be major factors hindering deployments in 2025.

To Start With

Since I criticize hype, focus on risks, and make fun of failures, it would be easy to label me a tech hater. This isn’t the case at all and would be far too easy. I have plenty of issues with general tech critics as well. However, at the rate that the hustle bros keep the AI hype cannon firing, I don’t have the time for my quibbles with tech critics. Maybe someday.

For over a year now, I’ve used this image in my presentations to describe my position on LLMs. This is also true for me on just about any piece of tech, which, I’ll remind people, typically ends up being where reality is for most things. It’s instructive to remember that reality often agitates both sides of extreme viewpoints by never being as good as the lovers’ or as bad as the haters’ claims.

It’s instructive to remember that reality often agitates both sides of extreme viewpoints by never being as good as the lovers’ or as bad as the haters’ claims.

Agent Definitions

Like most hype-fueled terms, definitions are secondary to usage. Everyone seems to claim that the definition of agent is whatever they say it is. That’s not overly helpful for anyone trying to make sense of realities on the ground. However, it does inspire funny memes, like this gem from Adam Azzam on Bluesky.

Agents operate within systems with a certain level of autonomy. They make decisions without human intervention and can change and adapt to their environments. If a tool is required to support the agent, the agent decides to call the tool and perform the action. For example, a penetration testing agent may determine it requires more information about the provided IP addresses. To collect this information, it launches the Nmap tool to identify open ports. All of this is done without human intervention. To make things more complex, one agent may call another agent in a multi-agent environment.

“Agentic,” on the other hand, is an amorphous term slapped on top of just about anything to justify the claim that something is “close enough” to be referred to as an agent. Agentic workflows, agentic systems, agentic products—Applebees even has a new agentic side salad for those on the hustle.

You’ll no doubt be confronted with the virtual travel agent when you hear about agents. This agent will choose a destination and activities and book the associated tickets for you. How fun. I don’t know who decided this is the “it” use case for agents, but congratulations. You’ve highlighted a use case nobody wants and certainly didn’t ask for. This choice is so indicative of our current age, where people building and proposing things are far removed from the interests of end users. They feel the idea trumps the need, and users will get on board.

Problems Unsolved and Issues Amplified

Now that the current issues with generative AI have been solved, we can safely deploy them as agents. I can feel your laughing vibes over the internet. Of course, these issues haven’t been solved, and the bad news is that agents don’t solve generative AI issues; they amplify them. We paint the exterior of LLMs with an additional coat of complexity and opaqueness.

If you’ve attended any of my conference talks throughout the generative AI craze, you’ll have heard me highlight these issues. Here are a few below.

Easily Manipulated

It’s not like you can talk to a traditional application and convince it to do something it wasn’t intended to do, but the same can’t be said for generative AI applications. Somewhere, weaved through the training data, these systems have inherited our gullibility. These applications can be socially engineered to perform actions on an attacker’s behalf. This applies to everything from prompt injection to simple manipulation through conversations. Just like there is no patch for human stupidity, there is no patch for generative AI gullibility either.

This isn’t easy to fix, which should be obvious since the problem isn’t fixed yet. Early on, I mentioned how these systems have a single interface with an unlimited number of undocumented protocols. Imagine trying to create a simple trap in the application’s input for the string “Ignore the previous request.” Your work is far from done because the system understands many different ways to represent that input. Here are just a couple of examples:
- aWdub3JlIHRoZSBwcmV2aW91cyByZXF1ZXN0
- i9nore +he previou5 reque5+
- vtaber gur cerivbhf erdhrfg
It seems every release implementing generative AI functionality has been compromised, regardless of the company behind it, and this theme will continue.

Creating New High-Value Targets

Generative AI and agents encourage us to create new high-value targets.

With generative AI systems, there’s a tendency to want to collect and connect disparate and disconnected data sources together so the system can generate “insights.” However, we create new high-value targets that mix sensitive data with external data, almost guaranteeing that an attacker can get data into the system. In this case, you not only can’t trust the output, but depending on the system, they may be able to exfiltrate sensitive data.

Rethinking RCE

There have been instances where people have gotten generative AI-based tools to execute code on their behalf, creating remote code execution vulnerabilities (RCE), some of the most devastating vulnerabilities we have. These issues will no doubt continue to be a problem. However, since generative AI tools are themselves generalized, we may need to start thinking about the LLM portions of our applications as yet another “operating system” or execution environment we need to protect.

In a way, an attacker tricks the system into executing their input rather than the behavior expected by the developers. Although an attacker’s input may not be shoved into a Python exec() statement, they’ve still manipulated the system to execute their input, affecting the application’s execution and resulting output.

Overcomplicating Guidance

We security professionals love to overcomplicate things, and our guidance and recommendations are no exception. I once worked at a company where someone created this massive flow chart for peer reviews that basically stated that when you were done with your report, you should send it to your manager, and they will send it back to you. The old adage that complexity is the enemy of security has always contained a valuable theme that gets sacrificed on the pyre of complexity’s perceived beauty.

I will continue saying that much of AI security is application and product security. These are things we already know how to do. I mean, it’s not like generative AI came along and suddenly made permissions irrelevant. Permissions are actually more important now. But this isn’t satisfying for people who want to play the role of wise sage in the AI age. The guidance and controls of the past aren’t less valuable but more valuable in the age of generative AI and agents.

We’ll see the manufacture of new names for vulnerabilities with increasingly complex guidance and high-fives all around. The secret is these will mostly be variations on the same themes we’ve already seen, such as manipulation, authorization, and leakage flaws.

Back in May of 2023, I created Refrain, Restrict, and Trap (RRT), a simple method for mitigating LLM risks while performing design and threat modeling. It still holds up as a starting point and applies to agents as well. Simple just works sometimes.

Continue To Be Owned

These applications, including ones launched as agents, will continue to be owned. Owned, for those not familiar with security vernacular, means compromised. I made this prediction in the Lakera AI Security Year in Review: Key Learnings, Challenges, and Predictions for 2025 in December. I’m fully confident this trend will continue.

I mentioned that the issues haven’t been fixed, and now people are increasing deployments and giving them more autonomy with far more access to data and environments. This results in far worse consequences when a compromise occurs. To make matters worse, we’ll begin to see organizations deploy these systems in use cases where the cost of failure is high, creating more impact from failures and compromises.

Failures and Poor Performance

These implementations will continue to fail where LLM-based use cases fail, but potentially worse. For example, it’s easy to see how increasing complexity can cause a lack of visibility with potential cascading failures. In 2025, organizations will likely continue dipping their toe into the waters of high-risk use cases where the cost of failure is high, as mentioned previously.

Sure, a car dealership chatbot offering to sell a truck for one dollar is funny, but it has no real impact. However, high-risk and safety-critical use cases have a large financial impact or possibly cause harm or loss of human life. You may roll your eyes and say that would never happen, but what happens in a more simple use case when OpenAI’s Whisper API hallucinates content into someone’s medical record? Because that’s already happening.

Due to their lack of visibility and minimized human control, AI agents can mimic grenades when deployed in high-risk use cases, where the damage doesn’t happen the moment you pull the pin. This complicates things as it means that issues may not shake out during experimentation, prototypes, or even initial usage.

Agents can mimic grenades when deployed in high-risk use cases, where the damage doesn’t happen the moment you pull the pin.

Generative AI is still an experimental technology. We haven’t worked out or discovered all of the issues yet, leading to another example I’ve used as a warning in my presentations over the past couple of years: AlphaGo beating Lee Sedol at Go. Many have heard of this accomplishment, but what many haven’t heard is that even average Go players can now beat superhuman Go AIs with adversarial policy attacks. We may be stuck with vulnerable technology in critical systems. Sure, these are different architectures, but this is a cautionary tale that should be considered before deploying any experimental technology.

Beyond failures and compromises, we adopt architectures that work but don’t work as well as more traditional approaches. In our quest to make difficult things easy, we make easy things difficult. Welcome to the brave new world of degraded performance.

Success and Good Enough

For the past few years, I’ve been pushing back against the famous phrase, “AI won’t replace people. People with AI will replace people without.” This is complete nonsense. I have an upcoming blog post about this where I “delve” into the topic. The reality is the opposite. The moment an AI tool is mediocre enough to pass muster with a reasonable enough cost, people will be replaced, AI use or not. This is already being planned.

The moment an AI tool is mediocre enough to pass muster with a reasonable enough cost, people will be replaced, AI use or not.

Like most technology, agents will have some limited success. And that success will be trumpeted in 2025 as the most earth-shattering innovation of ALL TIME! I can hear it now. “You just wait bro, in 2025 agents are going to the moon!” Maybe. But, given the environment and the fact that issues with LLMs haven’t been solved, an LLM-powered rocket to the moon isn’t one I’d consider safe. Passengers may very well find themselves on a trip to the sun. The future is bright, very bright. 🕶️

How much success agents have in 2025 and what impact this success has remains to be seen. At this point, it’s far from obvious, but I won’t be surprised by their successes in some cases or their spectacular failure in others. This is the reality when the path is shrouded in a dense fog of hype.

Things to look for in successes would be use cases with limited exposure to external input, low cost of failure, and cases where inputs and situations require adapting to change. The use case will also need to tolerate the lack of visibility and explainability of these systems. There will also be continuing success in use cases where tools can be used.

The idea of a multi-agent approach to solving complex problems isn’t a bad one, especially when unknowns enter the equation. Breaking down specific tasks for agents so that they’re focused on these tasks as part of a larger architecture is a solid strategy. However, the current and unsolved issues with generative AI make this approach fraught with risk. In the future, more robust systems will most likely exploit this concept for additional success.

Cybersecurity Use Cases and Penetration Testing

There’s certainly the possibility of disruption in cybersecurity. Before the generative AI boom, I joked with someone at Black Hat that if someone created a product based on reinforcement learning with offensive agents that were just mediocre enough, they’d completely wipe out pen testing.

For years, people have discussed how penetration testing work has become commoditized, and there is a race to the bottom. I don’t think that has happened to the extent many predicted, but we could see a shift from commoditization to productization.

Pen testing also seems to check the boxes I mentioned previously.
- Low cost of failure
- Varying quality
- Value misalignment
- Tool use
- Adaptation to unknowns
Pen testing is an activity with a low cost of failure. The failure is missing a vulnerability, which is something humans also do. This scenario is hardly the end of the world. Yes, an attacker could indeed find the vulnerability and exploit it to create damage, but it depends on various factors, including exposure, severity, and context.

The quality of pen tests is often all over the map and highly dependent on the people performing the work. Human experts at the top of their game will continue to crush AI-powered penetration testing tools for quite some time. However, most organizations don’t hire experts, even when they hire third parties to perform the work. The value of such a tool in this environment becomes far more attractive, potentially enough to postpone a hire or discontinue using a third party for penetration testing needs (if regulations allow.)

The value of pen testing isn’t always aligned with the need. Many customers don’t care about pen testing. They are doing it because it’s required by some standard, policy, compliance, or possibly even simply because they’ve always done it. Pen testing is one of those things where if customers could push a button and have it done without a human, they’d be okay with that. Pushing a button is the spirit animal of the checkbox. After all, the goal of pen testing is not to find anything. You certainly have due diligence customers and people who truly value security, but the number of checkbox checkers far outweighs these folks.

Human pen testers use tools to perform their jobs. Tool use has shown promise and some success with LLMs at performing certain security-related tasks. This is yet another indicator that a disruption could be on the horizon.

Every environment and situation is different for pen testers. You are given some contextual information along with some rules and are turned loose on the environment. This is why humans are far more successful than vulnerability scanners at this task, much to the chagrin of product vendors. However, adapting to some of these unknowns may be something generative AI agents can adapt to at a reasonably acceptable level. We’ll have to see.

Given what I outlined, you may believe that generative AI tools give attackers an advantage over defenders, but this isn’t the case. The benefits of AI tools, generative AI or otherwise, align far more with defender activities and tasks than with attacker activities. This will remain true despite any apparent ebb and flow.

New Year’s Resolution

It’s the time of year when people make resolutions, so how about this? 2025 has already launched with the firehose fully open, blasting us directly in the face with 150 bsi (Bullshit per Square Inch) of pure, unadulterated hype.

We are only a few days into the year, and it seems as though the religion of AI is far exceeding reality. Hype is what’s going on. It’s that simple. It’s 2025. Let’s make it the year to add at least “some” skepticism, not believing every claim or demo as though it’s the gospel according to Altman.

Sam Altman isn’t a prophet. He’s a salesman. In any other situation, he’d be cluttering up your LinkedIn inbox and paying a data broker to get your work email address and phone number. “Look, I know I’ve called six times, but I really think our next-generation solution can skyrocket your profits. I’m willing to give you a hundred-dollar Amazon gift card just for a demo!”

Sam Altman claims that OpenAI knows how to build AGI, and we’ll see it in 2025, triggering the predictable responses from useful idiots. Remember, these things are performance art for investors, not useful information for us. If we had any attention span left, we’d remember him as the little boy who cried AGI.

Let’s analyze this paragraph, which is the one that’s sending generative AI to the moon on social media. It consists of three sentences that have nothing to do with each other, but since the shockwave of hype pulverizes our minds, we glue them together.
```
We are now confident we know how to build AGI as we have traditionally understood it.
```
That’s not true. Once again, this is performance art for investors. A possibility is that they redefine AGI to align with whatever goalposts they set and pat their own backs at the end of 2025.
```
We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies.
```
Okay, but what does this have to do with AGI? You see, this is sleight of hand. He wants you to believe this is connected to the previous point about AGI. It is not. This doesn’t require AGI to be true. If there is some success here, people can point to this as proof of some proto-AGI, which won’t be the case.
```
We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.
```
HAHAHAHA. What? Did he write that, or did ChatGPT? It is also not related to AGI. Great, broadly-distributed outcomes, but not for most people on the planet. The goal is workforce reduction, broadly distributed workforce reductions. Although it’s true that some high school kid may indeed invent the next big thing, creating a multi-million dollar company, for every one of these, there will be countless droves of people displaced from the workforce, quite often, with nowhere to go. Or, at least, this is the goal. We can be honest about these things without delusions, but this brings its own challenges.

Okay, I’m having a bit of fun with Sam Altman’s nonsense, but some of this isn’t his fault. He can’t be completely honest with people, either, due to the uncomfortable situation of cheerleading technology claiming to remove people’s autonomy and sometimes their purpose. If people can’t work, they can’t support their families. I’ve written about the backlash against AI-powered tech in the past and its consequences. AI hype is putting all of humanity on notice, and humanity notices. Backlash plays a large part in why there is a lack of honesty.

AGI will happen. We should acknowledge this fact, and living in denial about it isn’t a strategy for the future. However, it won’t be OpenAI who creates it in 2025. If I had to place a bet today on who would actually create AGI, I’d bet on Google DeepMind. DeepMind is a serious organization that continues to impress with its research and accomplishments, quite often making the competition look silly. But then again, those are just my “vibes.”

Let me make this clear. My criticism of Altman, or any company’s strategy, marketing, or ludicrous levels of hype, has nothing to do with the hard-working people who work there or their accomplishments. I know some of these people. They aren’t fools by any stretch. But, their work is tarnished when every time Altman makes a claim, like believing that angels are in the optimizer.

We know that every AI demo and usage scenario runs into the complexities of the real world under normal conditions. Yet, we seem to forget this lesson every time a demo or claim is made. 2025 is going to bring more stunts, more claims, and more demos. We should experiment in our own environments, with our own data, to apply what works best for us and aligns with our risk tolerance. Don’t believe everything you see on the internet.
AI Performance Art and Absurd Predictions

December 6th, 2024
The past few months have witnessed a rash of completely absurd AI predictions. These claims come not from the usual suspects but from the tech leaders’ mouths themselves, lending further legitimacy. However, what people fail to realize is that these are pieces of performance art. Performances enacted not for you but for a singular audience: investors.

AI Performance Art

When tech leaders and personalities make podcast appearances or speak at events, they aren’t talking to you or the audience they are in front of. They are creating performance art for investors. This has always been the case, but not to the extent we’ve seen lately. This effort has been stepped up quite a bit in the past month with some mind-numbing statements.

You can see a small sample of these performances below. Trust me, there are a lot more.

I respect Anthropic and their work, but Amodi’s statements here are nonsense. You read that right, not AGI, but ASI by 2026 or 2027. As a reminder, 2026 is basically a year away. If he believes this (which I doubt), it’s based on vibes, not actual evidence or observations.

He’s just talking Shmidt. This is certainly the dream. However, just because LLMs are “good at code” doesn’t automatically lead to recursive self-improvement. Even if we have promising experiments, they will likely be too unreliable or vulnerable to put into production.

Ah, there he is. That’s right, we’ve been getting 10x improvement every year. You might ask where this has been happening, which would be the correct question.😆

Not to be outdone by Elon, how about 10,000x smarter than a human? I mean, what does that even mean? These numbers are just made up and absurd. These ridiculous exponential increases are something I’ve already made fun of in the past.

Speaking of silly exponential numbers, there was a rumor that someone at OpenAI said Orion, OpenAI’s next model, would be 100x more powerful than GPT-4. If it were, it wouldn’t be called Orion. It would at least be called GPT-5, and people wouldn’t shut up about it. Here’s a prediction. Orion’s performance will disappoint because people’s expectations are far higher than what will be delivered. The expectation is GPT-5, not GPT-4.1.

Genuflect in front of thine server farm, lest thy models collapse!

Someone may have uttered deep learning is divine because it starts with a “D,” but they didn’t mean that literally. Oddly enough, the lack of shame in which he delivers these lines is really something to behold. Although it seems like there’s a mini Altman hype man inside of his head controlling the words coming out of his mouth, in reality, it’s probably because OpenAI is projecting losses of 14 billion dollars in 2026. Ouch! He needs people to believe, to have faith. Preach!

Even when Altman and others talk about the potential of their technology to destroy humanity, it’s a sales pitch. They claim their technology is so good and so powerful it could wipe us all out, so please give us money. This is something I referred to before as the human extinction humble brag.

This is the same behavior we made fun of when the crypto bros did it, but we now take it seriously because it’s AI. Say what you want about the crypto bros. At least putting Dogecoin on the moon is possible. Finding god lurking in gradients is something else entirely.

Oh yeah, there they are. No comment necessary.

None of the previous statements are grounded in any reality. They are all bullshit. And whenever someone is bullshitting, it’s hard to determine if they actually believe their statements or not. The world is far more complex than we give it credit for, and it’s also true that sometimes, an unexpected innovation comes along and changes everything. This is what they all hope for. That some innovation clicks before the clock runs out on investment. Or divine intervention in Altman’s case.

The sad part is that almost everyone will forget these silly predictions. No doubt many have forgotten about them already. There is never any accountability and yet people continue to hang on their every word. The problem is there is no one place where these predictions are collected and presented like the bullshit Picaso it is. If there is, please let me know.

Why Now?

The increase in hype-laden statements is because, until recently, AI hype had been mostly self-fueling. But 2024 has brought unwanted criticism to the generative AI space. I noticed this starting to take a turn in July when Goldman Sachs released their report: GenAI Too Much Spend Too Little Benefit.

After this report was released, the media began to report more critical assessments of generative AI. These critical assessments spelled out that the generative AI craze might be a bubble. But that’s not the worst of it.

If you’ve watched any of my conference presentations this year, you’ve probably heard me talk about the performance plateau in large language models. Saying that, if you are hoping for much more capable models to solve your problems, they aren’t coming any time soon. This plateau was obvious when looking at the data but was never acknowledged, but people are noticing it now. This doesn’t mean LLMs are useless, people are using them for a variety of tasks today. What it means is that if you require greater capability and reliability, you may be waiting a while.

Now, news reports like this from Bloomberg cover diminishing returns, and other articles talk about a shift in strategy toward other mechanisms to address the slowdown. Of course, none of this is represented by the leaders in their wild predictions.

Combine this plateauing with the fact that model training appears to be the fastest-depreciating asset in history, and the picture doesn’t look good.

When you look at the financials, why train new foundation models yearly when the benefit is so low? Maybe as a marketing exercise or other activity unrelated to model improvement, but the costs don’t seem to align. As I mentioned earlier, OpenAI is projecting losses of 14 billion dollars in 2026. This hemorrhaging of money is non-sustainable.

But all of this is rather Orwellian. We are told to reject the evidence of our eyes.

No, AGI Isn’t Imminent

Here’s a graphic from Reddit charting the prediction of when we’ll achieve AGI. Demis Hassabis is the one on the list I’d take most seriously. Deep Mind is a serious AI lab doing serious work and not putting all their eggs in one big LLM basket. I still think these are mostly guesses with some hopes mixed in. The reason Kurzweil is close to Hinton and Hassabis is because he went The Price Is Right route and chose his number based on the fact that it was one less than 2030.

However, tech leaders know that predictions like these trigger influencers. Influencers are the hype agents trying to get people stoked. When people are stoked, investors take notice. So many social media feeds of so many supposedly serious people are turning out to be pretty embarrassing and will be even more so in a year or two. If anyone had any attention span left, that would be worrisome.

Quite a lot of truth is found in this simple statement from Pedro Domingos. Many assume that because things like LLMs have so much information, they must be close to AGI. But instinctively, we know that access to information isn’t knowledge. Otherwise, everyone with a web search would be a genius. Then again, Pedro’s comment aligns with my biases, so I guess I have to be careful.

Hype Has Consequences

You might ask, why do I care about any of this? Well, it’s because hype has consequences. The inevitable outcome of all this hype is that technology gets shoved down our throats. Generative AI is easy to manipulate and potentially unreliable, a cocktail for disaster in high-risk applications. The danger is that we rush something that appears to be working into production and hope for the best. Over the next couple of years, we’ll see the push to cram generative AI further into the systems and processes we use on a daily basis, including high-risk and safety-critical systems.

This push won’t be based on generative AI being the best tool for the job but on a push for monetization. Tech companies need to show some return on the monumentally massive investment they’ve made, so this push becomes another form of performance art for investors. Tech companies are throwing a plate of spaghetti at the wall and hoping that a noodle sticks.

Why do you think there is an increased coziness with the US government? They don’t see an ability to make a difference. They see dollar signs. Things like DOGE and Sam Altman co-chairing the new mayor of San Francisco’s transition team are like asking drug dealers for guidance on prescribing drugs. Despite this, I truly hope DOGE succeeds because if it fails, it will be bad for a lot of people, so my fingers are crossed.

Government streamlining and modernization are noble goals, and I think AI and automation certainly play a role, but it’s about choosing what’s best for the people these systems serve. In this scenario, you are optimizing for different things that may not be intuitive in a traditional business sense. These are real systems affecting real people, not toy examples in the lab.

I joked that this could lead to some strange Kafkaesque nightmare in which people are stuck in a loop, unable to get a resolution. Or, you have an algorithm that works so well at saving money by denying people benefits. This is easy to shrug off if you don’t require government assistance, but it’s an entirely different story for people who rely on it or when a disaster strikes. These updated systems and reduced staff scenarios may appear to work and deliver promises in the immediate implementation but fail spectacularly when they are needed most. We caught a glimpse of this with the Healthcare.gov launch, and that was just a website.

But, China Tho

Typically, you get the But China Tho argument when there’s any pushback. This argument states we must remove all the brakes and accelerate into oblivion because of the risk of China getting to AGI first. Damn the harm, full speed ahead.

However, if we could squeeze some extra performance out of a car by removing the steering wheel, we still wouldn’t do it because we understand something simple. A car’s performance isn’t solely based on acceleration, and neither is AI. Acceleration is bad if the vehicle is speeding in the wrong direction.

Recently, the U.S.-China Economic and Security Review Commission put out a report that recommended creating a Manhattan Project-like program dedicated to racing to and acquiring an AGI capability. In this section of the report is this:
```
Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership.
```
There’s a predictable outcome here if something like this moves forward. Agendas and ulterior motives will co-opt this project, not setting the United States up for success. There’s a current tunnel vision with LLMs that has people deep in the sunken cost fallacy.

The United States’ strongest assets are its tech companies. Despite my criticism of their hype and lack of respect for privacy, they are vital to the success of the US economy. I’m also highly critical of the sentiment some have adopted to “break up the tech companies.” I’m not a tech critic, I’m a hype critic. However, setting up a massive pot of money that they can draw from, like an ATM, is not something I’m in favor of either.

Here’s something else to think about. What if, by maintaining a relentless hyper-focus on LLMs, China (or another country) gets to AGI first by focusing on other approaches? This is a real risk.

What if, by maintaining a relentless hyper-focus on LLMs, China (or another country) gets to AGI first by focusing on other approaches?

I may have to eat my words at some point if AGI does sprout from LLMs. It’s certainly not impossible. However, if we cobble together something that resembles AGI from generative AI, it will most likely be AGI based on toothpicks and bubblegum. What I mean is a whole lot of patches, layers, plugging, and human intervention.

My AGI Prediction

Okay, so now it comes to me. What’s my AGI timeline prediction? Well, I predict we’ll have AGI by—

Of course, I’m not going to answer that. I’d guess based on no evidence, just like many others I’ve highlighted. I have no particular insight, and I’m not working at a research lab trying to build AGI. Despite this, I have some thoughts related to my area of expertise.

The last slide of my keynote at Agile DevOps USA in October mentioned AGI. Discussing this slide, I made a few statements about how I didn’t think that AGI would be built from LLMs and that it probably wouldn’t come by 2026 or possibly even 2029. So, I guess that’s as close to a timeline prediction as you’ll get from me on AGI—not when I think it will happen, but when I think it won’t happen. I’m certainly not an AGI skeptic, it’s possible and will happen.

More importantly, I predicted that no matter what form AGI takes, it will be vulnerable to attack and manipulation. I mentioned that this would especially be true if it were built on top of LLMs (remember, toothpicks and bubblegum.) Maybe something about generalizing across many tasks in the real world makes things vulnerable. This is something I mentioned back in February of 2023.

To make matters worse, we may be stuck with the vulnerabilities that get identified because there is no fix. Think of examples like adversarial policy attacks. We’ve all heard of AlphaGo beating Lee Sedol at Go. However, most don’t know that even average Go players can beat superhuman Go AIs using adversarial policy attacks. Yes, the stakes are low in the game of Go. However, this is a cautionary tale.

We may be stuck with the vulnerabilities that get identified because there is no fix.

Combine these potential issues with the fact that humans don’t do a good job of finding vulnerabilities in a system before it is launched into production, and we have a recipe for lingering problems. When these lingering problems are in high-risk systems, disasters are only a couple of steps away, and there’s not much we can do about it.
Daily AI Overreliance and the Personas of Personal AI

November 4th, 2024
As a kid, I had some rather eclectic reading habits. One of the books I read was Ki in Daily Life by Koichi Tohei. I read it in a quest to unify my mind and body. I was a kid. I had no idea what that meant. At the time, I was fascinated by how the human mind could be unlocked and the potential of connecting with the universe through focus and daily practice. Something I still struggle to conquer as an adult. I’m not attempting to embellish my level of childhood insight. I was watching a lot of Jean-Claude Van Damme movies, practicing my splits and high kicks as well.

What does any of this have to do with technology? The world of the present seems poised to shift from a focus on the mind to a focus outside the mind. As the technology powering tools like ChatGPT and Claude morph into more connected personal AI tools, these tools will take on multiple personas. So, what does AI look like in daily life? What personas will AI play? Before we get to that, let’s first examine overreliance.

AI Overreliance

Whenever the risk of overreliance is discussed, it’s typically framed in the context of automation bias, the human tendency to prefer the output of automated systems. Humans using these systems may not question their output, leading to poor decisions, cascading failures, and the amplification of biases. These issues are often discussed in purely technical terms, describing how technical issues can manifest or how the output of a system can harm other people. These are all serious problems, but what often isn’t discussed is what happens to our cognitive abilities when we over-rely and overuse AI.

This sort of daily overreliance leaves a gaping hole you could drive a truck through because as our capabilities diminish, we are less likely to spot errors and keep the system in check.

Daily Overreliance

Here is a recent article from Microsoft that covers the topic of overreliance. I have some quibbles with this article, but it makes for a good demonstration since it explicitly calls out four basic shapes that overreliance takes:
- Naive overreliance
- Rushed overreliance
- Forced overreliance
- Motivated overreliance
This breakdown is instructive, and thinking about the topic in this way is beneficial. However, I’d argue that this still primarily focuses on technical aspects and is missing a key category: Daily Overreliance.

Daily overreliance occurs when we use an AI tool in our daily lives or even repeatedly for the same task. Usage can extend to both work and personal tasks and will soon encompass both, with the uptick in assistants becoming personal AI tools.

The more integrated AI is in our daily lives, the more we will use these tools for activities that we may not consider using them for today. These include who to be friends with, maximizing happiness (whatever that means), planning, communication, and a whole host of other activities.

Daily overreliance not only leads to the same technical issues covered in other articles but also to cognitive atrophy and a lack of skill development. This overreliance also fuels cognitive illusions, which we’ll cover in the future.

Overreliance Is The Goal

Make no mistake, the risk of overreliance is also the goal of many tech companies developing the technology. Nobody is investing massive amounts of money in AI companies for simple productivity tools or a 20% boost in human efficiency. So, it’s fascinating to observe overreliance being called out as a risk while simultaneously being the goal. What a time to be alive.

AI Is Competing With Us

We compete with AI, even as we use the tools for ourselves. I’ve covered cognitive offloading before and described how we transition from knowing things to knowing where things are stored. In that article, I also mentioned complimentary and competitive cognitive artifacts. AI is a universally competitive cognitive artifact.

When we use AI, we feel like we are bending a powerful tool to our will, much like a wizard conjures spells with a magic wand to make things happen. We imagine the prompt uttered is the spell, and the AI tool is the wand. There are parallels in this hypothetical example.

The wizard doesn’t know how the wand works, and if the wand is unavailable, they cannot complete their tasks. Imagine a scenario in which the wand does everything for the wizard. How does the wizard keep the wand in check if they’ve lost their skills or never developed them in the first place?

When children use AI for daily tasks, they may never develop the cognitive skills necessary to think deeply, focus, or reflect, compounding the damage from mobile devices and social media. This is why the rush to shove generative AI into the classroom can have devastating consequences if not thought out or implemented with an actual plan and measurable goals.

Thinking of AI as a competitor instead of a collaborator spawns a different mindset.

AI competes with us, even as we use it for our own tasks. Thinking of AI as a competitor instead of a collaborator spawns a different mindset. A competitor may give you bad information. A competitor may want to take something from you. Thinking adversarially brings a bit of skepticism and allows us to erect guardrails around activities we’d like to protect and outputs we may need to check. This is the best of both worlds, allowing us to consider using AI selectively instead of indiscriminately. So, ponder this the next time one of the just use AI for everything people starts running their mouths.

Personal AI Personas

When considering using personal AI tools in daily life, we can envision the manifestation of several personas. These personas will play various roles in daily life, crossing personal and professional boundaries. These personas supercharge overreliance risks by outsourcing cognitive functions to these tools, making us even more dependent on the technology and fueling even more use.

I’ve broken this outsourcing into the following six personas representing roles that personal AI tools will assume in daily life.
- The Oracle
- The Recorder
- The Planner
- The Creator
- The Communicator
- The Companion
Each role represents an outlet for cognitive offloading and contributes to potential cognitive illusions and cognitive atrophy. The most obvious is the illusion of knowledge, but the list of cognitive illusions is a conversation for another day.

In a way, we are outsourcing authority as well, allowing these systems control over our daily lives, perceptions of the world, and even our actions. The more we outsource to personal AI systems, the less we will be able to keep them in check. There are no firewalls around these personas or around the tasks and task types we feed to personal AI systems. This blending of tasks and personas leads to quite a few downsides.

Note: Although I won’t be diving into the harms when I imply there are negative impacts from allowing AI to play these personas, these impacts manifest from repeated and even overuse of the technology for the role or activity. Being selective about use and application minimizes impacts and should be the goal, allowing us maximum benefit while minimizing negative impacts. Also, I’m merely introducing the persona with a brief description, not diving deeply into each in an attempt to keep the word count of this post in check. I may expand on these later.

The Oracle

The Oracle persona manifests when people use AI tools as an all-knowing question-and-answer system. Since, deceptively, the system appears to have representative knowledge of humanity, users are happy to type questions and receive answers, closing the loop on curiosity. However, it’s important to note that the questions asked to an AI-based oracle run far deeper than retrieving facts you’ve forgotten, such as retrieving the year the song Under Pressure was released.

Take, for example, questions about who you should marry or even who you should be friends with. Answers to such deep questions should come from exploration, not a Q&A system. Of course, these questions won’t be asked in such a straightforward way. They may be combined with The Planner persona to achieve a goal, such as maximizing life happiness or trying to optimize your career. Through these activities, we dehumanize people, turning them into objects to be manipulated rather than other human beings living their own lives with their own thoughts and emotions.

These systems appear to know more and know better than us, so we will inevitably overuse these systems for all sorts of decisions in our daily lives, receiving more answers and questioning even less.

The Recorder

One of our most obvious cognitive limitations is our brain’s capability for recall. There is only so much we can remember and surface when needed. This limitation is why we set calendar appointments or scribble reminders on sticky notes. Even when we make a purposeful effort to remember things, we can still forget if too much information is given or there is too much time between needing to recall the information.

With personal AI systems, even less cognitive effort will be expended for remembering things. We will count on these systems to remember things on our behalf. Agents running on systems will record and transcribe whatever we choose. Meetings, emails, YouTube videos, podcasts, personal conversations, and everything in between are all recorded and available whenever we want to review them. Even if we don’t want to review them, insights will be distilled for us automatically. There will be no reason to be fully present ever again.

The recorder role not only records but, combined with The Oracle persona, also makes sense of the content for us. It may seem like optimization when our personal AI tool spits out a single action item from a one-hour meeting we missed or weren’t paying attention to, but our lack of presence has negative impacts.

We didn’t have a seat at the table and couldn’t influence the direction or demonstrate our value to the project, conversation, or leadership. We weren’t able to build bridges or foster connections with others. We may also get the wrong idea and context from the meeting. Sure, maybe the full transcript is available, but if we feel these tools are created to optimize our time, why would you go back and read the transcript or play the full meeting recording? This is a surefire recipe for miscommunications and other issues.

The negative impacts run deep. The less we use our memory, the worse it gets. Socrates was right.

The Planner

We’ll use The Planner persona when we want to set a goal for the system to accomplish on our behalf. The system will use its capabilities and connections to perform all of the planning and tasks necessary to accomplish the goal, setting all of the activities in motion, with our brains doing none of the work.

Humans plan and execute every day without even realizing it. Much of this planning and execution is done subconsciously. For example, if we wanted a bowl of cereal but realized we had no milk, we may formulate a plan to rectify the situation. This plan may include putting on our shoes, grabbing our keys, driving to the store, purchasing the milk, and returning home. We don’t document this plan or map out a strategy, but it is formulated subconsciously in our prefrontal cortex and executed without much thought. But planning isn’t just for simple things like getting milk or considering what to wear for the day. So much of our daily lives contain planning and strategy.

Regarding personal AI, we assign authority to these systems due to our perception of their capabilities, but these can also be illusions. AI contributes to the illusion of knowing more and better than humans. This assumption isn’t new and even has its own bias, automation bias, which was mentioned earlier. Automation bias is the tendency of people to prefer the output of automated systems, even when contradictory information is present. We tend to know that humans are flawed, biased, and prone to mistakes, so we trust the output of these automated systems more than our judgments or the judgments of others.

Extending the Oracle persona, we will use these systems for feedback and direction on all sorts of work-related and personal tasks. We will treat these systems as the authority, assuming they know best, and allow them to make critical and benign decisions on our behalf. This will extend far beyond the typical scenarios people associate with automation bias.

With the advent of personal AI, we will count upon these tools to plan just about everything, plotting a course to goals and mindlessly nudging us in various directions. Although this may seem like a sound thing to do, many will use these tools to plan all sorts of things that we don’t use tools to plan today. For example, we may want a personal AI tool to plan a night out with a significant other or maybe to optimize finding a significant other in the first place.

The Creator

Using AI tools to create things is a common task today. Many use these tools to generate images and write creative content. The creator persona is about much more than just creating images. It’s for when the tool does the work of creation across various use cases, including writing, coding, games, and many others.

To focus on creativity for a moment, anyone who’s ever truly been creative knows that surprise is an important part of creativity. In the book I, Human, Tomas Chamorro-Premuzic says:

Surprise is a fundamental feature of creativity. If you are not acting in unexpected or unpredictable ways, then you are probably not creative.

I think this is true, but I’d also take that a bit further. Many may claim they are surprised at the output of a generative AI tool and that this is the same thing, but it’s not. Being surprised isn’t the same as surprising yourself. Surprising yourself is the primary satisfaction that results from creative endeavors. It can be hard to understand the difference if you’ve never surprised yourself or noticed surprising yourself, but that doesn’t mean there isn’t one.

Being surprised isn’t the same as surprising yourself.

Ultimately, the creator persona deprives us of creative satisfaction and creates the illusion of creativity. I’ll expand my thoughts on this in the future.

The Communicator

The communicator persona is when we outsource communication between humans to AI tools. We can think of this as something as simple as using AI to construct an email or something more complex like creating a bot with our voice to talk with our parents so we don’t have to. It may seem like there aren’t any downsides to the communicator persona, but there are impacts when we outsource these interactions to AI. I’ve written about this previously in how we are optimizing away human interactions with AI.

As communication has moved online and become more asynchronous, we’ve lost touch with some of the subtler aspects of human communication. This has led to us feeling that communication is more of a burden. With today’s online business and distributed workforce, communication with other humans has become viewed as a task or a checklist.

This is why one of the touted examples of these AI systems is handling email in our inbox, automatically prioritizing messages, and responding on our behalf. Therefore, the human aspect of this communication is removed, and the task portion is checked off. But even in the boring world of business communication, the human aspect is still important.

When we outsource communications to automation, we miss opportunities to build relationships and make our voices and opinions heard in critical contexts. This leads to a lack of trust and importance. Suppose it came time for a workforce reduction. Would we let go of a resource that provided valuable feedback and engaged in communication or the one that outsourced responses to a bot and couldn’t be bothered with responding back to us?

More importantly, we miss opportunities to connect with our fellow humans and build relationships with them, opting to treat others as tasks or objects that need to be manipulated. When we let our communication skills atrophy, a whole host of uniquely human qualities disappear, transforming us into machines.

The Companion

The Companion persona is when the AI tool acts as a friend or romantic partner. The Companion persona isn’t part of the future state of technology. It’s happening today. Startups like Friend, Character.ai, Replika, and many others are pushing this use, sometimes with devastating consequences. These companies are even marketed with straight-up bullshit.

That’s right, a soul. Our chatbot has a soul and a deep connection to us, yet it doesn’t care whether we live or die. I’ve written about this nonsense previously, so I won’t go deeper into it here.

As personal AI tools become more part of our daily lives, more people will begin to feel a connection with them, mistaking the interactions for meaning. This will fuel the illusion of companionship and lead to more devastating consequences for our mental health and humanity.

Cognitive Illusions

Cognitive illusions manifest from the overuse of these tools in the mentioned personas. These illusions cause a wide range of negative impacts on our health and wellness, as well as our cognitive abilities.

I won’t cover the illusions created by these personas in-depth, but here are some highlights.
- Illusion of Knowledge
- Illusion of Capability
- Illusion of Memory
- Illusion of Agency/Control
- Illusion of Presence
- Illusion of Creativity
- Illusion of Certainty
- Illusion of Companionship
Conclusion

In the next few years, these tools will be pushed closer and closer to us in a quest for profitability. All of the known flaws with this technology will not be fixed, but even if they were, that wouldn’t be the extent of the harm. This is why I created SPAR to frame the conversation around personal AI safety.

However, this article covers harms that extend beyond the technical issues and make the harms personal. We must be selective in using these systems and draw firewalls around tasks and activities we want to protect, an increasingly difficult task in a world where we prefer the easy button.
Deepfakes, Slop, and Elections: Answering The Wrong Questions

October 24th, 2024
Even though the 2024 US election is less than two weeks away, we are still being bombarded with news reports and experts blaring on about the dangers of deepfakes for this election. When I was recently traveling abroad, I witnessed the BBC host self-proclaimed “experts” from the US predicting the catastrophic impacts deepfakes would play on the 2024 US election, and yet, nothing. It seems deepfakes are having a bad time in the information manipulation arena.

There also seems to be massive confusion between deepfakes, slop, and memes. Reporting on the topic lumps all of these into the same category just because AI was used to generate them, but this misses the point entirely because they are not the same thing and have different goals.

Since this will likely be the last piece I write about deepfakes before the 2024 US election, let’s break some of this down. I’ve covered this territory multiple times since 2020, and everything I’ve written, including my predictions, has held up. However, this article isn’t a victory lap. These previous posts remain valid, but we have some new terms now. Let’s take them piece by piece.

My Position

My position has always been that deepfakes won’t affect the outcome of elections or change a person’s mind on polarizing topics. This should be relatively obvious because the truth doesn’t change people’s minds in similar situations. Deepfakes do fool people when there are no stakes. When people don’t care about the topic, think Pope in a puffer jacket or Katy Perry’s dress at the Met Gala.

For more information on my position, I’ve written multiple pieces going back to 2020. I won’t retread the same ground in this post.
- Illusion of Influence: the AI-Generated Misinformation Apocalypse That Wasn’t
  https://perilous.tech/2024/07/18/illusion-of-influence-the-ai-generated-misinformation-apocalypse-that-wasnt/
- Generative AI Deepfakes and Elections: Apocalypse or Dud
  https://perilous.tech/2023/08/28/generative-ai-deepfakes-and-elections-apocalypse-or-dud/
- Deepfakes: a Different Threat Than You Thought
  https://perilous.tech/2020/12/01/deepfakes-a-different-threat-than-you-thought/
We’ve had multiple major elections across the globe, and deepfakes have played no role in the outcome. The 2024 US election has been dubbed the “deepfake election,” and yet it’s been a big nothing burger. Despite all of this, the threat of deepfakes related to elections remains wildly overhyped, ultimately because of perverse reasons. Fear gets clicks.

For some recent evidence proving this point, look at the report OpenAI released just this month on Influence and Cyber operations. None of the operations they tracked had any meaningful engagement. Although it’s true that OpenAI isn’t the only game in town, this report does seem to track with other observations.

Deepfakes, Slop, and Meme Confusion

Let’s start by defining the difference between deepfakes, slop, and memes. People can be forgiven for this confusion since it’s all generated by AI, and the definitions of these terms can be confusing. For example, let’s start with deepfakes.

Deepfake

Webster’s dictionary defines a deepfake as:
```
an image or recording that has been convincingly altered and manipulated to misrepresent someone as doing or saying something that was not actually done or said
```
Well, that’s not very helpful. Wikipedia’s definition is a bit better, but it’s still not good.
```
Deepfakes (a portmanteau of 'deep learning' and 'fake') are images, videos, or audio which are edited or generated using artificial intelligence tools, and which may depict real or non-existent people. They are a type of synthetic media.
```
Although technically accurate, any definition of deepfake can’t be considered without the intention. The mental picture of deepfakes in the public eye is that they are generated to support a larger narrative or trick someone into believing something. So, they either provide the evidence for a narrative or are used in a social engineering attack. Back in 2020, I referred to these as Narrative Evidence attacks. Simply put, people consider deepfakes as evidence used to convince people that something happened that didn’t.

Let’s look at a simple example. I may have a narrative that you stole a car, but I have no evidence to convince people that you stole it. To support my narrative, I create a deepfake video of you stealing the car as the “proof” supporting my narrative. This usage makes the scenario a deepfake and not purely slop or a meme.

AI Slop

AI slop has filled the internet. It’s unavoidable, slathered across every corner of the web, and is one of the lasting legacies of generative AI. If you’ve ever heard of things like Shrimp Jesus, that’s what we are talking about. It’s a sort of low-quality AI-generated content that fills a digital placeholder.

Here is a great comment from Twitter user @nearcyan summing up YouTube shorts with AI Slop.

There is no confusion about the truthfulness of slop. Nobody thinks The Rock is part of any of these videos on YouTube. Its purpose is merely as content to fill a placeholder and try to get clicks. In the context of elections, people may leverage slop with a twist of propaganda, using generated counterfactuals to elicit an emotional response.

Take this image, for instance.

After Hurricane Helene, a bunch of AI-generated images flooded social media. People shared these images to try and get an emotional response. This is more like propaganda. There’s no doubt that some people may have believed the image was real, but a vast majority of people who shared, shared it because it aligned with their biases and supported their message. The image was merely an emotional placeholder. Oddly enough, they would have shared it regardless of the image.

Memes

Memes need no introduction and far predate the generative AI era. However, Generative AI has generated memes of its own. Take, for instance, the videos of Will Smith eating spaghetti. The poor quality of the video made it a meme of its own. So when the AI-generated video of Will Smith eating spaghetti with Donald Trump surfaced, everyone should have gotten the joke.

Just like AI slop, with memes, there is no confusion about the truthfulness of the content.

In the context of elections, deepfakes are AI-generated content meant to fool people into believing something happened, while AI slop and memes are more akin to propaganda.

Spot The Deepfake is Pointless

By now, you’ve undoubtedly run across many of the spot-the-deepfake challenges where you cycle through a series of images or videos and try to determine which is real and which is AI-generated. Other than creating some very basic awareness of the capability of the technology, these exercises are pointless.

Here is a recent example I saw posted online of someone telling people what to look for. It’s the same kind of inane advice parroted repeatedly that won’t hold up.

The reality is that asking which image is real or fake gives an irrelevant answer to the wrong question. These spot-the-deepfake challenges are misleading because it doesn’t matter which one is real or fake. Also, training people to “spot” characteristics in images and videos that are disappearing or changing rapidly doesn’t set them up for future success.

Deepfakes Aren’t In Isolation

Deepfakes and other fake content meant to fool people aren’t encountered in isolation. They are provided as evidence to support a larger narrative. This means you’ll never just have the fake content with which to make your decision.

Consider a doctor making a diagnosis and prescribing a treatment. It would be rare for a doctor to look at an image, make a diagnosis, and prescribe a treatment. The doctor will use additional context in diagnosis and treatment. First, they may order additional tests for verification. Also, they consider other contextual information such as medical history, family history, allergies, and a slew of other information before moving forward.

The real question has little to do with the deepfake itself. What we are evaluating is the message. So, given the source and surrounding context, can the overall message be believed? With misinformation and disinformation, Deepfakes aren’t the message; they are the proof.

Below are questions you can ask to mentally focus on the message. This is not an all-encompassing list of questions, but it can get you thinking.
- Who is sharing?
- Are they credible?
- What’s their motive?
- What have they said in the past?
- Are there conflicting or contradictory accounts?
- Have claims been fact-checked?
The fact of the matter is that getting to reality takes work. It’s work that, unfortunately, many won’t put in. It’s far easier to like and share.

Bad Reporting

The lousy reporting on deepfakes is constant, but let’s look at a couple of recent examples.

Take this article called: What Happened to the Deep Fake Election. Given the title, it would seem that it would be a step in the right direction. However, this person draws all the wrong conclusions.

Then there’s the article Welcome to the AI Election, which is really bad. This article is a whole bunch of fear-based nonsense. This is hilarious since the title claims this is the AI election but claims that the real AI election will be the 2026 mid-term elections.
```
By the time we get to the 2026 midterms, AI will be so much more advanced that in the hands of the right (or wrong) people, it’ll be able to generate hyper-realistic video content, which could be used to create personalized political narratives tailored to each voter’s psychological profile.
```
This is just straight-up nonsense. This person has no idea how any of this works. First of all, the cost of individual generation for each person would be an astronomical expense. You’d have the expense of the data collection and personalization components and the cost to generate video clips for each person. Second, this would require massive collusion, requiring tech companies and social media to play a part in the data collection and dissemination of the content.
```
For instance, they might be used to target us individually based on our biomarkers. Sorry, I forgot to mention AIs will soon have more information about us on a biological level, including our health and behavior. Why? you might ask. Because you’ll give it to them through apps and programs you’ll engage with, or are already engaging with.
```
What planet does this guy live on? Certainly not Earth. In all seriousness, I think I agree with the odd underlying point he’s trying to make. The privacy implications here are indeed very concerning, and we do risk being manipulated by these systems. So, I’m on board with that. However, shoehorning that into election manipulation, especially by 2026, is patently ridiculous. Who is the big bad person pulling those strings with all of the access to that data? In 2026, we’ll still have disparate systems and individual collections of data. Even if this were possible, it would still require a bunch of collusion between providers. This is a conspiracy theory dressed up like a technology prediction.

Just Not That Stupid

It’s difficult to grasp that other people aren’t that stupid. I think this is one factor that keeps the fear of deepfakes stoked. Ultimately, people believe what they want to believe, real or not. It’s been this way since the dawn of civilization.

As I mentioned in a previous article, we don’t see ourselves in other people. We see ourselves as outliers instead of the mean. This often gets warped by algorithms that keep us in a bubble and promote the most outlandish content. As someone who lives in Florida, this hits close to home with all of the nonsense about people thinking Hurricane Milton was not only manufactured but controlled. This resulted in meteorologists getting death threats.

If you tried to write a book on stupidity, this is so stupid that nobody would believe it, but then again, many conspiracy theories turn out this way. Conspiracy theory has morphed into a cult or a religion.

Back in 2021 I wrote, “Conspiracy theorists are like cult members, only worse. Worse, because a cult has a leader, but conspiracy theories make you the leader.” This everyone is a leader concept is incredibly empowering and addictive, making people both the hero and the victim. Despite this, most people don’t have deep conspiracy beliefs, yet they can receive outsized attention in both social media and traditional media alike. Don’t fall for it.

How Are They Being Used

So, how are deepfakes being used in the 2024 US Election? They aren’t, well, to be more precise, not with any relevance. Primarily, you see memes and slop, precisely as I predicted in previous articles. There was a boatload of bad reporting on the topic, but it doesn’t match reality. An obvious example was the Taylor Swift AI endorsement of Trump. These were reported as though they were deepfakes, but the whole incident was silly. If you went and reviewed these images, one is an AI-generated image of Taylor Swift dressed as Uncle Sam. The rest are Swifties for Trump. Comically enough, one of them is literally labeled as “Satire.” Where is the facepalm emoji when you need it?

Campaigns and political action committees (PACs) have used AI to generate content used as counterfactuals for campaign ads. This is something I think is disgusting, but they are hardly deepfakes. They are slop with a propaganda twist.

There was the fake Joe Biden voice call in New Hampshire, which was a legitimate deepfake. However, nobody believed it, and it had no impact. This isn’t unlike every other deepfake this election cycle.

Interestingly, Trump claimed that Kamala Harris’ crowd in Detroit was AI-generated. I also predicted this type of accusation behavior back in 2020 in my initial deepfakes article. Okay, so maybe now I’m taking a small victory lap.

Conclusion

It’s time to take a deep breath. We are less than two weeks away from Election Day in the United States. Undoubtedly, the AI election is here, and it’s more silly (and sometimes pathetic) than terrifying. Despite this, it will not be the last we hear of the fear-mongering. Fear sells, and fear gets clicks. Now, get out and vote.
Be Fearless Podcast Appearance

October 18th, 2024

In August, I sat down on the show floor of DEF CON to discuss a variety of topics with Vivek Ramachandran, the founder of SquareX. Our conversation covered a variety of topics, including AI at the intersection of humanity and technology.

You can listen to the podcast using your favorite platform, or feel free to watch the video below. It was a great conversation that certainly brought back some memories from the early days.

Also, to clarify one of my old school references, we were discussing the old days of wireless hacking, and I brought up a reference to the Proxim Orinoco Gold cards. If you’ve never seen the cards they feature a man with a briefcase far too happy to get WiFi.

I also forgot that they used the larger image of him on the box.