Tag: Generative AI

Conversation with CIO.Inc on AI Safety and Security

August 26th, 2024

I had a great conversation with Aseem Jakhar for CIO.inc and iSMG. We covered topics surrounding AI Safety and Security as well as deepfakes. I explained why I don’t think the misinformation aspect of deepfakes will affect the outcome of elections and provided my opinion on deepfake detectors. We also discuss how we think we need to throw out the rulebook every time a new technology comes along instead of applying lessons learned.

Watch the interview here: https://www.cio.inc/security-experts-prioritize-ai-safety-amid-evolving-risks-a-26115
Illusion of Influence: The AI-Generated Misinformation Apocalypse That Wasn’t

July 18th, 2024

As predicted last year, there is an increased trumpeting of risks focused on deepfakes and AI-generated misinformation surrounding the 2024 US election. Living up to the mantra of never letting a tragedy or potential tragedy go to waste, media organizations have latched on to the doomsday nature of AI-generated misinformation and its potential effects on global politics. I’ve maintained the risks from deepfakes and AI-generated misinformation on elections are overblown, and I’ve seen nothing so far that makes me believe the contrary. In cases of highly polarized topics, the application of AI to misinformation does nothing to change people’s minds.

Over the past couple of years, this has been a relatively lonely position as it seems (or at least seems to me) that I’ve been on an island about to draw a face on a volleyball. Even experts I agree with on other aspects of AI seem to be sounding the AI-generated misinformation alarm the loudest. But it seems the position isn’t so lonely anymore, as more people add their voices to the discourse. However, I’m still struck as to why so many people believe otherwise, but I think I’ve gotten closer to the core of why this remains a powerful illusion.

Why The Powerful Illusion Remains

There is certainly no shortage of perverse incentives around inflating the risks of AI-generated misinformation. For AI companies, this threat demonstrates the power of their technology; for some, it’s the perfect topic to illustrate the need for increased regulation, but for others, it is a potential business opportunity. Many of these situations can be identified easily, but that still leaves many people with this belief. I think I’ve gotten more to the core of why this is such a powerful illusion, and it comes down to something simple: we aren’t seeing ourselves in other people.

People can’t fathom how having a highly convincing image of something won’t fool people. The problem here is we don’t see ourselves in other people. I mean, are we dolts that believe absolutely everything we see? Of course not. But we assume that everyone else is a bunch of idiots who believe everything they see instantly. We see ourselves as outliers instead of the mean. This perspective also ignores the fact that the presence of fake information has been around us the whole time.

We see ourselves as outliers instead of the mean.

When people point to outlandish beliefs like QAnon as proof that this content would fool people, they are also fooling themselves. A fair number of QAnon adherents don’t believe in anything they share. They share it to troll the other side because it irritates people.

Government Perspective

You may be thinking, didn’t the US Justice Department make a big deal about disrupting a Russian AI-powered propaganda campaign? That alone must disprove your argument. Well…

The statement from the Attorney General is pretty strong, but the details matter. You can read the full press release here.

So, let’s read through. Hmm… okay, okay, okay, Spits out coffee WTF? You’re trying to convince people of the impact of these operations, and that’s the best you can come up with? Some sock puppet account with 23 followers?

Okay, well, maybe this account with this few followers shared a viral video that had a major impact. So, how many thousands of views did his video get?

Five? Wait, do you mean five… hundred thousand? No, five. I hate to point this out, but you have to assume at least one or more of those views came from the person monitoring the account. Maybe I’m being too harsh, and this was the setup for the big reveal, so show me the next one.

Seven. Huh. Well, that won’t hit any type of vitality.

Sorry for the low-quality images. Apparently, the government doesn’t train people to take proper screenshots. It conjures images of some analyst pecking the keyboard with their index fingers. However, these examples fail to prove any point supporting the existential threat of AI-generated misinformation.

There is nothing here a human couldn’t do, so if GenAI didn’t exist, Russia would just put asses in seats. That’s what the Internet Research Agency (IRA) did in the past.

AI-Generated Misinformation’s Rough Time

AI-generated misinformation has actually had a rough time for a while. Last year, it was reported that Russia’s Doppelgänger group was struggling to find an audience. Ouch.

We are already in the era of generative AI and deepfakes, and we’ve had multiple high-visibility elections throughout the world. Still, the misinformation aspects of generative AI haven’t affected these elections. As a matter of fact, it unfolded like I said it would with generative AI used for memes, not misinformation. You’d think, with some evidence now, this narrative would let up, but quite the opposite. Many are doubling down.

This disconnect is most obvious with government types. Recently, the director of CISA warned of the risk of US adversaries causing “unimaginable harm to populations across the globe.” This was in reference to adversaries affecting elections. Really? Unimaginable harm, despite the fact that we have evidence to the contrary?

I believe, to some extent, this comes back to incentives. The misinformation topic seems like the perfect example to push for more regulation, and many refuse to take their foot off that gas.

The Reality

Influencing people through AI-generated misinformation is a much harder problem than people want to acknowledge. In a post I wrote last year called Generative AI, Deepfakes, and Elections: Apocalypse or Dud, I introduced something called the Generative Misinformation Cycle to demonstrate the phases and challenges.

With misinformation, your goal is to influence an outcome. This would be to change people’s minds or get them to take action. All of the other phases, such as generating misinformation and amplifying it on social media, only work for a shot at influencing the outcome. Yet it’s these relatively inconsequential activities that so many people focus on. This is where the confusion sets in. To people’s credit, these are the tangible things people can see and measure. It’s much harder to measure a changed mind. However, pointing to these relatively easy activities and saying that their presence indicates that an extremely difficult thing (influencing the outcome) will happen doesn’t match reality.

What AI brings to the table is assistance with the generation of content and some automation activities. That’s it. So sure, you can create a lot more of it and try to have your bots amplify it, but if a misinformation tree falls in the social media woods and only bots hear it, does it really make a sound?

If a misinformation tree falls in the social media woods and only bots hear it, does it really make a sound?

Sowing Confusion

This is about the time when people will mention using generative AI to sow confusion, but sowing confusion is a far cry from sowing influence. It’s not like when people mentally check out due to confusion that their brains somehow revert to an initialized state where they don’t have an opinion.

It’s not like when people mentally check out due to confusion that their brains somehow revert to an initialized state where they don’t have an opinion.

Even when the technique does work, it’s only effective for unfolding current events and on topics where people aren’t emotionally invested, such as an unfolding global pandemic or geopolitical situation far away from home. Sure, this can cause some negative impacts, but if GenAI wasn’t around, bad actors would do this with humans. And, of course, once people’s strongly held beliefs get involved, the trenches are dug too deep to dislodge them.

Poor Research

Poor research also plays a role here. There is so much junk research on AI-generated misinformation. This research often focuses on the wrong questions combined with model capabilities. In the end, you end up answering positively, the wrong question. Take the example below that I shared in February.

Apparently, in his “extensive research,” he missed the fact that concentration camps aren’t typically associated with US political parties. He basically confirmed that LLMs can say mean things. At this point, it should be well known that LLMs can be made to say mean things, and that is a reality we already have today, not in some future state. But this paper does nothing to answer the real question: does the fact that LLMs can say mean things have an impact on human political polarization? After reading this far, you should already know my perspective.

You should also recognize another common theme that this research misses and it’s that instances and capabilities don’t equal impact. I’ve covered this in my previous posts.

Instances don’t equal impact.

Since the Durably Reducing Conspiracy Beliefs Through Dialogs With AI paper is making the rounds again, I’ll point out that I wrote a whole article addressing the issues with that paper.

The Real Risks

There is no shortage of real risks surrounding generative AI. I’ve talked about this at length. I’m far more concerned with the Internet turning into a junkyard or how tech companies are shoving generative AI into every technological crevice imaginable than I am about a theoretical misinformation apocalypse. These activities have far more impact than any usage of generative AI to try and manipulate an election. However, there is also a risk of overly oppressive regulation.

Pretty much everything on the internet is manipulated. You could also say that it is technically misinformation. For example, applying a photo filter adds information to a photo that wasn’t there, and cropping a photo removes information. You could say the same about grammar checkers rephrasing sentences, document summarization, and a whole host of other tools people use on a daily basis. It gets blurry if you are only focused on the information manipulation aspects.

I’ve said all along that regulating underlying technology is a losing proposition. What should be regulated are use cases. This is because AI is a dual-use technology; therefore, the harm surfaces in the use case, not in the technology. It’s a tough problem, and I don’t envy the people trying to address it.

On the other hand, inadvertent misinformation and junk content cluttering the Internet is a real problem. For example, which of the two photos below is the real photo of the Matterhorn?

Surprise, neither of them. Now, if we take into account everyone’s AI-generated blog posts, news articles that don’t get checked, and a whole host of other content that doesn’t rise to the level of existential threat, we have a world cluttered with garbage.

Garbage like Popes in puffer jackets, fake dresses, AI-written nonsense books, and much more. It’s like taking a stroll through a junkyard instead of a pristine forest. Okay, bad analogy. The Internet has always been a sort of junkyard, but now, instead of strolling through rows of junked cars stacked on top of one another, there are junked cars mixed with heaps of household trash strung about, littering the walkway and stinking up the place. We haven’t reckoned with this yet.

Conclusion

This post has remained a semi-written draft since November 2023 because I always feel like I’ve said what I needed to say on the topic. However, I keep getting pulled back in. As a bonus, there are recent updates and more evidence, so my procrastination seems to have paid off.

Unfortunately, the claims of a coming misinformation apocalypse will be with us long after there is more proof to the contrary. Proponents think they’ve found their ultimate talking point to push regulation. Ultimately, this will be the story of the AI-generated misinformation apocalypse that wasn’t.
The Misinformed Landscape of AI Usage

July 1st, 2024
The tidal wave of information on AI use smashes the shoreline daily, nearly all of it universally positive. News stories, analyst reports, and anecdotes all lead you to believe that you are already in the dust, no matter how advanced you are. Your competitors are smoking you, and everyone is using AI for everything successfully except YOU. This is the massive headwind many of us pushing back find ourselves in, constantly bombarded with news stories and analyst reports, all in service of telling us we are mistaken. A congregation was sent to consult the Oracle of Gartner and your perspectives have been found wanting.

In the space we refer to as reality, what we think we know about AI usage is wrong. So, how did we get here? How have we become so misinformed? The answer is pretty simple: humans. Okay, well, more specifically, surveys and interviews.

Surveys and Interviews

It’s long been known that survey data is only slightly more valuable than garbage, but when it comes to AI, survey data can be a fully engulfed dumpster fire. There are several reasons for this, but the primary reason this is so bad in the AI space is that nobody wants to look stupid or appear behind the curve. So when the analyst, survey taker, or journalist calls, people start parroting.

The primary reason this is so bad in the AI space is that nobody wants to look stupid or appear behind the curve.

Instead of responding with observations they’ve made or activities they are actually doing, they respond with something they’ve heard, articles they’ve read, experiments they hope work, and a host of other things that aren’t true activities. This equates to people expressing their vibes. This disconnection leads to an opening chasm with reality. Since surveys and interviews are the primary methods to collect this type of usage data, that doesn’t bode well for determining realities on the ground. With the hype turned up to 11, a red flag would be when your survey results confirm a 10.

I’ve pointed out this parroting vs. observation issue in my presentations at various conferences for the past couple of years. Although this parroting makes for some wildly comical analyst reports and news stories, it’s rough if you’re trying to make decisions based on them, or worse, when your boss expects you to produce a magic wand and summon the guardians of innovation because you are being left in the dust.

A few days ago, I read an article from the Ludic blog making the rounds that contained the following image.

This is an obvious red flag, and the author points this out in much more eloquent and spicy language. We’ve long known that most AI/ML/DL projects don’t make it into production, but all of a sudden, LLMs come along, and 92% of companies are finding great success. It’s not real. Speaking of 92%…

GitHub reported last year that 92% of US-based developers are already using AI coding tools. The gut reaction is this feels wrong, but hey, it must be true if the data confirms it, right? So, let’s do a thought experiment. Imagine standing in the frozen dessert section of the grocery store, asking people if they like ice cream. Now imagine asking everyone buying ice cream if they like it. What if you only asked two people, or five people, or ten people?

When it comes to usage data, what does “using” mean? What is the definition put forth in the survey? What is the makeup of the population? Most importantly, what do they define as “AI”? All of this matters, and it doesn’t take much imagination to realize how incredibly biased survey data can be. The flames are further fanned by the illusion that models have more capabilities than they do and companies faking demos.

For a deeper response to some of the common points people make, read the article I mentioned. I have some quibbles with some of the article’s content, but all in all, it’s a solid read, and the spicy language makes it all the better.

In a previous post on GPT-4 Lowering Conspiracy Beliefs, I addressed some of these issues surrounding surveys and survey data. I called attention to dark data categories that often surface when surveys are used. I also recommended David Hand’s excellent book Dark Data: Why What You Don’t Know Matters. The book will change the way you view surveys.

The unfortunate reality is that quite a few people have a vested interest in perpetuating these misconceptions. You’d think this would be the companies building these products since it increases their revenue, and this is certainly happening, but most of them aren’t affiliated with these companies. They want to be seen as the ones with the knowledge. They are influencers trying to drive people to their funnels and people in the tech industry who don’t want to look clueless. It’s hard for people to call you out on something when you are saying the same thing everyone else is saying.

Another red flag was shortly after ChatGPT was released. We were inundated with articles quoting opinions by leaders and executives who had never used the technology and had no idea how it worked or even what it was capable of. But it seemed as though we couldn’t get enough.

Dumpster fire achieved.

Ask Questions

We aren’t helpless in these cases. One of the best defenses is asking follow-up questions and probing beneath the surface. I know, I know. We pay (INSERT ORG HERE) a lot of money, and they say… But bear with me a moment.

One recent technique I’ve used is marking up reports, slides, and other information sent to me to help people focus on obvious issues and force some deeper thought. This gives others an idea of where I’m coming from and helps plant the seeds of these questions in people’s heads. Typically, these reports create more questions than they answer, and responding with, “This is dumb,” is not the best tactic. Here’s a recent example I used for a report discussing GenAI’s security use in 2024.

Along with this markup, I also included data in the email questioning the statistical makeup of the data used in the analysis. Funny enough, for this particular section, there was no information about the sample size, industry verticals, or other important information about the makeup of the sample. This is always a red flag. Maybe it was mentioned somewhere else, and I missed it, but it wasn’t available in this section like in the others.

Often, even asking a simple question, “How” can be super effective.
```
“Generative AI is completely transforming X business or process.”
“Oh yeah? How?”
```
The questions of how, what, and where can be your ultimate weapons in defense against some of this contradictory data. They inform you if there is something real and help you understand if the use cases proposed to support the strongly worded statements made. There may be good answers to these questions that you may want to consider. There are legitimate use cases, and you do want to stay ahead of the curve, so being better informed helps you take advantage of opportunities.

Misunderstanding the data has negative impacts, putting further strain on your resources to create competing solutions or wasting time trying to recreate something that isn’t even working in the first place. Even if another organization successfully uses generative AI for a task or process, you might be unable to replicate it due to different applications, systems, data, and processes.

Even if another organization successfully uses generative AI for a task or process, you might be unable to replicate it due to different applications, systems, data, and processes.

I’m not bashing analysts or survey takers. Conducting surveys without influencing the outcome is hard. That’s why you can find surveys that confirm just about anything. I’m sure the people writing these reports believe what they write, and it matches the data they have.

Conclusion

The grouping of technologies under the umbrella of AI is certainly useful, yes, even LLMs. Non-generative approaches and more traditional ML and DL have been deployed to solve challenging problems for decades. These approaches are already baked into the systems we use. However, the hype and hysteria throw off any real perception, and you often find that complete transformation aligns more with hopes than realities. Ask the right questions and probe deeper to ensure you are making decisions on the right insights. Find use cases of your own and perform your own experiments. You’ll quickly see what’s working and what’s not.
Die Robot, Die! Backlash Against AI-Powered Tech Is Deeper Than People Think

February 16th, 2024

Something that may have gone unnoticed in recent months is that there has been a brewing backlash against physical AI-powered devices. The most recent example of this was the crowd attacking the Waymo vehicle in San Francisco, but this is far from the only example.

I started noticing this trend with AI-powered food delivery vehicles. This was a bit odd since they seemed to be pretty good at failing themselves, often found tipped over, going the wrong direction, and even, in one case, committing a hit-and-run.

Although many would write this cheering crowd off, envisioning Enoch in the hands of the destroyers, this would be a mistake. Those of us working in tech can roll our eyes and dismiss outlandish reporting and other nonsense but think about the countless people who have a steady stream of this clogging their newsfeeds. There is an underlying mood to these activities that cuts much deeper than tech hate or the realities of technology innovations from the past. In this post, we explore what’s bubbling beneath the surface.

Quick Update

I haven’t had much time to write content lately. This isn’t for lack of material since there’s been a firehose of things to write about. I’ve been working on something much, much longer than blog posts, so that’s consumed quite a bit of my time. Also, you can check out my post on something I call the AI Solutions Risk Gap on the Modern CISO blog. Here, I break down what really matters with AI Risk and give leaders some topics for consideration.

Why The Backlash?

I believe what we see here is the beginning of something bubbling to the surface. This is the inevitable outcome when hype meets uncertainty. AI hype is putting all of humanity on notice, and humanity notices.

AI hype is putting all of humanity on notice, and humanity notices.

So, you may wonder why attack self-driving cars or delivery robots. It’s because these are the physical manifestations of AI in the real world. After all, it’s kind of hard to punch ChatGPT in the face. These devices represent symbols of a future that doesn’t need humans at all, erasing humanity from the equation. It’s a mistake to attribute this to Luddism or tech hate.

First of all, anyone invoking the Luddites should read Brian Merchant’s book Blood in the Machine. Second, the Luddites’ concerns about tech affected their industry and the social and political environment surrounding it. It’s an entirely different scenario when discussing a large swath of humanity in multiple industry verticals and positions. Unfortunately, this uncertainty is primarily driven by exaggerated news reports and speculation.

Examples like this gem from Yahoo Finance that shows few cuts attributed to AI but speculates that people are lying despite direct quotes to the contrary. There are also framing issues and more to critique in the article, but most people won’t notice any of this. All they’ll see is, once again, AI is coming for your job.

People have a sense that the technology isn’t as good as it claims to be and yet continually see reporting to the contrary. They also see the launch of what I refer to as shitty AI gadgets, like the Humane Pin and the Rabbit, two devices that apparently investors and the media love, but the rest of the world, not so much.

AI is now being shoved down our throats in absolutely everything, whether we want it or not. Even Mozilla is scaling back to focus on adding AI to beloved Firefox, despite the fact that absolutely no Firefox user actually wants it. Microsoft is cramming it into every corner of the Windows operating system. Deep AI integration is bad for both security and privacy and despite this being known, the push continues. Nothing is sacred anymore.

To pile on, people are being told their jobs are in danger, which they are, but not from super capable AI, but from overzealous business leaders who hope that the tech catches up faster than they’ll have to backfill positions or rehire the people they let go. This is despite underwhelming performance when they do demo products or launch experiments.

There is no doubt that, in general, AI technologies will continue to make progress, solve problems, and become more capable. We will even get to AGI. But in the short term, we are being sold a bill of goods before these companies even get the technology working, much less working effectively. It’s like thinking you are buying a Ferrari, but when you take delivery, it’s a wooden go-cart with wet paint and the word Ferrari on the back.

It’s like thinking you are buying a Ferrari, but when you take delivery, it’s a wooden go-cart with wet paint and the word Ferrari on the back.

You even have people like Sam Altman telling people ChatGPT will evolve in uncomfortable ways, wanting to push this technology further into your personal life with far more access to your data. No wonder people protested outside the OpenAI’s office. Give us more of your data so we can replace—I mean, help you. The reality may not be so cut and dry, but that’s what’s in people’s heads, and they don’t like it.

Tech companies hope to employ their standard brute force playbook and steamroll through the problems, but I think it’s far more challenging this time. The AI field, in general, will bring us a lot of advancements. LLMs are undoubtedly useful for some tasks but remain overhyped and won’t get us to AGI. LLMs are the Diet Coke of AGI. Just one calorie is not nearly enough.

Human Manipulation May Win The Day

If all this wasn’t depressing enough, if there’s one thing we know for certain, it’s that humans are easily manipulated. We can reliably reproduce these results. Companies will start to employ more manipulation techniques to avoid larger issues and ease adoption.

These can be subtle and often go unnoticed. For example, have you noticed how the responses are displayed while using ChatGPT? They are completed across the screen as though someone else is typing back to you. This makes it feel more human.

I remember reading an article years ago about home assistant robots that were in development and how people didn’t like them. Then, the developers projected simple facial expressions on the robot’s face, and people warmed up to them. They were the same product that now had a simple face, with no further capabilities added.

To take this further, look at the image I used as the featured image for this post. It may make you feel sorry for the robot despite being imaginary and completely manufactured. The robot never existed, the human never existed, and the scenario didn’t exist. Yet, we still can’t help feeling sorry for the poor robot despite the fact it may have been a homicidal, mass-murdering robot whose sole purpose was to kill as many people as possible.

So, if we apply subtle manipulation to the current situation, imagine the delivery robot having a statement printed on it that says, “If you see me in trouble, please help me.” This is a statement from a piece of technology asking you, the human, for help. Since most people help when asked, people may be likely to stand up a tipped-over device and less likely to kick or destroy a device requesting help.

Or a wilder scenario, projecting a frowny face on the windows when a car is attacked and a voice that says, “Stop, you’re hurting me.” These techniques may reduce the number of incidents by manipulating the humans coming in contact with the technology through techno-social engineering.

Our world is already filled with priming, subtle manipulations, and nudges

Our world is already filled with priming, subtle manipulations, and nudges. Companies building this technology won’t find ways to make the situation more equitable for humans; honestly, that’s not their job. However, they will find ways to manipulate us into believing it’s in our best interest, ease adoption, and minimize backlash. Anthropomorphism and other human manipulation techniques will be employed to serve the company’s goals. On the other hand, this is something we should all be concerned about.

One example of manipulation is this article. Notice the mental tricks Sam Altman employs. By claiming AI is dangerous in this way, he’s creating a humble brag about its incredible capability. Claiming to want regulation makes him appear reasonable and concerned. He gets to play the hero and the victim at the same time. It’s a lot less genuine when you realize this is a push toward regulatory capture. I’m sure there’s an Onion article in here, somewhere like, Man Creating AI Says It’s Dangerous And Wishes There Was A Way to Stop Himself.

Business Leaders

Business leaders play a critical role here and need to be more critical of claimed advances in the AI space. When putting pressure on internal developers, they need to understand that the biggest companies in the world are struggling with operationalizing generative AI. So it’s reasonable to assume you’ll have challenges as well.

Business leaders also need to be far more critical of vendor AI claims. Keep in mind that demos are staged and offer known variables for vendors to present during sales meetings. These situations don’t match your organization and the unique data and challenges you’ll encounter. When evaluating a demo, ensure that it’s evaluated on your data with problems that you encounter. Also, ask the vendor about challenges you’ll have, as well as things that their tooling doesn’t do well. If you don’t get good answers, run as fast as you can in the opposite direction.

Common things I hear are, “Why would make up stuff about their products?” This is typically when I spit my drink out. Dig in and verify claims. Just because a product may work in one environment doesn’t mean it will work in yours.

Conclusion

Although we all love to rage against the machine, the problem is that we are all a part of it. In the near future, we’ll start to see more applications of techno-social engineering. We also need to be far more critical of the news stories we consume. There’s a deluge of junk research and sensational news stories out there. Staying level-headed and asking the right questions can help keep you grounded on where the realities are.

Update

I wrote this article before seeing Brian Merchant’s article. You can read that here: https://www.bloodinthemachine.com/p/torching-the-google-car-why-the-growing He digs a bit deeper into the self-driving vehicle aspect, so we have a similar theme but a different focus. It’s well worth the read. Also, I learned from that article that people were destroying e-scooters as well.
The Absurdity of the 1000x Productivity Increase

January 11th, 2024

I wanted to take a moment to address an obvious issue. In my last post, I discussed making sense of AI predictions and gave a framework to help, but how do you make sense of the absurd? After writing that post, I’ve seen imaginings of not a 10x productivity boost from AI but 100x and even 1000x. Not with some futuristic, cutting-edge technology yet to be developed, but with what we have today. Now is a good time to remind these people that the “x” after the number means times, although it seems to have morphed into a generic representation of “better.”

There is truth to their statements when they claim humanity isn’t prepared for 100x or 1000x productivity increases. This is true. We also aren’t prepared for a stampede of unicorns frying us with their laser beam eyes. A 100x or 1000x productivity increase is such a large number that it makes us mentally check out, insert the word “better,” and open the door for pointless pontification. Not to mention, this level of performance increase would be pointless in almost every context.

The Reality

Let’s think about this realistically, with some numbers. A widget factory produces a 1000 widgets per day. It then uses some magical AI dust to gain a 100x output boost, that’s 100,000 widgets a day. Now, at 1000x, that’s 1,000,000 widgets per day. In one year at full capacity, they’ll have 365,000,000 widgets vs. the previous output of 365,000. I hope they have a whole lot of storage.

Now, think of the developer who previously wrote 500 lines of code per day and now outputs half a million lines of code daily. Or, take the case of a prolific author who writes a book per year. Now, they are writing a thousand books per year. What should be clear by now is not only the absurdity of all this but also the larger problem: where are your 1000x new customers?

Output on this scale is meaningless without a massive increase in consumption. A 1000x productivity boost is pointless and possibly burdensome without a 1000x increase in consumption, aka customers and demand. Even with a scale-back in production, it’s still burdensome at this rate. I mean, unless you only plan on firing up the factory for a single day a year.

These things remind me of the Philip K. Dick short story Autofac, where automated factories keep replicating themselves and producing goods that nobody wants. Maybe the milk really is pizzled.

Y, Tho?

So, why are people claiming that 100x or 1000x productivity boosts could be on the horizon? Well, I can only assume because those numbers are bigger than 10x 🤷‍♂️ Even though 10x is already a massive productivity increase, bigger claims, bigger hype. People aren’t going to read your blog posts anymore if you are still 10x’ing. In the hype game, it’s go big or go home.

There is something about spelling out claims like this that makes the obvious flaws shine through. Putting numbers to this demonstrates the absurdity of the claims and should highlight a significant flaw in the logic. Of course, maybe there’s another likely scenario. People using ChatGPT to write their blog posts aren’t confronted with reality since they aren’t actually reasoning through the content they are churning out. They are too busy 10x’ing to realize flaws in their statements. Oh well, that’s a blog post for a different day.

People using ChatGPT to write their blog posts aren’t confronted with reality since they aren’t actually reasoning through the content they are churning out.

Real World Performance Boosts are Happening

There are certainly situations in which organizations can and even are gaining productivity increases using todays AI technology. These may even be doubling, tripling, or, in some rare cases, approaching 10x productivity increases, but these are highly specific, situational, and typically related to tasks and not the entire chain. Many of these areas are creative in nature: copywriting, stock photos, VoiceOver, and even video game design. All of these areas have seen massive productivity boosts from generative AI. Of course, we won’t get into a conversation on quality, but good enough is fine for many of these tasks.

Conclusion

There is no doubt that AI has the potential to transform our world in a wide variety of ways. This is both with the technology we have today as well as the technology we’ve yet to invent. There will be plenty of surprises, advancements, and things we didn’t see coming. However, we have to stop giving oxygen to people making outlandish and absurd claims. Remember, they aren’t making these claims for your benefit.
The Realities of GPTs and the GPT Store

November 15th, 2023
OpenAI’s recent announcement was made during their DevDay, and it was hard to avoid. At this point, I don’t think OpenAI needs a marketing department. One of these announcements was of GPTs and the GPT Store. On queue, the amateur futurists swarmed social media with bold claims and predictions, stating that this was an App Store moment just like we had for the iPhone. So, is this an App Store moment? Are the stars aligning? Are we entering a new era? Let’s take a look.

Quick Note

So, before we dig into this, I like the concept of GPTs and even the GPT Store, which may not be apparent from the content in this post. That’s because this is a post about innovation and impact. The point isn’t whether paying customers of ChatGPT will use GPTs; it’s whether GPTs will create new paying customers of ChatGPT as well as create an inevitable market that companies will need to consider as part of their strategy. This is what it would take to make an “App Store Moment” and is the primary perspective of this post. However, I will highlight a few additional issues as we go along.

My Initial Take

This post expands on my initial comment (or hot take) here where I made some claims and predictions of my own.
So, to summarize from my previous comment:
- They are creating additional attack surface
- They are inheriting the issues of an AppStore
- Influencers, not innovators, will drive use cases
- Most use cases will be inconsequential
- Malicious use cases will propagate
- Most interesting use cases will continue to be deployed outside the GPT Store
What Are GPTs?

GPTs are a custom version of ChatGPT that you can create for a specific purpose. Some examples they give are learning board game rules or teaching your kids math. You can create these with natural language without having to do any coding. The GPT Store will allow people to share and sell these GPTs to others.

In a nutshell, it’s a fancier way of selling prompts to others with additional features, such as adding data and connecting to the Internet.

GPT Store Use and Trajectory

Influencers will drive use cases, not innovators.

The GPT Store hasn’t launched yet, but it’s clear that influencers and AI hustle bros will drive the use cases, not innovators. Influencers will rush to fill the platform with chatbots where people can ask them questions based on previous content they’ve published. Being influencers, there’s absolutely no way they’d ever try to oversell the impact of these. (Feel the virtual eye roll.) There’ll also be a healthy dose of memes because you have to keep the world spicy 🌶️

There will also be a swarm of use cases where the only goal is to be first and a majority of use cases will be largely redundant or uninteresting (in the context of innovation), providing GPTs that basically do what anyone can do with ChatGPT themselves, only repackaged and marketed as something more capable. Newsreaders, page summarizers, document summarizers, and many similar GPTs will crop up. Mostly, these will be thought of as “throw-away” use cases.
```
Note: I’m not saying that these use cases are useless. Some may find them helpful, but once again, we are discussing these in the context of innovation and creating a culture of paying customers.
```
It’s likely we will see a host of celebrity and historical figure chatbots because they are easy to create. Maybe some celebrities will release branded chatbots themselves, primarily ones that don’t recognize the reputational risk. However, still, I wonder how many “Saylor Twift” type chatbots will crop up. These bots are allowed. You only need to mark them as “Simulated” or “Parody” according to OpenAI’s policies. That’s if their creators even bother.

Even with historical figures, there’s a huge problem with distilling them down into a subsection of their writing or public appearances and pretending that they’re somehow interacting with them or getting to the heart of what they actually thought about something, but this is a philosophical topic for another blog post.

We’ll see a familiar trajectory where you have a usage spike followed by a drop-off after people have checked it out.

99 Problems and an App Store is One

By providing the GPT Store, OpenAI inherits all of the issues associated with running an App Store. These issues should include providing proactive protection to protect users from malicious GPTs. In addition, another layer should be part of this in protecting the content of creators primarily from others using their work in an unauthorized way. This protection needs to be advanced and proactive to provide even a basic level of protection. Given the initial launch and announcement, there doesn’t appear to be anything like this.

OpenAI has its acceptable use policy and will most likely count on the community for reporting. In addition, they may do some basic scanning, using a prompt to an LLM in much the same way as they did for plugins, but this is not even scratching the surface and is only a minuscule touch better than doing nothing. This won’t be maintainable if the GPT Store grows at all, and with the ease of building and deploying GPTs, this will spin out of control quickly.

Content Theft

People will undoubtedly create GPTs with other people’s content and work. This will drive less traffic to the original creator’s funnels. This is stealing other people’s work in a more direct way was done for art.

Disturbingly, some see no problem with taking a book like Outlive and creating a chatbot out of it. Even more, find no issue with taking Dr. Attia’s public content and making a chatbot out of that. There seems to be this impression that it’s fair game since he put the content online. There is something rotten to the core with this mindset, especially in cases where you are monetizing someone else’s work.

To make matters worse, GPTs and the GPT Store make it much easier to build and deploy systems that use other’s content with less friction than a more standalone solution, which is why you’ll see more content theft with GPTs vs other methods.

GPTs and the GPT Store make it much easier to build and deploy systems that use other’s content

Don’t hold your breath for a solution here. OpenAI has a mindset that they are providing the tools, and if people misuse them, that’s on them, but there is a huge gaping hole in this logic regarding content. How would anyone go about this themselves? It’s difficult to identify in all but the most egregious cases, so yes, calling your GPT the Dr. Attia Bot or the Outlive Bot would certainly raise some eyebrows, but the real harm is behind the scenes. The Live Longer Bot, completely made up of Dr. Attia’s work, would be difficult or near impossible to detect from the average content owner’s perspective.

The responsibility for detecting this type of misuse can’t be thrust onto content owners. Creators can’t police the GPT store for all of the instances of usage of their content. Only OpenAI could do something like this and accomplish it in a way with breadth to have a chance of success. The fact that OpenAI isn’t even considering a real solution to this problem should tell you all you need to know.

There is a caveat here, and that is, this is a hard problem, so I don’t mean to make it sound easy. It’s not like all you have to do is make a list and check against it as people deploy GPTs. There needs to be a thoughtful approach that considers the capabilities and tradeoffs and gives people concerned about their content some methods to check and recourses to take. But doing nothing isn’t an option either.

After all, it’s OpenAI deciding to launch a platform that allows for easy theft, deployment, and monetization of other people’s content. It should also be their responsibility to ensure they are at least taking some real steps to protect content owners and give them a process for checking if this is the case in a meaningful and effective way.

Time will tell, but there doesn’t seem to be an indication that this will happen, and it may only happen after a series of lawsuits.

How creators may change their behavior based on content theft is an interesting thought experiment. How are you supposed to promote your work if, through promotion, your work is stolen and used? It’s a conundrum, and we shouldn’t learn the wrong lessons.

Malicious GPTs

There will undoubtedly be malicious use cases. These will try and steal information and data from the user. They may even try to trick the user into installing malware. To stop this, there would need to be more robust checks in place and a process to catch these malicious GPTs before they are deployed to the GPT Store.

The popularity of this as a vector for attackers will be the popularity of the GPT Store. So, malicious GPTs will scale with this popularity and draw more attention from attackers as the attention grows.

Surprises

I do agree with OpenAI’s comment that interesting (not necessarily the most interesting) use cases will come from the community. It’s possible that creating this GPT Store opens an avenue for someone to create a meaningful app that wouldn’t have been possible otherwise. There will undoubtedly be some of these use cases, and they will be pretty cool. We should expect some surprises like this. The ultimate question, though, is, will there be enough of these use cases where it’s interesting enough for people to continue paying not only for ChatGPT Plus but also any additional fees for the GPT? It’s possible, but I wouldn’t bet on it.

Most Interesting Use Cases Remain Outside The GPT Store

The most interesting use cases of the technology will remain outside of the GPT Store and its ecosystem. This is for some reasons that are fairly obvious upon reflection. This mostly comes down to access and control. Organizations want to exercise greater control over their intellectual property and data. Conversely, open-source models are highly effective, and an organization could easily construct a more self-contained solution where none of the data has to leave its control.

It’s not just control. It’s also about the technical feasibility involved with GPTs architecture. If you have a fancy prompt, need a bit of data from the Internet, or want to chat over a document, then GPTs are fine. If you are trying to integrate LLMs into an actual solution, then the capabilities aren’t there.

Also, storage cost is steep for GPTs. By some estimates, as much as 260x more than something like S3.

Companies would also need to actively look at the GPT Store as a valid delivery source for their customers. This would only happen if this were a large, untapped market. So, only if the GPT Store is a smashing success will this force companies to consider creating GPTs on the GPT Store.

And Security… Always The Afterthought

I spend countless hours discussing LLM security, so I won’t continue beating that horse here. Let’s just say all of the current security issues still apply to GPTs, with a bit more consideration for your use case, and security will undoubtedly be a driving factor for any business use case. Just like trying to protect your system prompt, anything you put in a GPT can also be exposed.

https://twitter.com/petergyang/status/1722846616896651336?s=61&t=qLEWkcAjomspZNvEroO87w

This vector means there are confidentiality and intellectual property risks with GPTs. And if you think, oh, that’s an easy fix. It’s not, and when this one is patched, another one will be found. Consider anything you put in a GPT as being public. If you have any IP or sensitive data, it must stay out of GPTs, and you’d be better served deploying independently.

If you have any IP or sensitive data, it must stay out of GPTs

The one thing you can count on is that things will be attacked and data will be lost. These are new technologies, and we are still poking around at them. I’ve said many times these systems represent a single interface with an unlimited number of undocumented protocols, which is bad for security.

These systems represent a single interface with an unlimited number of undocumented protocols, which is bad for security

Innovation Ripeness

Major disruptions caused by innovation, such as the App Store on the iPhone, aren’t just about the tech itself or its capabilities. It’s about how ripe the area was for innovation in the first place. This ripeness combines factors such as capabilities, social trends, and timing.

For those who don’t remember, phones were things people used to talk into… not to Siri but to another human being. You’d speak into the phone’s microphone, and magically, on the other end, someone would hear your voice and want to talk. For mobile phones, you’d have a certain number of minutes you could talk on your phone plan, and text messages were extra. That is, if you ever wanted to text at all on the phone’s number pad or if you were (un)lucky T9. People even had separate devices for listening to music. How ancient!

Then, the prices came down, and more and more people started carrying mobile phones while simultaneously getting data connectivity, keyboards, and storage. People started texting more than speaking, and the transformation of the phone into both a communication and entertainment platform began.

It was in the midst of this transformation of the phone into a more central part of our lives that the App Store arrived. People wanted more and more access while being mobile on a device that was more central to their daily lives. So, the capabilities of the platform, social factors, and timing all came together. The App Store drove companies to create apps based on this demand and tap new customers on the platform.

So, will the GPT Store be the new App Store? Given these factors, it’s highly unlikely. ChatGPT isn’t a central part of most people’s lives today, and there isn’t enough evidence to think that it will be in the future. OpenAI is trying everything it can to keep users paying for ChatGPT Plus with moves such as adding Dall-E 3 to ChatGPT Plus users. I’m not sure moves like this will be enough of an incentive to keep people paying, especially when there are other options and the space is so new.

Conclusion

GPTs and the GPT Store are a neat concept and a nice addition to ChatGPT. However, it is not well thought out regarding security and content protection. This will continue to be a constant tradeoff in the years ahead. This platform makes it much easier to steal other people’s work and monetize it as your own, and I hope that OpenAI takes some steps to help content owners detect and mitigate some of these risks.

Will it become as influential as the App Store? Highly unlikely. As always, play with this stuff yourself. See the features and capabilities for yourself.
Optimizing Away Human Interactions With AI

September 28th, 2023

Again and again, we never learn seem to learn lessons. Approaching everything in the world as an optimization problem isn’t the best approach and can make things worse. Sure, some out there looked at The Matrix and relished the thought of living their lives in a simulation while submerging in a viscous liquid with tubes attached to them. Fortunately, that’s not an option, well… yet anyway. That leaves us in the real world trying our best to turn it into a simulation, and optimizing away our human interactions is one of the best ways to do that.

Relationships are work, and work is friction. Therefore, reducing relationships reduces friction. Boom, Optimized! It seems silly when phrased this way, but this is the approach we are using to address countless human interactions with tech, and we may not even realize it. When consumed by how cool a particular technology is, we tend to take the Maslow’s Hammer approach, and everything, including human interactions, becomes a nail.

Outsourcing Simulated Emotional Connections

Back in March, I wrote about this issue in a post called Outsourcing Simulated Emotional Connections to Bots. I wanted to revisit this topic now that some time has passed and we’ve made even more progress, and predictably, things have gotten worse.

Take this little gem, for instance.

Far too many people don’t see an issue with this and may want to replicate it, but even a cursory look at the article and its subject has a noticeable cringe factor. Sure, a problem is defined in that post, and that problem is YOU. It’s not a technical problem. You are the one who isn’t making time for your mom. You are the one going about your days for long periods, not even thinking about your mom. This isn’t a tech problem; it’s a YOU problem. It should make you feel bad, and that feeling is an indicator that you need to make a change. It’s your brain’s way of keeping you in check.

But even employing the tech doesn’t solve the problem because… you still didn’t think about your mom. She didn’t need to occupy any space in your brain. You’ve optimized. But why stop here? Why not clone your voice and, at regular intervals, have someone call your mom using your voice and have a conversation with her so you don’t have to? What a utopia. Then you’d never be inconvenienced by your mom. Technologically speaking, we aren’t far from having something like this be completely automated, so you wouldn’t even need to hire someone to use your voice. You could forget about your mom entirely.

On top of this, it’s incredibly deceptive. You are using technology to fool your loved one into believing they are on your mind. There’s an ethical problem with employing tech as a deception when dealing with humans, especially when those humans are your loved ones. Think about your mom’s reaction if she knew you were doing this.

Approaching this as an optimization problem means when your mom passes away, things get better.

You only have a limited amount of time with your mother, and before you know it, she’ll be gone. Approaching this situation as an optimization problem means things get better when your mom passes away, but we know this isn’t true.

Introducing ThereBot!

Warning: Future Advertisement Below

Having kids is a hassle. You spend so much time going from event to event, sporting events, band recitals, plays, this list goes on and on. What if there was a way to do what you wanted without having to be bogged down by pesky activities and your child’s emotional well-being? Well, now you can!

ThereBot
Introducing ThereBot. ThereBot is an exciting new way for you to be there without having to be there! ThereBot uses an adaptive architecture to respond properly to your child’s activities. It’s quiet during recitals and cheers your child on during sporting events. If you decide to watch the event after the fact wink wink ThereBot has your back. Our cutting-edge algorithms cut out all the boring stuff, so you only get the highlights—hours of wasted time condensed into a few minutes. ThereBot pays for itself!

ThereBot+

But why stop there? ThereBot+ comes with an impressive array of upgrades, including a screen showing an image of you as though you are watching the game and the ability to clone and use your voice. This means you can shout, “Daddy loves you,” at any time like you were actually there. Here’s how to order!

Shame Isn’t An Effective Long-Term Control

In the short term, the thought of sending a robot instead of going yourself isn’t something many would do, not because they don’t want to, but because not only can your children observe your non-attendance, but others can also. So, the big catch in the short term is shame. We all know shame isn’t a long-term control. It starts by saying, “I’ll use it when I’m traveling and can’t attend,” or “I’m just too busy right now.” Plus, people can be shameless; the more shameless people there are around, the more that activity becomes normalized and contagious.

Dehumanizing Through Optimization

We are often distracted by how cool a particular new technology is and look to apply it to every use case we can. This is a sort of Shiny Object Syndrome applied to technology. We are more focused on what it does than what it does to us. This Maslow’s Hammer approach leads us to solutions in search of problems without understanding underlying issues. This gets far worse in social contexts.

The rise in self-centeredness and even narcissism is growing. Our modern, social media-driven world forces us into a cycle of constant self-promotion. I believe this pre-dates social media, though, and began with my generation raising children in the age of the self-esteem movement. A movement that many still exercise even though it’s been proven to be detrimental. For an entire exploration of this topic, I highly recommend Will Stor’s book Selfie: How We Became So Self-Obsessed and What It’s Doing to Us.

We already dehumanize others, treating them more like processes, checklists, or apps than other humans. This was something I mentioned in my previous post. We do this with everyone: shift workers, customer service representatives, Uber drivers, and even coworkers. Everyone seems to be an obstacle in getting what WE want. I’m certainly guilty of this myself, not considering the human on the other end of the phone or the person behind the counter when I’m having an issue.

We turn to technology in these cases to provide the optimization we need to reduce the friction of dealing with others. These others aren’t constrained to strangers and acquaintances. They are also friends and family.

These trends lead to a bunch of questions. Are humans evolving to be more self-centered? Will we stop caring about others in the future? Will we stop loving? I mean, what causes more friction than love? After all, love can make you feel worse than you’ve ever felt in your entire life. Will we stop even taking chances on love? Some people certainly have already. I don’t think this is a healthy trajectory.

Also, why even have friends? It seems like such a massive waste of time. You have to do things you don’t want to and potentially deal with problems other than your own. You’ve got your own problems to deal with. It’s one thing to think this, but saying it out loud is something else entirely. We are often confronted with our ridiculousness by saying things out loud. It’s something we should do far more often as a gut check.

There is more and more evidence that younger generations are forgoing friendship. One survey reported that 22% of Millennials say they have no friends at all. This isn’t constrained to Millennials. The numbers are down across multiple age groups, with people having fewer close friends with Gen Z even trying to spend money to make friends and, of course, turning to technology to solve their friendship woes. Social Media has certainly accelerated this by making things superficial and fake. And, of course, the global pandemic right in the middle of all of this pushing the accelerator to the floor.

Humans evolving into machines instead of machines into humans is something that doesn’t get enough attention.

Friction is Currency

Not all friction is bad. In some cases, the friction is the point of the task. But regarding human interactions, here’s a thought: friction is the currency that pays for fulfillment. Looking at a potential friendship and asking, “What’s in it for me?” is the wrong question with a wrong answer. Unfortunately, far too many people have this perspective. Even if you had incredibly selfish motives, you may not know what’s in a friendship until it bears fruit, which may not be evident until later.

Friction is the currency that pays for fulfillment.

Friendships are valuable simply by being. It’s hard to describe, kind of like love. It’s like the old trick question someone asks, “What do you love about me?” It’s not so easy to summarize. You just kind of know it, and you are better off for having it.

Coworkers

The workplace is where people justify classifying their coworkers as tasks or obstacles. This certainly isn’t new, but it’s an area that people love to talk about optimizing with tech. Even some chatbot demos speak about how great it would be if you didn’t have to be bothered by your inbox at work, but even your coworkers shouldn’t be treated like apps just because they may not be your friends. Relationship building at work is essential for many reasons, but in an age of diminishing jobs, relationship building may be the best way to save yourself when the cutbacks happen.

Collaboration itself appears inefficient because it’s just easier to do something yourself. But once again, friction is currency. Anyone who’s ever written music or been in a band knows how frustrating it can be to collaborate with other strong personalities. However, when you realize that the different perspectives elevate a song to a level it wouldn’t have achieved on its own, the insight is incredibly enlightening and makes you appreciate other’s input. This is the same at the workplace.

In relationships, like so many other activities, the friction is the point.

The Coming Chatbot Hangover

We haven’t yet hit the hangover stage. We are still at the bar, slurring our speech while we make the most insightful point in the history of human civilization, but it’s coming. I wrote about this in the Social Impacts section of my Post-Black Hat USA and DEF CON AI Thoughts post. We are about to enter an era of historical figures, celebrities, and persona-based chatbots, all to increase engagement on particular platforms. These systems will boast massive numbers after launch as people check it out, followed by a very steep drop-off as the novelty wears off and the superficial and fake nature of the interaction sets in.

At least when we play a video game, we realize that NPCs aren’t human. What we are doing is trying to say that the bot is a representation of a specific human, which it is not. Subconsciously, we know this, and after the initial euphoria wears off, reality sets in, and the whole concept seems cheap and manipulative. Remember, this is far different than an algorithm working behind the scenes. Bots are directly in front of people and interacting with them.

Conclusion

Removing the smoke detectors in your house is a great way not to hear the smoke detector go off every time you cook, but obviously, this isn’t solving the real problem.

We don’t realize we may be causing other effects and problems when we focus only on the technology and its cool factor. We may be fooled into thinking that friction is the problem when it may be the point or an indicator. Removing the smoke detectors in your house is a great way not to hear the smoke detector go off every time you cook, but obviously, this isn’t solving the real problem. Friction and discomfort in human interactions can be like a smoke detector, a leading indicator that something else needs to be addressed. So, call your mom today. I know I will.
Generative AI, Deepfakes, and Elections: Apocalypse or Dud?

August 28th, 2023

We are about to be inundated with stories of misinformation and deepfakes, all focused on the 2024 US election. I know the last thing most people in the United States want to consider is the 2024 election. Election cycles are tiring, but even before we get into full swing, there are already grumblings about AI. I mean, why wouldn’t there be? It’s been all AI all the time. Generative AI is here, in case that’s something you’ve somehow failed to notice. Methods for generating text and images keep getting better and better, and they are far more accessible than they’ve ever been.

I’ve pulled no punches that I think the capabilities of LLMs are overhyped, but they excel in the areas useful for generating misinformation. I’ve even said that this would be the year that generative AI starts replacing jobs, something that appears to be already happening. So, with a looming election, highly capable systems, and low cost of generation, what effect will generative AI have on the 2024 US Election?

So here’s my claim: Misinformation and Deepfakes won’t affect the outcome of the 2024 US election. More accurately, it will have a “statistically insignificant” effect on the 2024 US election.

Note: For this post, I’m using the term misinformation to cover instances of misinformation and disinformation.

Generative AI and Wide Availability

Due to the recent boom of generative AI, the 2024 US election will be the first major US election where these tools are widely accessible. This accessibility extends to everyone involved, including campaigns, nation-states, malicious actors, and even the general public.

To take accessibility a step further, this can be done very cheaply. People don’t have to use the models hosted by providers like OpenAI, Stability AI, Midjourney, etc. Models for generating text, images, and audio can be run on consumer machines or at least machines that aren’t much bigger than consumer machines. These models are also available without the typical guardrails. With all of this availability and ease of access, that begs the question, won’t this lead to a misinformation apocalypse?

2024 Misinformation Apocalypse? Not So Fast

Misinformation in the context of generative AI means the purposeful manufacturing of false information in photo, video, text, or audio formats with a particular goal. This content is then used to serve a message around events and activities that either didn’t happen or reframe events that happened differently. I refer to this as “narrative evidence,” I wrote about this back in 2020. You are manufacturing false content as evidence to support a larger narrative. This narrative is meant to support a position or demonize someone else but with a goal in the case of an election. Fortunately for us, this condition only remains highly effective when the novelty factor is high, and this novelty factor is dropping quickly.

In the context of an election, misinformation is meant to sway opinion and affect voters. For example, this example of ludicrous claims that high-profile figures in the Democratic Party are actually on house arrest, with the associated and laughable proof. No AI is necessary in this case. Spreading content like this is meant to convince people that voting for people in the Democratic Party is a bad idea and they should vote the other way (or stay home), but it doesn’t work that way in practice.

Misinformation at scale has both logistical and social challenges, so let’s look at the Generative Misinformation Cycle.

Generative Misinformation Cycle

Let’s break down the generative misinformation cycle into a few different steps. Breaking this down into several steps helps to highlight what’s easy and what really matters.

Generation – This step is the creation of the content. This step is easy and mostly friction-free, even without generative AI. What Generative AI brings to the table is an increase in velocity, not precision. So you can generate misinformation much faster and create more volume, but there’s no guarantee that misinformation will be better, and quite often, it can be worse than human-generated misinformation. For example, try getting an LLM to explain why the Distracted Boyfriend meme caught on. I mean, it’s difficult for humans to explain why certain things catch on as well.

There are quite a few cultural movements to latch on to that LLMs don’t understand, but there’s no doubt you can create massive amounts of content with generative AI. Sure, once a cultural movement has been identified, a bad actor can then try to latch on to it by automatically generating misinformation, but this slows down the process and is less effective.

Amplification – A piece of misinformation does no good if nobody sees it. Amplification is getting that content in front of the eyes of as many people as possible. Preferably the people who’d most likely engage with it since more engagement leads to more amplification. You’ll also increase the potential success of the intended outcome of the misinformation.

When it comes to amplification, it’s not as hard to amplify as some would have you believe. Nation-states have an army of people that amplify content. If you can hit the right chord aligning with people’s biases, they’ll amplify the content.

Engagement – Engagement is getting people to interact with the content. This could be in liking, sharing, or even commenting on it. The more engagement, the more false consensus is built around the content. This engagement can feed back into the amplification phase through algorithmic amplification on social media or merely exposing others to the content. It would be a mistake to assume that engagement leads to an outcome. People share things they don’t read all of the time because the title agrees with their biases.

Outcome – This is the action the misinformation is intended to have. This may increase votes for a party or candidate or get people to believe something. This is where misinformation really matters. It’s not so cut and dry as a call to action, but it could be a change of mind on a topic.

For any piece of misinformation to be effective, there needs to be a successful outcome. This is much harder than it seems. Amplifying and increasing engagement seems like the goal, but it’s not. Many people discussing AI-generated misinformation talk about how well it can structure articles and provide references. But we know that many sharing content don’t read the content they share.

Mental Cement

People have made politics (and many other things) religions now. We’ve had a pandemic and lockdowns for people to spend an inordinate amount of time online and cement their biases. Every bit of content we encounter, we apply our biases to it. If it’s something we like, we assume it’s true. If it’s something we don’t like, it must be a deepfake. I mentioned the concept of claiming deepfakes in my 2020 post, and it seems even Elon Musk has made this a reality.

Almost no amount of misinformation will get people to change their minds about something they believe in. It’s why it’s so hard to get people out of cults, change religions, or even political parties.

Getting people to change these fundamental things after cements takes a massive effort. My dad was one of the few who did change religions, but only because of my mom. People occasionally also switch political parties, but it’s also rare. It’s much more likely to have people become unaffiliated. People don’t switch religions; they leave religions. People don’t switch political parties; they become independent. This may be a silver lining when it comes to misinformation. I’ll get to this later.

Convincing someone to believe in misinformation only works if you have two fundamental aspects. A non-politically charged topic and something that doesn’t go against the strong biases of the person encountering the content.

Convincing someone to believe in misinformation only works if you have two fundamental aspects. A non-politically charged topic and something that doesn’t go against the strong biases of the person encountering the content. It’s certainly not impossible, but the climb is significant.

Instances Don’t Equal Impact

You’ll see the press and pundits point out instances of misinformation as proof that it’s having an effect. This isn’t the case. We’ll most certainly see more content, AI-generated or otherwise, focused on the 2024 election. An Increase in content doesn’t equal an increase in influence or effects on a significant scale. This would be the “Outcome” step in the Generative Misinformation Cycle.

In the context of the election, misinformation, and deepfakes will not be used to change people’s minds but to excite the base and poke fun at the opposite candidate. In 2024, people will wage meme warfare, and generative image models will be their weapons.

CounterCloud

CounterCloud is an experiment in fully autonomous disinformation, and it’s terrifying to some people.

It’s a neat experiment in what’s possible, and the approach is interesting for creating counter-narratives. You can read more about it here. However, once again, this overlooks the fact that many people don’t read the articles. They share based on the headlines. It also has other more fatal flaws, such as it works to drive people to a single site, even though it can use social media to drive attention there. Ultimately, this would be identified pretty quickly. And yes, lessons learned here could be more stealthy, but we still have the same issues I covered in this post.

But, Deepfakes Tho

Nowhere does the misinformation become spicier than the arguments about deepfakes. When I relaunched this blog back in 2020, the topic of Deepfakes was the first I tackled. I mostly focused on how their threats weren’t appropriately phrased and overhyped. Imagine that. I felt the real legacy of deepfakes lies in their ability to harass versus their convincing people that something happened. I still feel this way. Fooling people only works while the novelty factor is high, then there is a steep drop off.

Let’s look at Pope in a Puffer Jacket, also known as Balenciaga Pope. I know this image fooled many people, which seems to go against my point in the post, but not so fast.

The Pope in a puffer jacket image fooled people because nobody cared about the Pope or his jacket. If this were a politically charged topic or a topic that people were highly biased toward, it would have received much more scrutiny.

Meme Wars

Generative AI will most likely be used to create memes and caricatures during the election cycle. This won’t all be malicious. Some of it will be downright hilarious (depending on which side of the political spectrum you are on), such as the images created of RuPublicans.

Although some memes and content will be good fun, much of it will be malicious. If generative image tools restrict the ability to generate political figures, then that could slow down this meme war a bit, but some of these models are open source and could be run on systems without these guardrails. So, we’ll see as soon as the election cycle starts heating up.

Misinformation and Deepfakes: Still a Problem

Just because I don’t think misinformation and deepfakes will affect the 2024 US election and don’t always work in high-stakes situations doesn’t mean I don’t think these are a problem. In my previous post, I wrote that I felt the real legacy of deepfakes would be in their use in harassment. So, activities like mocking people or creating non-consensual porn are two examples of this.

Also, there are so many non-politically charged situations where it’s easy to fool people. Where the stakes are low, nonsense will proliferate. Just like Ted Cruz recently fell for the old shark in a waterway hoax.

This does bring up another issue, and that is we are creating an internet of junk. Even if it’s not malicious or directly harmful to anyone, it still has the potential to affect people. There are some fundamental issues in creating a world where you never really know if any content you encounter is real or not. This is really the near future we are headed for. I need to give this some more thought to consider the full impacts at scale.

There are some fundamental issues in creating a world where you never really know if any content you encounter is real or not.

A Silver Lining

Will the deluge of nonsense have a positive effect? It’s possible. Consuming misinformation and other nonsense is consuming mental junk food. It feels good, but there’s no substance. Just like eating cake and ice cream for every meal seems fun, it’s not fun in practice.

When you are bombarded with things, you tend to check out. The mental junk food becomes less fun, and you stop interacting with it, possibly block it, or just leave social media for a while. So, it could have a positive impact. I realize I may be too hopeful, but it’s possible. I’m also aware of the arguments that say making people tune out is the point, but even given their argument, I don’t think it’s all bad on that side.

This is also precisely why legitimate news outlets shouldn’t use Generative AI to curate and write articles. This makes these news sources seem like part of the problem when the rest of the internet is filled with nonsense. The stakes are too high, and the value too low.

Conclusion

This post contained some food for thought, possibly going in the opposite direction of what may be reported. I could be completely wrong about all of this, and the tide of the election could very well turn based on AI-generated misinformation, but I don’t think so. Usually, I’d be happy to be wrong, but not in this case for obvious reasons.

There isn’t much we can do for the time being except employ critical thinking skills and evaluate content accordingly. The hype of 2024 is right around the corner. I do feel there are a couple of fundamental things we can be doing to prepare for a world in which reality is merely a suggestion. This involves teaching data literacy as well as probability and statistics in the K-12 curriculum. Making room for these subjects is vital to prepare students for not just the future but what we now have in the present.
Post-Black Hat USA and DEF CON AI Thoughts

August 17th, 2023

Wow, another Black Hat USA and DEF CON are in the books, and it was great seeing everyone. One of the best parts of conferences is the conversations, and those conversations were amazing. As you can imagine, many of them were about “AI.” Since there were no cameras in the AI Security Challenges, Solutions, and Open Problems meetup and it will be a while before the Forward Focus: Perspectives on AI, Hype, and Security presentation makes its way online, I thought I’d summarize a few points as well as distill some of my perspectives on the topics I covered and conversations I had, now that I’ve had a few days to reflect.

Perspective on LLM Impacts

I deal with so many people making nonsensical or unfounded claims that I wanted to make it clear where I stand on the subject of LLMs and their impact on humanity. When you live in reality, you tend to be labeled a hater.

I’m not big on making predictions, but let me say this with a fair amount of confidence, LLMs will not be more impactful on humanity than the printing press, and GPT-5 won’t achieve AGI. Those of you who know me will find the fact that I’m in the middle unsurprising, but hey, the only technology I hate is PHP 😉

All AI All The Time

As was expected, everything was all AI all the time. Every vendor booth had the term “AI.” AI-powered products, AI pen testing, AI assurance, AI, AI, AI! Everyone is ALL in. Even though I expected it, being confronted with the term absolutely everywhere was still shocking. What we’d poked fun at in the past has become our reality. Everyone is trying to ride the wave to success, regardless of their skills or capability. It would be easy to blame this on marketing departments, but it was far more than that.

All references to machine learning seemed to be scrubbed in favor of using the term “AI.” Seems machine learning is having its “cyber” or “crypto” terminology moment. I learned long ago that fighting the industry over terminology is a losing battle, so yes, I’m giving in to the massive, crushing weight of hype, and I’ll move the battlefront to somewhere else.

Losing the terminology battle isn’t without drawbacks.

Still, losing the terminology battle isn’t without drawbacks. It seems many are also using the term AI synonymously with generative language models, which just muddies the water more. When you mention that you think the capabilities of LLMs are overhyped (i.e., not going to be more impactful than the printing press, etc.), people tend to throw out things like drug discovery or AlphaFold. When you point out that those are different approaches and it’s not like ChatGPT is doing that, they tend to still cling to adjacent success in specific domains as an indicator of success here. It’s like being in a VW Bug and pointing out that a Ferrari can do over 200 mph.

This is also a shame since many more traditional machine learning approaches aren’t even considered as people rush to LLMs, even approaches that are more reliable and proven for specific security problems. I think this will level out at some point, but not anytime soon. Time to put LLMs on the moon!

Where People Stand

The consensus from many I talked to is that they were just trying to figure out where they stood. They’ve heard so many outrageous claims, and the reporting on advancements has been so all over the place. On the one hand, you have people claiming GPT-5 is going to be AGI; on the other, you have people advocating military strikes against data centers. It’s no wonder people are confused.

Given the wild reporting, outrageous claims, and AI hustle bros trying to get you to subscribe to their channels, I was surprised that most people were pretty grounded. Many didn’t think AI would take their job or that the ChatGPT Plugin Store would be more impactful than the mobile App Store on humanity. I found this incredibly refreshing.

I suggested to the people I talked to that whenever you hear someone spouting outrageous claims, ask them why they think that. People making outrageous claims about LLMs often try to drive attention into their funnel. They want people subscribing to their Substack, YouTube, Mailing lists, etc. They can make these claims and never have to justify them, never have to give examples or show real-world impact. The rest of us have to live in a reality where our software has to work, scale, and be reliable. So, beware of people making claims without providing specific examples. Also, stories in the news often don’t reflect realities on the ground.

Fooling Ourselves Is Easy

The social contagion status of ChatGPT highlighted a vulnerability in humans, and that’s that we are very bad at creating tests and very good at filling in the blanks. The world is filled with experiments, and highly-cherry picked examples. We tend to see a future that isn’t there. We often forget that the world is filled with edge cases, which confuse many of these AI systems.

The social contagion status of ChatGPT highlighted a vulnerability in humans, and that’s that we are very bad at creating tests and very good at filling in the blanks.

Look at self-driving cars, for instance. We see a demo of a self-driving car properly navigating the roadway, and we assume that truck driving as a profession is doomed almost immediately. It seems like one of the easier problems, stay in the lane, obey the signs, and don’t hit things. Boom! But anyone who’s driven a car knows that edge cases are everywhere. Road construction, lighting conditions, snow, accidents, etc. Humans handle these conditions pretty well, by contrast.

Supercharged Attackers

LLMs won’t supercharge inexperienced attackers

One point I brought up in the meetup and during our panel, was that people made similar claims about Metasploit supercharging inexperienced attackers when it was launched over twenty years ago. People made claims that Metasploit was like giving nukes to script kiddies. Those comments didn’t age well, and I think the same is true about LLMs. You still have to know what you are doing when using LLMs to attack something. It’s not like point, click, own. Also, it’s not like LLMs are finding 0day or writing undetectable malware. I know. I’ve seen the research and reports. Neat research, but it’s not like it’s overly practical for attacks at scale.

People made claims that Metasploit was like giving nukes to script kiddies

Today, most malicious toolkits you hear about, like FraudGPT, WormGPT, and many others that have popped up, are primarily tools for phishing and social engineering attacks (despite having “worm” in the title.) This can certainly have an impact, but not on the apocalyptic levels that some would have you believe. All of this technology is indeed dual use, so something that’s helpful for security professionals will also be helpful for criminals. Just like we have people hyping AI on the clear web, you have people hyping AI on the dark web.

Losing Your Job To AI

Most people I talked to didn’t seem overly concerned about losing their job to AI, but I got the feeling that it was in people’s minds regardless. The recent sting of many layoffs is probably not helping the uncertainty. This was one of the points we tried to address from the stage at Black Hat. I used the example of AlphaGo. I asked the audience how many people had heard of AlphaGo beating Lee Sedol at Go. I was surprised that very few hands in the audience went up since it was big news at the time. I then asked how many people had heard of the research from Stewart Russell’s lab that allowed even average Go players to beat these superhuman Go AIs. No hands went up.

My point was that there is a lesson here for security professionals. These new technologies tend to have their own vulnerabilities and issues that also need to be addressed. In addition, all of these technologies have gaps, and the gaps will need to be filled. So, for the foreseeable future, your job is safe in the context of information security. We’d have a much different conversation if you were a freelance graphic artist.

Misinformation and Deepfakes

I was a bit surprised by the fact I didn’t hear any conversations about misinformation and deepfakes. I’m sure they happened, but not at any of the events or conversations I participated in. The only time it was brought up, it was brought up by myself in conversation. I have a rather spicy take on the 2024 US Election. I think misinformation and deepfakes will have a statistically insignificant effect on the 2024 election. I will address this in a future blog post, but in summary, people have already made up their minds and cemented their biases.

It’s not that these issues aren’t important or impactful, just in context, not significant. I wrote about this topic back in 2020 when I relaunched my blog. Interestingly, in that post, I also mentioned the people who should be most concerned about the technology powering deepfakes: actors and actresses. Very relevant now with the SAG AFTRA strike and AI being a big concern.

Social Impacts

There were virtually no conversations about the social impacts of Generative AI other than the conversations I initiated. This isn’t surprising since it’s a large focus of my blog, and I spend a lot of time thinking about these topics. Seems most people were focused on use cases and capabilities. My fellow tech people are often optimizers and look to optimize everything. They don’t realize that friction is the point in certain cases.

I think the chatbotification of everything is something humans are starting to tire of.

I think the chatbotification of everything is something humans are starting to tire of. When someone launches a new service, you have this quick uptake due to the novelty factor, followed by a steep drop-off. We are about to enter an era of celebrity and historical figure chatbots, I think the same curve applies.

We’ll see lots of press, rapid adoption, followed by a steep drop-off. This could be due to boredom, lack of true functionality, or even something more primal, which is the sort of “fake factor” of it all. We know we aren’t actually talking with Harriet Tubman when we use the chatbot. What seems kind of fun at first starts to take on a tarnish very quickly. As tech people, we get so caught up in the cool factor of the technology we build that we tend to forget the human factor in all of this. I think I’m on the right track here, but I realize I’m also old and have never played Minecraft, so I could be wrong.

Customer support chatbots, the ones that are directly customer-facing, have some promise, but only if they are empowered to take the action necessary to resolve the issues that customers are having. On the flip side, having an empowered chatbot also opens the door to manipulation. So this, too, has issues. My gut tells me that as organizations launch empowered bots for various things, there will be subreddits dedicated to manipulating them. This manipulation could be for fun, getting discounts, or stealing services. Time will tell.

There’s certainly some promise in hybrid workflows pairing humans and bots together, where the human is actually the one in first-party contact with the customer. This may be the ultimate path, but something tells me the replacement path will start first, and hybrid will be the fallback.

Prepare To Be Surprised

In my closing statement at Black Hat, I mainly told people to prepare to be surprised. There are lots of experiments and money pouring into the space. Anyone who thinks they have they can see the future here would be fooling themselves. The whole thing is simultaneously exciting and scary. The best thing people can do is remain grounded but also play with the technology. Don’t sit on the sidelines, generative models are pretty accessible. Play around and apply it to some of your use cases. Above all, have fun.
The Brave New World of Degraded Performance

July 31st, 2023
If we are not careful, we are about to enter an era of software development, where we replace known, reliable methods with less reliable probabilistic ones. Where methods such as prompting a model, even with context, can still lead to fragility causing unexpected and unreliable outputs. Where lack of visibility means you never really know why you receive the results you receive, and making requests over and over again becomes the norm. If we continue down this path, we are headed into a brave new world of degraded performance.

Scope

Before we begin, let’s set the perspective for this post. The generative AI I’m covering in this post is related to Large Language Models (LLMs) and not other types of generative AI. This post focuses on building software meant to be consumed by others. Products and applications deployed throughout an organization or to delivered to customers. I’m not referring to experiments, one-off tools, or prototypes. Although, buggy prototype code can have an odd habit of showing up in production because a function or feature just worked.

This post isn’t about AI destroying the world or people dying. It’s about the regular applications we use, even in a mundane context, just not being as good. The cost of failure doesn’t have to be high for the points in this post to apply. I’m saying this because, in many cases, the cost may be low. People probably won’t die if your ad-laden personalized horoscope application fails occasionally. But that doesn’t mean users won’t notice, and there won’t be impacts.

Our modern world runs on software, and we are training people that buggy software should be expected.

Our modern world runs on software, and we are training people that buggy software should be expected, and making requests repeatedly is the norm, setting the expectation that this is just the price paid in modern software development. This approach is bad, and the velocity at all costs mantra is misguided.

Let me be clear because I’m sure this will come up. I’m not anti-AI or anti-LLM or anything of the sort. These tools have their uses and can be incredibly beneficial in certain use cases. There are also some promising areas, such as the ability of LLMs to, generate, read and understand code and what that means for software development in the coming years. It’s still early. So in no way am I claiming that LLMs are useless. I’m trying to address the hype, staying in the realm of reality and not fantasy. The truth today is that maximizing these tools for functionality instead of being choosy is the problem and there are costs associated.

Software Development

Software development has never been perfect. It’s always been peppered with foot guns and other gotchas, be it performance or security issues, but what it lacked elegance, it made up in visibility and predictability. Developers had a level of proficiency with the code they wrote and an understanding of how the various components worked together to create a cohesive service, but this is changing.

Now, you can make a bunch of requests to a large language model and let it figure it out for you. No need to write the logic, perform data transformations, or format the output. You can have a conversation with your application before having it do something and assume the application understands when it gives you the output. What a time to be alive!

There’s no doubt that tools like ChatGPT increased accessibility to people who’ve never written code before. Mountains of people are creating content showing, “Look, Mom, I wrote some code,” bragging that they didn’t know what they were doing. I’ve seen videos of University Professors making the same claims. This has and will continue to lead to many misunderstandings about problems people are trying to solve and the data they are trying to analyze. Lack of domain expertise and lack of functional knowledge about how systems work is a major problem but not the focus of this post.

As a security professional, inexperienced people spreading buggy code makes me cringe (look at the Web3 space for examples), but It’s not all bad. In some ways, this accessibility is a benefit and may lead to people discovering new careers and gaining new opportunities. Also, small experiments, exploration, or playing around with the tools are absolutely fine. It’s how you discover new things. However, inefficiencies, errors, and lack of reliability aren’t dealbreakers in these cases. But what happens when this mindset is taken to heart and industrialized into applications and products that impact business processes and customers?

Degraded Performance

There’s a new approach in town. You no longer have to collect data, ensure it’s labeled properly, train a model, perform evaluations, and repeat. Now, in hours, you can throw both apps and caution to the wind as you deploy into production!

This above is a process outlined by Andrew Ng in his newsletter and parroted by countless content creators and AI hustle bros. It’s the kind of message you’d expect to resonate, I mean, who wouldn’t like to save months with the added benefit of removing a whole mountain of effort in the process? But, as with crypto bros and their Lambos, if it sounds too good to be true, it probably is.

Let’s look at a few facts. Compared to more traditional approaches:
- LLMs are slow
- LLMs are inefficient
- LLMs are expensive ($)
- LLMs have reliability issues
- LLMs are finicky
- LLMs can and do change (Instability)
- LLMs lack visibility
- Benchmarking? Measuring performance?
Pump the Brakes

Traditional machine learning approaches can have much better visibility into the entire end-to-end process. This visibility can even include how a decision or prediction was made. They can also be better approaches for specific problems in particular domains. These approaches also make it far easier to benchmark, create ensembles, perform cross-validation, and measure performance and accuracy. Everyone hates data wrangling, but you learn something about your data, given all that wrangling. This familiarity helps you identify when things aren’t right. Having visibility into the entire process means you can also identify potential issues like target leakage or when a model might give you the right answer but for the wrong reasons, helping avoid a catastrophe down the road.

The friction in more traditional machine learning is a feature, not a bug, making it much easier to spot potential issues and create more reliable systems.

The friction in more traditional machine learning is a feature, not a bug

Lazy Engineering

On the surface, letting an LLM figure everything out may seem easier. After all, Andrew Ng claims something similar. In his first course on Deeplearning.ai ChatGPT Prompt Engineering for Developers He mentions using LLMs to format your data as well as using triple backticks to avoid prompt injection attacks. Even the popular LangChain library instructs the LLM to format data in the same way. Countless others are creating similar tutorials flooding the web parroting this point. Andrew is a highly influential person who’s helped countless people with this training by making machine learning more accessible. With so many people telling others what they want to hear, as well as the accessibility of tools like LangChain, this will have an impact, and it’s not all positive.

One of the goals of software engineering should be to minimize the number of potential issues and unexpected behaviors an application exhibits when deployed in a production environment. Treating LLMs as some sort of all-capable oracle is a good way to get into trouble. This is for two primary reasons, lack of visibility and reliability.

Black Boxes

A big criticism of deep learning approaches has been their lack of transparency and visibility. Many tools have been developed to try and add some visibility to these approaches, but when maximized in an application, LLMs are a step backward. A major step backward if you count things like OpenAI’s Code Interpreter.

The more of your application’s functionality you outsource to an LLM, the less visibility you have into the process. This can make tracking down issues in your applications when they occur almost impossible. And when you can track problems down, assuming you can fix them, there will be no guarantee that they stay fixed. Squashing bugs in LLM-powered applications isn’t as simple as patching some buggy code.

Right, Probably

LLMs are being touted as a way to take on more and more functionality in the software being built, giving them an outsized role in an application’s architecture. Any time you replace a more reliable deterministic method with a probabilistic one, you may get the right answer much of the time, but there’s no guarantee you will. This means you could have intermittent failures that impact your application. In more extreme cases, these failures can cascade through a system affecting the functionality of other downstream components.

For example, anyone who has ever asked an LLM to return a single-word result will know that sometimes it doesn’t, and there’s no rhyme or reason why. It’s one of the classic blunders of LLMs.

So, you may construct a prompt stating only to return a single word, True or False, based on some request. Occasionally, without warning and even with the temperature set to 0, it will return something like the following:
```
The result is True
```
Not the end of the world, but now translate this seemingly insignificant quirk into something more impactful. Your application expected a result from an LLM formatted in a certain way. Let’s say you wanted the result formatted in JSON. Now, your application receives a result that isn’t JSON or maybe not properly formatted JSON, creating an unexpected condition in your application.

Suppose we combine this reliability issue with the lack of visibility. In that case, it can lead to some serious issues that may be intermittent, hard to troubleshoot, and almost impossible to fix without reengineering. In a more complex example, maybe you’ve sent a bunch of data to an LLM and asked it to perform a series of actions, some including math or counting, and return a result in a particular format. A whole mess of potential problems could result from this, all of which are outside your control and visibility.

Not to mention a big point many gloss over, deploying your application in production isn’t the end of your development journey. It may be the beginning. This means you will need to perform maintenance, troubleshooting, and improvements over time. All things LLMs can make much more difficult when functionality is maximized.

To summarize, outsourcing more and more application functionality to an LLM means that your application becomes less modular and more prone to unexpected errors and failures. These are issues that Matthew Honnibal also covers in his great article titled Against LLM Maximalism.

The Slow and Inefficient Slide

In some use cases, it may not matter if it takes seconds to return a result, but for many, this is unacceptable. Having multiple round trips and sending the same data back and forth may be necessary due to different use cases because a character changed or because of context window size, which also adds to the inefficiency. Even if the use case isn’t critical and inefficiencies can be tolerated, that’s not the end of the story.

There are still environmental impacts due to this inefficiency. It requires much more energy consumption to have an LLM perform tasks than more traditional methods. For example, searching for a condition with a RegEx vs. sending large chunks of data to an LLM and letting the LLM try and figure it out. The people ranting and raving constantly about the environmental impacts of PoW cryptocurrency mining are incredibly silent on the energy consumption of AI, even as former crypto miners turn their rigs toward AI. Think about that next time you want to replace a method like grep with ChatGPT or generate a continuous stream of cat photos with pizzas on their head.

LLMs Change and So Do You

Any check of social media will show that at the time of this writing, there have been quite a few people claiming that GPT-4 is getting worse. There’s also a paper that explores this.

There’s some debate over the paper and some of the tests chosen, but for the context we are discussing in this post, the why an LLM might change isn’t relevant. Whether changes are because of cost savings, issues with fine-tuning, upgrades, or some other factor aren’t relevant when you count on these technologies inside your application. This means your application’s performance can worsen for the same problems, and there isn’t much you can do about it but hope if you are consuming a provider’s model (OpenAI, Google, Microsoft, etc.) This can also lead to instability due to the provider requiring an upgrade to a newer version of the hosted model, which may lead to degraded performance in your application.

Demo Extrapolation

The problem is that none of the constraints and issues may surface for demos and cherry-picked examples. Actually, the results can look positive. Positive results in demos are a danger in and of themselves since this apparent working can mask larger issues in real-world scenarios. The world is filled with edge cases, and you may be running up a whole bunch of technical debt.

Hypetomisim and Sunken Cost

There’s a sense that technology and approaches always get better. Whether this is from Sci-fi movies or just because people get a new iPhone every year, maybe a combination of both. Approaches can be highly problem or domain-specific and not generalize to other problem areas or at least not generalize well. We don’t have an all-powerful single AI approach to everything. Almost nobody today would allow an LLM to drive their car. However, some have hooked them up to their bank accounts. Yikes!

But you can detect an underlying sense of give it time in people’s discussions on this topic. Whenever you point out issues you usually get, well GPT-5 is gonna… This goes without saying that ChatGPT is based on a large language model, and large language models are trained on what people write, not even what they actually think in certain cases. They perform best on generative tasks. On the other hand, tasks like operating a car have nothing to do with language. Sure, you could tell the car a destination, but every other operation has nothing to do with language. It’s true that LLMs can also generate code, but do you want your car to generate and compile code while driving it? Let me answer that. Hell no. Heed my words, maybe not this use case, but something in the same order of stupid is coming.

Developing buggy software in the hopes that improvements are on the way and outside your control is not a great strategy for reliable software development.

Developing buggy software in the hopes that improvements are on the way and outside your control is not a great strategy for reliable software development. I’ve heard multiple stories from dev teams that they continue to run buggy code with LLM functionality and make excuses for apparent failures because of sunken costs.

The hype has led to a new form of software development that appears to be more like casting a spell than developing software. The AI hustle bros want you to believe everything is so simple and money is just around the corner.

Now’s a good time to remind everyone that fantasy sells far better than reality. Lord of the Rings will always sell more books than one titled Eat Your Vegetables. Trust me, as most of my posts are along the lines of Eat Your Vegetables posts, I make no illusions that every AI hustler’s Substack making nonsensical and unfounded predictions is absolutely crushing me in page views.

Engineering Amnesia

In a development context, we may forget that better methods exist or allow ourselves to reintroduce known issues that cause cascading failures and catastrophic impacts on our applications. This isn’t without precedent.

The LAND attack came back in Windows XP after it was known and already mitigated in previous Windows OSs. ChatGPT plugins are allowed to execute in the context of each other’s current domains, even though we’ve seen time and time again how this violates security. The Corrupted Blood episode was a failure to understand how the containment of a feature could cause catastrophic damage to an application, so much so that it forced a reset. And, of course, don’t even get me started on the Web3 space. I mean, who wouldn’t want tons of newly minted developers creating high-risk financial products without knowledge of known security issues? It was fascinating to see security issues in high-impact products for which standard, boring, and known security controls would have prevented them. These are just a couple off the top of my head, and there are many more.

As new developers learn to use LLMs to perform common tasks for which we have better, more reliable methods, they may never become aware of these methods because their method just kind of works.

Avoiding Issues

The perplexing part of all of this is that these issues are pretty easy to avoid, mainly by thinking carefully about your application’s architecture and the features and components you are building. Let me also state that these issues won’t be solved by writing better prompts.

Reliability and visibility issues won’t be solved by writing better prompts

There’s the perception that using an LLM to figure everything out is easier than other methods. On the surface, it may appear that there’s some truth to that. It’s also easier to spend money on a credit card than to make the money to pay the bill. So, it’s the case that you may be kicking the can down the road. Avoiding these issues isn’t hard, and a bit of thought about your application and its features will go a long way.

Look at your application’s features. Break these features down into functional modules. The goal of breaking down these features into smaller components is to evaluate the intended functionality to determine the best approach for the given feature. At a high level, you could ask a few questions with the goal of determining the right tool for the processing task.
- Does the function require a generative approach?
- Are there existing, more reliable methods to solve the problem?
- How was the problem solved before generative AI? (Potential focusing question if necessary)
- Is there a specific right or wrong answer to the problem?
- What happens if the component fails?
These questions are far from all-encompassing, but they are meant to be simple and provide some focus on individual component functionality and the use case. After all, LLMs are a form of generative AI, and therefore, they are best suited to generative tasks. Asking if there’s a specific right or wrong answer is meant to focus on the output of the function and consider if a supervised learning approach may be a better fit for the problem.

We have reliable ways of formatting data, so it’s perplexing to see people using LLMs to perform data formatting and transformations, especially since you’ll have to perform those transformations every time you call the LLM. Asking these questions can help avoid issues where improperly formatted data can cause a cascading issue.

Example

Let’s take a simple example. You want a system that parses a stream of text content looking for mentions of your company. If your company is mentioned, you want to evaluate the sentiment around the mention of your company. Based on that sentiment, you’d like to write some text addressing the comment and post that back to the system. We break this down into the following tasks below.

For parsing, analysis, and text generation steps, it would be tempting to collapse all of them together and send them to an LLM for processing and output. This would be maximizing the LLM functionality in your application. You could technically construct a prompt with context to try and perform these three activities in a single shot. That would look like the following example.

In this case, you have multiple points of failure that could easily be avoided. You’d also be sending a lot of potentially unnecessary data to the LLM in the parsing stage since all data, regardless of whether the company was mentioned, would be sent to the LLM. This can substantially increase costs and increase network traffic, assuming this was a hosted LLM.

You are also counting on the LLM to parse the content given properly, then properly analyze and then, based on the two previous steps, properly generate the output. All of these functions happen outside of your visibility, and when failures happen, they can be impossible to troubleshoot.

So, let’s apply the questions mentioned in the post to this functionality.

Parsing
- Does the function require a generative approach?
  No
- Are there existing, more reliable methods to solve the problem?
  Yes, more traditional NLP tools or even simple search features
- Is there a specific right or wrong answer to the problem?
  Yes, we want to know for sure that our company is mentioned.
- What happens if the component fails?
  In the current LLM use case, the failure feeds into the following components outside the visibility of the developer, and there’s no way to troubleshoot this condition reliably.
Analysis
- Does the function require a generative approach?
  No
- Are there existing, more reliable methods to solve the problem?
  Yes, more traditional and mature NLP tasks for sentiment analysis
- Is there a specific right or wrong answer to the problem?
  Yes
- What happens if the component fails?
  In the current LLM use case, the failure feeds into the following text generation component outside the developer’s visibility, and there’s no way to troubleshoot this condition reliably.
Text Generation
- Does the function require a generative approach?
  Yes
- Are there existing, more reliable methods to solve the problem?
  LLMs appear to be the best solution for this functionality.
- Is there a specific right or wrong answer to the problem?
  No, since many different texts could satisfy the problem
- What happens if the component fails?
  We get text output that we don’t like. However, since the previous steps happen beyond the developer’s visibility, there’s no way to troubleshoot failures reliably.
Revised Example

After asking a few simple questions, we ended up with a revised use case. This one uses the LLM functionality for the problem it’s best suited for.

In this use case, only the text generation phase uses an LLM. Only confirmed mentions of the company, along with the sentiment and the content necessary to write the comment, are sent to the LLM. Much less data flows to the LLM, lowering cost and overhead. By using more robust methods, much less can go wrong as well, and less likely to have cascading failures affecting downstream functions. When something does go wrong in the parsing or analysis stages, troubleshooting is much easier since you have more visibility into those functions. So, breaking down this functionality in such a way means that failures can be more easily isolated and addressed, and you can improve more reliably as the application matures.

Now, I’m not claiming that this is a development utopia. A lot can still go wrong, but it’s a far more consistent and reliable approach than the previous example.

After talking with developers about this, some of the questions I’ve received are along the lines of, “There are better methods for my task, so if we can’t cut corners, then why use an LLM at all?” Yes, that’s a good question, a very good question, and maybe you should reevaluate your choices. This is my surprised robot face when I hear that.

LLMs Aren’t Useless

Once again, I’m not saying that LLMs are useless or that you shouldn’t use them. LLMs fit specific use cases and classes of functionality that applications can take advantage of. For many tasks, there’s the right tool for the job or at least a righter tool for the job. However, this right tool for the right job approach isn’t what’s being proposed in countless online forums and tutorials. I’m concerned with a growing movement of using LLMs as some general-purpose application functionality for tasks that we already have much more reliable ways of performing.

Conclusion

Will we inhabit a sprawling landscape of digital decay where everything rests on crumbling foundations? Probably not. But there will be a noticeable shift in the applications we use on a daily basis. But it doesn’t have to be. By being choosy and analyzing functionality where LLMs are best suited, you can make more reliable and robust applications, and the environment will also thank you.