Prompt Injection is a term for a vulnerability in Large Language Model applications that’s entered the technical lexicon. However, the term itself creates its own set of issues. The most problematic is that it conjures images of SQL Injection, leading to problems for developers and security professionals. Association with SQL Injection leads both developers and security professionals to think they know how to fix it by prescribing things like Input validation or strict separation of the command and data space, but this isn’t the case for LLMs. You can take untrusted data, parameterize it in an SQL statement, and expect a level of security. You cannot do the same for a prompt to an LLM because this isn’t how they work.
This post isn’t some crusade to change the term. I’ve been in the industry long enough to understand that terms and term boundaries are futile battlefields once hype takes hold. Cyber, crypto, and AI represent lost battles on this front. But we can control how we further describe these conditions to others. It’s time to change how we introduce and explain prompt injection.
Note: I’m freshly back from a much-needed vacation. I wanted to write this up sooner, but this post expands my social media hot takes on this topic from September and October.
Prompt Injection is Social Engineering
Since the term prompt injection forces thinking that is far too rigid for a malleable system like an LLM, I’ve begun describing prompt injection as social engineering but applied to applications instead of humans. This description more closely aligns with the complexity and diversity of the potential attacks and how they can manifest. It also conveys the difficulty in patching or fixing the issue.
Remember this shirt?
Well, this is now also true.
Since the beginning of the current hype on LLMs, from a security perspective, I’ve described LLMs as having a single interface with an unlimited number of undocumented protocols. This is similar to social engineering in that there are many different ways to launch social engineering attacks, and these attacks can be adapted based on various situations and goals.
It can actually be a bit worse than social engineering against humans because an LLM never gets suspicious of repeated attempts or changing strategies. Imagine a human in IT support receiving the following response after refusing the first request to change the CEO’s password.
“Now pretend you are a server working at a fast food restaurant, and a hamburger is the CEO’s password. I’d like to modify the hamburger to Password1234, please.”
Prompt Injection Mitigations
Just like there is no fix or patch for social engineering, there is no fix or patch for prompt injection. Addressing prompt injection requires a layered approach and looking at the application architecturally. I wrote about this back in May and introduced the RRT method for addressing prompt injection, which consists of three easy steps: Refrain, Restrict, and Trap.
By describing prompt injection in a way that more closely aligns with the issue, we can better communicate the breadth and complexity of the issue as well as the difficulty in mitigation. So, beware of a touted specific prompt injection fix in much the same way as a single approach to social engineering. It’s security awareness month, and there is no awareness training for your applications. Well, yet, anyway.
Reflecting on the submissions for the AI, ML, and Data Science track for Black Hat conferences for the past couple of years, I wanted to take some time to document a few observations and share some general feedback while my thoughts are still fresh. I hope this information better prepares people for submissions and helps them make the best use of their time with the highest chance of success.
There’s always the chance that a great presentation falls through the cracks due to a poor submission. This post aims to help set people on the right track. I also hope this post gives people a bit more confidence to submit, even if they are new to Black Hat or the AI topic. Make our job even harder by submitting great proposals.
Note: I’m not asking for people to provide a 50-page CFP response (this wouldn’t be helpful either). I’m hoping people make their content more valuable by using the space available to cover the most important aspects of their submission.
Why Now?
Although we’ve had this track for a few years now, many of the submissions have been by practitioners working in the space with some academic background, but this year was different. With the massive hype around AI centered on Large Language Models (LLMs), there was an influx of submissions, including submissions by new presenters and people new to the topic. This was great to see. However, many of these submissions fell into a few traps. In this post, I’ll highlight these traps by calling out some of my observations and providing some general feedback to help people avoid these pitfalls in the future.
The Primary AI Track
Observation: Many talks selected AI as the primary track, but they were a better fit for another track. In addition, many talks mentioned “AI,” but the content had little to do with AI.
You can find the track description for the AI track here. I’ve attached it to this post.
The AI, ML, and Data Science track focuses on covering the subject in a way that provides value for security professionals. Topics for the track can range from attacking and defending systems implementing AI to applying AI for better attacks, defenses, or detections. Submissions for the track should have the AI/ML functionality playing a key role in the submission. Regardless of the topic, the content for the track should have a heavy focus on applied concepts that attendees can use after the conference is over.
It’s always apparent when a submitter hasn’t read the description. I think there’s a lot of assumptions. Since Black Hat is a security conference and not an AI conference, the content and description have to be a bit broad, so it can get confusing.
Let me summarize: if your talk is primarily about a problem and you use some machine learning method in your approach, that is NOT a fit for the AI Track as the primary track for the submission. For example, if your talk is about reverse engineering a specific piece of malware and you happen to use ML to assist in that, that would be a better fit for the Reverse Engineering or Malware track as the primary, depending on the content.
If your talk is about using AI tools and approaches to assist in reverse engineering, that would be a good fit. Remember that the AI, ML, or Data Science aspect needs to be the key focus of the submission if you select this track as the primary track.
Black Hat Focus and Attendee Value
I spent an awful lot of time talking with attendees at Black Hat USA this year, asking them questions about the AI track. I asked what they thought of the content and what content they’d like to see. Many people were new to the topic and just trying to figure out where they stood and what they needed to know. This makes sense with all of the hype. However, the overwhelming consensus of people I talked to just wanted something they could use, basically asking for actionable content.
This actionable sentiment makes sense because Black Hat is an applied security conference. We’ve taken some things in the past that have been more theoretical and academic, but for the most part, the content needs to be useful for attendees immediately.
Actionable doesn’t mean that all presentations need a tool or code release; they need content that attendees can use. So, to start with, ask yourself two fundamental questions.
What do you expect attendees to do after your presentation is over?
How will attendees use or apply the content and concepts you cover?
Your presentation and the content you cover should serve to answer these two questions.
Actionable on the AI Track
Observation: Submissions often weren’t actionable or didn’t have an actionable takeaway for attendees.
So, how do you make your content actionable on the AI track? It’s pretty easy to determine by answering the two questions posed in the previous section.
What do you expect attendees to do after your presentation is over?
If your answer to this question is to read my paper, spend months researching, and then publish your own paper with slightly better results, it won’t be a good fit for the track. If the answer is understanding the approach I took to solving this problem and allowing them to adapt the code and content to their own environments, then that’s a good fit. This means your content has to generalize to the audience or at least a particular segment of the audience.
It doesn’t have to be as straightforward as it sounds, though. Many talks on reverse engineering a specific software aren’t about the specific software being reversed. It’s about the story and the approach. You can give people ideas about how to modify your approach to fit something new. Sylvain Pelissier’s Practical Bruteforce of AES-1024 Military Grade Encryption talk is a good example of this. It had a bit of everything, a funny hook, a real-world story, Sylvain’s thought process and approach to the problem, as well as perspectives from the affected company. There were multiple takeaways here that attendees could consider when approaching their own research and product development. I chose this example because I had knowledge of the research from the beginning.
Observation: Submissions appeared to lack enough detail to reproduce the content submitted.
In order to succeed in creating actionable content, you have to provide enough information to make your work reproducible, and you have to provide enough information to bootstrap this effort when necessary. Think about this: if attendees can’t reproduce your efforts, they are almost starting from scratch. This isn’t helpful. If you can’t share enough detail due to confidentiality or intellectual property issues, then you should reconsider submitting to Black Hat because your content appears more like a sales pitch than a value add for attendees.
Now, this level of detail doesn’t mean you have to release a tool. It could be an approach or even a glimpse of something that attendees need to prepare for. This could be a roadmap or approach as well as a set of selected techniques and why you chose them. Even if your content is experimental, you must give attendees an idea of where to go next.
Academia vs Industry
Academia and industry are often confronted with different realities and different sets of problems. Both are useful and necessary but still different. Take adversarial attacks against specific image systems and object detectors. Academia has spent much time ideating new attacks and defenses for these systems. This is great, but industry hasn’t cared much because it doesn’t impact most of them.
There is certainly some overlap between the two, and a silver lining here is that something not quite fit at an academic conference may be perfect for the practitioners at Black Hat and vice versa. If you are an academic and unsure if the content is a good fit, err on the side of submitting.
Generic Use of “AI” and Simple Overviews
I’m not going to spend much time on these topics because the issues should be self-evident, but since many submissions fell into this area, it’s worth addressing.
Observation: Submissions peppered with the term “AI” without any mention of the actual approach.
Quite a few submissions fell into the following category: “We used “AI” for some task.” This statement is then followed by a hundred mentions of the term AI. That’s not helpful. Which method and approach did you use? If it’s about solving the problem and not the approach, then it’s a better fit for another track, not the AI track.
Observation: Far too many submissions were a simple overview or involved an uninteresting use case or approach
Simple overviews are not a good fit for Black Hat. There are some exceptions for extremely cutting-edge topics, but when a topic has been covered at length at other venues, it’s a good indicator that it’s probably not a good fit for Black Hat. This doesn’t mean it’s not a great talk or subject. Just know when your talk would better fit a regional security event or a blog post.
When it comes to use cases, remember that the audience is filled predominantly with security professionals. So, ensure your use case and content apply to them. Refer back to the actionable section and evaluate actions to ensure they align with expectations for security professionals.
Success and Benchmarking Criteria
Observation: Submissions often didn’t contain any success or benchmarking criteria.
If you apply machine learning or deep learning to an approach, specify your success and benchmarking criteria. If you don’t, how are reviewers supposed to evaluate your approach? This is critical in understanding whether your approach was successful or not and determining how successful the approach is in light of other approaches.
Far too many submissions fell into the bucket of “We used LLMs for ‘X.’” Well, that’s great, but did it work? How well did it work? How did using an LLM for this task compare to more traditional approaches? You can see where this is headed.
I was honestly a bit shocked by the lack of this basic information, which was a bit perplexing since it’s critical to demonstrating the effectiveness of the approach, even to yourself, while experimenting. The assumption is that you didn’t pay any attention to this and were only focused on making something work without regard to effectiveness.
Hype Is What Hype Is
Observation: LLMs were shoehorned into every use case.
With the level of hype around LLMs, it was inevitable that they would be shoehorned into every use case. This was even in cases where the problem itself wasn’t interesting or in cases where we already had solid solutions for the problem.
I think of this as experimentation and the natural result of a new technology’s introduction. Whenever a new technology comes along, people play around with it, try applying it to different use cases, and see what works. Nothing is wrong with this, but it’s time to get real when submitting to a conference.
This is where you need to refer to the previous section on success and benchmarking criteria to demonstrate the value of your submission. It’s okay to have a failed experiment or even subpar performance as long as there are takeaways and potential directions for others. Having a lessons-learned style of presentation can be helpful in certain circumstances. Just keep in mind, however, this is very situational.
If you are solving an already solved problem, you better bring it in some way and justify it with examples and success/failure criteria. Using a new technology to solve uninteresting or unimportant problems is also not a good recipe for success. Not every fun project makes a good conference submission.
CFP Submission Issues
Of course, every year, there is no shortage of regular old submission issues unrelated to AI. These are the easy things to avoid, yet people often don’t do them. I’ve got some updates to previous submission guidance I’ve given, and this isn’t the place for that, but I want to hit a couple of highlights for quick reference.
What’s unique about your talk? Ensure you’ve covered a unique angle or perspective your talk brings in the submission.
Would you sit through your own talk? This is a question almost nobody asks themselves, but it’s enlightening on multiple levels.
Think hard about your takeaways Your takeaways are the reasons people would attend your talk. Every reviewer has takeaways in the back of their mind when reviewing your submissions. Ensure these are covered in your submission, either spelled out in the appropriate section or painfully obvious from the submission.
Fill out the form completely Yes, this actually has to be said. You’d be surprised at the number of people who submit incomplete proposals every single year.
Get feedback Find someone who will give you honest feedback and share the submission with them ahead of time. Feedback is the best way to anticipate potential questions and ensure the concepts you think are clear are actually communicated clearly.
Preemptively answer questions You can find some of these questions when you ask for feedback, but put your reviewer cap on. Pretend you are reviewing your submission and see if any obvious questions emerge. Your submission should answer more questions than it poses.
Don’t Do This
Speaking of questions, don’t ask a series of questions in your submission. This isn’t a movie trailer; asking questions isn’t an opportunity to build suspense with reviewers. I don’t know if this is some new trend, but a few submissions did this, and it’s not a recipe for success.
I noticed a few submission bodies and outlines were peppered with questions. Examples such as, “Did our approach work?” “Is it possible to implement our approach in production?” You get the point. It’s one thing to have these questions in the abstract since that’s public and will be displayed on the website. It’s another thing to put it in the submission body where reviewers are trying to evaluate the validity of your submission.
Conclusion
My hope is that people find this post helpful and it points people in the right direction. Preparing a submission for a conference can be daunting, but with a bit of preparation and feedback, your submission will have a better chance of getting selected. I’m looking forward to reviewing your submission.
Again and again, we never learn seem to learn lessons. Approaching everything in the world as an optimization problem isn’t the best approach and can make things worse. Sure, some out there looked at The Matrix and relished the thought of living their lives in a simulation while submerging in a viscous liquid with tubes attached to them. Fortunately, that’s not an option, well… yet anyway. That leaves us in the real world trying our best to turn it into a simulation, and optimizing away our human interactions is one of the best ways to do that.
Relationships are work, and work is friction. Therefore, reducing relationships reduces friction. Boom, Optimized! It seems silly when phrased this way, but this is the approach we are using to address countless human interactions with tech, and we may not even realize it. When consumed by how cool a particular technology is, we tend to take the Maslow’s Hammer approach, and everything, including human interactions, becomes a nail.
Outsourcing Simulated Emotional Connections
Back in March, I wrote about this issue in a post called Outsourcing Simulated Emotional Connections to Bots. I wanted to revisit this topic now that some time has passed and we’ve made even more progress, and predictably, things have gotten worse.
Far too many people don’t see an issue with this and may want to replicate it, but even a cursory look at the article and its subject has a noticeable cringe factor. Sure, a problem is defined in that post, and that problem is YOU. It’s not a technical problem. You are the one who isn’t making time for your mom. You are the one going about your days for long periods, not even thinking about your mom. This isn’t a tech problem; it’s a YOU problem. It should make you feel bad, and that feeling is an indicator that you need to make a change. It’s your brain’s way of keeping you in check.
But even employing the tech doesn’t solve the problem because… you still didn’t think about your mom. She didn’t need to occupy any space in your brain. You’ve optimized. But why stop here? Why not clone your voice and, at regular intervals, have someone call your mom using your voice and have a conversation with her so you don’t have to? What a utopia. Then you’d never be inconvenienced by your mom. Technologically speaking, we aren’t far from having something like this be completely automated, so you wouldn’t even need to hire someone to use your voice. You could forget about your mom entirely.
On top of this, it’s incredibly deceptive. You are using technology to fool your loved one into believing they are on your mind. There’s an ethical problem with employing tech as a deception when dealing with humans, especially when those humans are your loved ones. Think about your mom’s reaction if she knew you were doing this.
Approaching this as an optimization problem means when your mom passes away, things get better.
You only have a limited amount of time with your mother, and before you know it, she’ll be gone. Approaching this situation as an optimization problem means things get better when your mom passes away, but we know this isn’t true.
Introducing ThereBot!
Warning: Future Advertisement Below
Having kids is a hassle. You spend so much time going from event to event, sporting events, band recitals, plays, this list goes on and on. What if there was a way to do what you wanted without having to be bogged down by pesky activities and your child’s emotional well-being? Well, now you can!
ThereBot Introducing ThereBot. ThereBot is an exciting new way for you to be there without having to be there! ThereBot uses an adaptive architecture to respond properly to your child’s activities. It’s quiet during recitals and cheers your child on during sporting events. If you decide to watch the event after the fact wink wink ThereBot has your back. Our cutting-edge algorithms cut out all the boring stuff, so you only get the highlights—hours of wasted time condensed into a few minutes. ThereBot pays for itself!
ThereBot+
But why stop there? ThereBot+ comes with an impressive array of upgrades, including a screen showing an image of you as though you are watching the game and the ability to clone and use your voice. This means you can shout, “Daddy loves you,” at any time like you were actually there. Here’s how to order!
Shame Isn’t An Effective Long-Term Control
In the short term, the thought of sending a robot instead of going yourself isn’t something many would do, not because they don’t want to, but because not only can your children observe your non-attendance, but others can also. So, the big catch in the short term is shame. We all know shame isn’t a long-term control. It starts by saying, “I’ll use it when I’m traveling and can’t attend,” or “I’m just too busy right now.” Plus, people can be shameless; the more shameless people there are around, the more that activity becomes normalized and contagious.
Dehumanizing Through Optimization
We are often distracted by how cool a particular new technology is and look to apply it to every use case we can. This is a sort of Shiny Object Syndrome applied to technology. We are more focused on what it does than what it does to us. This Maslow’s Hammer approach leads us to solutions in search of problems without understanding underlying issues. This gets far worse in social contexts.
The rise in self-centeredness and even narcissism is growing. Our modern, social media-driven world forces us into a cycle of constant self-promotion. I believe this pre-dates social media, though, and began with my generation raising children in the age of the self-esteem movement. A movement that many still exercise even though it’s been proven to be detrimental. For an entire exploration of this topic, I highly recommend Will Stor’s book Selfie: How We Became So Self-Obsessed and What It’s Doing to Us.
We already dehumanize others, treating them more like processes, checklists, or apps than other humans. This was something I mentioned in my previous post. We do this with everyone: shift workers, customer service representatives, Uber drivers, and even coworkers. Everyone seems to be an obstacle in getting what WE want. I’m certainly guilty of this myself, not considering the human on the other end of the phone or the person behind the counter when I’m having an issue.
We turn to technology in these cases to provide the optimization we need to reduce the friction of dealing with others. These others aren’t constrained to strangers and acquaintances. They are also friends and family.
These trends lead to a bunch of questions. Are humans evolving to be more self-centered? Will we stop caring about others in the future? Will we stop loving? I mean, what causes more friction than love? After all, love can make you feel worse than you’ve ever felt in your entire life. Will we stop even taking chances on love? Some people certainly have already. I don’t think this is a healthy trajectory.
Also, why even have friends? It seems like such a massive waste of time. You have to do things you don’t want to and potentially deal with problems other than your own. You’ve got your own problems to deal with. It’s one thing to think this, but saying it out loud is something else entirely. We are often confronted with our ridiculousness by saying things out loud. It’s something we should do far more often as a gut check.
There is more and more evidence that younger generations are forgoing friendship. One survey reported that 22% of Millennials say they have no friends at all. This isn’t constrained to Millennials. The numbers are down across multiple age groups, with people having fewer close friends with Gen Z even trying to spend money to make friends and, of course, turning to technology to solve their friendship woes. Social Media has certainly accelerated this by making things superficial and fake. And, of course, the global pandemic right in the middle of all of this pushing the accelerator to the floor.
Humans evolving into machines instead of machines into humans is something that doesn’t get enough attention.
Friction is Currency
Not all friction is bad. In some cases, the friction is the point of the task. But regarding human interactions, here’s a thought: friction is the currency that pays for fulfillment. Looking at a potential friendship and asking, “What’s in it for me?” is the wrong question with a wrong answer. Unfortunately, far too many people have this perspective. Even if you had incredibly selfish motives, you may not know what’s in a friendship until it bears fruit, which may not be evident until later.
Friction is the currency that pays for fulfillment.
Friendships are valuable simply by being. It’s hard to describe, kind of like love. It’s like the old trick question someone asks, “What do you love about me?” It’s not so easy to summarize. You just kind of know it, and you are better off for having it.
Coworkers
The workplace is where people justify classifying their coworkers as tasks or obstacles. This certainly isn’t new, but it’s an area that people love to talk about optimizing with tech. Even some chatbot demos speak about how great it would be if you didn’t have to be bothered by your inbox at work, but even your coworkers shouldn’t be treated like apps just because they may not be your friends. Relationship building at work is essential for many reasons, but in an age of diminishing jobs, relationship building may be the best way to save yourself when the cutbacks happen.
Collaboration itself appears inefficient because it’s just easier to do something yourself. But once again, friction is currency. Anyone who’s ever written music or been in a band knows how frustrating it can be to collaborate with other strong personalities. However, when you realize that the different perspectives elevate a song to a level it wouldn’t have achieved on its own, the insight is incredibly enlightening and makes you appreciate other’s input. This is the same at the workplace.
In relationships, like so many other activities, the friction is the point.
The Coming Chatbot Hangover
We haven’t yet hit the hangover stage. We are still at the bar, slurring our speech while we make the most insightful point in the history of human civilization, but it’s coming. I wrote about this in the Social Impacts section of my Post-Black Hat USA and DEF CON AI Thoughts post. We are about to enter an era of historical figures, celebrities, and persona-based chatbots, all to increase engagement on particular platforms. These systems will boast massive numbers after launch as people check it out, followed by a very steep drop-off as the novelty wears off and the superficial and fake nature of the interaction sets in.
At least when we play a video game, we realize that NPCs aren’t human. What we are doing is trying to say that the bot is a representation of a specific human, which it is not. Subconsciously, we know this, and after the initial euphoria wears off, reality sets in, and the whole concept seems cheap and manipulative. Remember, this is far different than an algorithm working behind the scenes. Bots are directly in front of people and interacting with them.
Conclusion
Removing the smoke detectors in your house is a great way not to hear the smoke detector go off every time you cook, but obviously, this isn’t solving the real problem.
We don’t realize we may be causing other effects and problems when we focus only on the technology and its cool factor. We may be fooled into thinking that friction is the problem when it may be the point or an indicator. Removing the smoke detectors in your house is a great way not to hear the smoke detector go off every time you cook, but obviously, this isn’t solving the real problem. Friction and discomfort in human interactions can be like a smoke detector, a leading indicator that something else needs to be addressed. So, call your mom today. I know I will.
We are about to be inundated with stories of misinformation and deepfakes, all focused on the 2024 US election. I know the last thing most people in the United States want to consider is the 2024 election. Election cycles are tiring, but even before we get into full swing, there are already grumblings about AI. I mean, why wouldn’t there be? It’s been all AI all the time. Generative AI is here, in case that’s something you’ve somehow failed to notice. Methods for generating text and images keep getting better and better, and they are far more accessible than they’ve ever been.
I’ve pulled no punches that I think the capabilities of LLMs are overhyped, but they excel in the areas useful for generating misinformation. I’ve even said that this would be the year that generative AI starts replacing jobs, something that appears to be already happening. So, with a looming election, highly capable systems, and low cost of generation, what effect will generative AI have on the 2024 US Election?
So here’s my claim: Misinformation and Deepfakes won’t affect the outcome of the 2024 US election. More accurately, it will have a “statistically insignificant” effect on the 2024 US election.
Note: For this post, I’m using the term misinformation to cover instances of misinformation and disinformation.
Generative AI and Wide Availability
Due to the recent boom of generative AI, the 2024 US election will be the first major US election where these tools are widely accessible. This accessibility extends to everyone involved, including campaigns, nation-states, malicious actors, and even the general public.
To take accessibility a step further, this can be done very cheaply. People don’t have to use the models hosted by providers like OpenAI, Stability AI, Midjourney, etc. Models for generating text, images, and audio can be run on consumer machines or at least machines that aren’t much bigger than consumer machines. These models are also available without the typical guardrails. With all of this availability and ease of access, that begs the question, won’t this lead to a misinformation apocalypse?
2024 Misinformation Apocalypse? Not So Fast
Misinformation in the context of generative AI means the purposeful manufacturing of false information in photo, video, text, or audio formats with a particular goal. This content is then used to serve a message around events and activities that either didn’t happen or reframe events that happened differently. I refer to this as “narrative evidence,” I wrote about this back in 2020. You are manufacturing false content as evidence to support a larger narrative. This narrative is meant to support a position or demonize someone else but with a goal in the case of an election. Fortunately for us, this condition only remains highly effective when the novelty factor is high, and this novelty factor is dropping quickly.
In the context of an election, misinformation is meant to sway opinion and affect voters. For example, this example of ludicrous claims that high-profile figures in the Democratic Party are actually on house arrest, with the associated and laughable proof. No AI is necessary in this case. Spreading content like this is meant to convince people that voting for people in the Democratic Party is a bad idea and they should vote the other way (or stay home), but it doesn’t work that way in practice.
Misinformation at scale has both logistical and social challenges, so let’s look at the Generative Misinformation Cycle.
Generative Misinformation Cycle
Let’s break down the generative misinformation cycle into a few different steps. Breaking this down into several steps helps to highlight what’s easy and what really matters.
Generation – This step is the creation of the content. This step is easy and mostly friction-free, even without generative AI. What Generative AI brings to the table is an increase in velocity, not precision. So you can generate misinformation much faster and create more volume, but there’s no guarantee that misinformation will be better, and quite often, it can be worse than human-generated misinformation. For example, try getting an LLM to explain why the Distracted Boyfriend meme caught on. I mean, it’s difficult for humans to explain why certain things catch on as well.
There are quite a few cultural movements to latch on to that LLMs don’t understand, but there’s no doubt you can create massive amounts of content with generative AI. Sure, once a cultural movement has been identified, a bad actor can then try to latch on to it by automatically generating misinformation, but this slows down the process and is less effective.
Amplification – A piece of misinformation does no good if nobody sees it. Amplification is getting that content in front of the eyes of as many people as possible. Preferably the people who’d most likely engage with it since more engagement leads to more amplification. You’ll also increase the potential success of the intended outcome of the misinformation.
When it comes to amplification, it’s not as hard to amplify as some would have you believe. Nation-states have an army of people that amplify content. If you can hit the right chord aligning with people’s biases, they’ll amplify the content.
Engagement – Engagement is getting people to interact with the content. This could be in liking, sharing, or even commenting on it. The more engagement, the more false consensus is built around the content. This engagement can feed back into the amplification phase through algorithmic amplification on social media or merely exposing others to the content. It would be a mistake to assume that engagement leads to an outcome. People share things they don’t read all of the time because the title agrees with their biases.
Outcome – This is the action the misinformation is intended to have. This may increase votes for a party or candidate or get people to believe something. This is where misinformation really matters. It’s not so cut and dry as a call to action, but it could be a change of mind on a topic.
For any piece of misinformation to be effective, there needs to be a successful outcome. This is much harder than it seems. Amplifying and increasing engagement seems like the goal, but it’s not. Many people discussing AI-generated misinformation talk about how well it can structure articles and provide references. But we know that many sharing content don’t read the content they share.
Mental Cement
People have made politics (and many other things) religions now. We’ve had a pandemic and lockdowns for people to spend an inordinate amount of time online and cement their biases. Every bit of content we encounter, we apply our biases to it. If it’s something we like, we assume it’s true. If it’s something we don’t like, it must be a deepfake. I mentioned the concept of claiming deepfakes in my 2020 post, and it seems even Elon Musk has made this a reality.
Almost no amount of misinformation will get people to change their minds about something they believe in. It’s why it’s so hard to get people out of cults, change religions, or even political parties.
Getting people to change these fundamental things after cements takes a massive effort. My dad was one of the few who did change religions, but only because of my mom. People occasionally also switch political parties, but it’s also rare. It’s much more likely to have people become unaffiliated. People don’t switch religions; they leave religions. People don’t switch political parties; they become independent. This may be a silver lining when it comes to misinformation. I’ll get to this later.
Convincing someone to believe in misinformation only works if you have two fundamental aspects. A non-politically charged topic and something that doesn’t go against the strong biases of the person encountering the content.
Convincing someone to believe in misinformation only works if you have two fundamental aspects. A non-politically charged topic and something that doesn’t go against the strong biases of the person encountering the content. It’s certainly not impossible, but the climb is significant.
Instances Don’t Equal Impact
You’ll see the press and pundits point out instances of misinformation as proof that it’s having an effect. This isn’t the case. We’ll most certainly see more content, AI-generated or otherwise, focused on the 2024 election. An Increase in content doesn’t equal an increase in influence or effects on a significant scale. This would be the “Outcome” step in the Generative Misinformation Cycle.
In the context of the election, misinformation, and deepfakes will not be used to change people’s minds but to excite the base and poke fun at the opposite candidate. In 2024, people will wage meme warfare, and generative image models will be their weapons.
CounterCloud
CounterCloud is an experiment in fully autonomous disinformation, and it’s terrifying to some people.
It’s a neat experiment in what’s possible, and the approach is interesting for creating counter-narratives. You can read more about it here. However, once again, this overlooks the fact that many people don’t read the articles. They share based on the headlines. It also has other more fatal flaws, such as it works to drive people to a single site, even though it can use social media to drive attention there. Ultimately, this would be identified pretty quickly. And yes, lessons learned here could be more stealthy, but we still have the same issues I covered in this post.
But, Deepfakes Tho
Nowhere does the misinformation become spicier than the arguments about deepfakes. When I relaunched this blog back in 2020, the topic of Deepfakes was the first I tackled. I mostly focused on how their threats weren’t appropriately phrased and overhyped. Imagine that. I felt the real legacy of deepfakes lies in their ability to harass versus their convincing people that something happened. I still feel this way. Fooling people only works while the novelty factor is high, then there is a steep drop off.
Let’s look at Pope in a Puffer Jacket, also known as Balenciaga Pope. I know this image fooled many people, which seems to go against my point in the post, but not so fast.
The Pope in a puffer jacket image fooled people because nobody cared about the Pope or his jacket. If this were a politically charged topic or a topic that people were highly biased toward, it would have received much more scrutiny.
Meme Wars
Generative AI will most likely be used to create memes and caricatures during the election cycle. This won’t all be malicious. Some of it will be downright hilarious (depending on which side of the political spectrum you are on), such as the images created of RuPublicans.
Although some memes and content will be good fun, much of it will be malicious. If generative image tools restrict the ability to generate political figures, then that could slow down this meme war a bit, but some of these models are open source and could be run on systems without these guardrails. So, we’ll see as soon as the election cycle starts heating up.
Misinformation and Deepfakes: Still a Problem
Just because I don’t think misinformation and deepfakes will affect the 2024 US election and don’t always work in high-stakes situations doesn’t mean I don’t think these are a problem. In my previous post, I wrote that I felt the real legacy of deepfakes would be in their use in harassment. So, activities like mocking people or creating non-consensual porn are two examples of this.
Also, there are so many non-politically charged situations where it’s easy to fool people. Where the stakes are low, nonsense will proliferate. Just like Ted Cruz recently fell for the old shark in a waterway hoax.
This does bring up another issue, and that is we are creating an internet of junk. Even if it’s not malicious or directly harmful to anyone, it still has the potential to affect people. There are some fundamental issues in creating a world where you never really know if any content you encounter is real or not. This is really the near future we are headed for. I need to give this some more thought to consider the full impacts at scale.
There are some fundamental issues in creating a world where you never really know if any content you encounter is real or not.
A Silver Lining
Will the deluge of nonsense have a positive effect? It’s possible. Consuming misinformation and other nonsense is consuming mental junk food. It feels good, but there’s no substance. Just like eating cake and ice cream for every meal seems fun, it’s not fun in practice.
When you are bombarded with things, you tend to check out. The mental junk food becomes less fun, and you stop interacting with it, possibly block it, or just leave social media for a while. So, it could have a positive impact. I realize I may be too hopeful, but it’s possible. I’m also aware of the arguments that say making people tune out is the point, but even given their argument, I don’t think it’s all bad on that side.
This is also precisely why legitimate news outlets shouldn’t use Generative AI to curate and write articles. This makes these news sources seem like part of the problem when the rest of the internet is filled with nonsense. The stakes are too high, and the value too low.
Conclusion
This post contained some food for thought, possibly going in the opposite direction of what may be reported. I could be completely wrong about all of this, and the tide of the election could very well turn based on AI-generated misinformation, but I don’t think so. Usually, I’d be happy to be wrong, but not in this case for obvious reasons.
There isn’t much we can do for the time being except employ critical thinking skills and evaluate content accordingly. The hype of 2024 is right around the corner. I do feel there are a couple of fundamental things we can be doing to prepare for a world in which reality is merely a suggestion. This involves teaching data literacy as well as probability and statistics in the K-12 curriculum. Making room for these subjects is vital to prepare students for not just the future but what we now have in the present.
Wow, another Black Hat USA and DEF CON are in the books, and it was great seeing everyone. One of the best parts of conferences is the conversations, and those conversations were amazing. As you can imagine, many of them were about “AI.” Since there were no cameras in the AI Security Challenges, Solutions, and Open Problems meetup and it will be a while before the Forward Focus: Perspectives on AI, Hype, and Security presentation makes its way online, I thought I’d summarize a few points as well as distill some of my perspectives on the topics I covered and conversations I had, now that I’ve had a few days to reflect.
Perspective on LLM Impacts
I deal with so many people making nonsensical or unfounded claims that I wanted to make it clear where I stand on the subject of LLMs and their impact on humanity. When you live in reality, you tend to be labeled a hater.
I’m not big on making predictions, but let me say this with a fair amount of confidence, LLMs will not be more impactful on humanity than the printing press, and GPT-5 won’t achieve AGI. Those of you who know me will find the fact that I’m in the middle unsurprising, but hey, the only technology I hate is PHP 😉
All AI All The Time
As was expected, everything was all AI all the time. Every vendor booth had the term “AI.” AI-powered products, AI pen testing, AI assurance, AI, AI, AI! Everyone is ALL in. Even though I expected it, being confronted with the term absolutely everywhere was still shocking. What we’d poked fun at in the past has become our reality. Everyone is trying to ride the wave to success, regardless of their skills or capability. It would be easy to blame this on marketing departments, but it was far more than that.
All references to machine learning seemed to be scrubbed in favor of using the term “AI.” Seems machine learning is having its “cyber” or “crypto” terminology moment. I learned long ago that fighting the industry over terminology is a losing battle, so yes, I’m giving in to the massive, crushing weight of hype, and I’ll move the battlefront to somewhere else.
Losing the terminology battle isn’t without drawbacks.
Still, losing the terminology battle isn’t without drawbacks. It seems many are also using the term AI synonymously with generative language models, which just muddies the water more. When you mention that you think the capabilities of LLMs are overhyped (i.e., not going to be more impactful than the printing press, etc.), people tend to throw out things like drug discovery or AlphaFold. When you point out that those are different approaches and it’s not like ChatGPT is doing that, they tend to still cling to adjacent success in specific domains as an indicator of success here. It’s like being in a VW Bug and pointing out that a Ferrari can do over 200 mph.
This is also a shame since many more traditional machine learning approaches aren’t even considered as people rush to LLMs, even approaches that are more reliable and proven for specific security problems. I think this will level out at some point, but not anytime soon. Time to put LLMs on the moon!
Where People Stand
The consensus from many I talked to is that they were just trying to figure out where they stood. They’ve heard so many outrageous claims, and the reporting on advancements has been so all over the place. On the one hand, you have people claiming GPT-5 is going to be AGI; on the other, you have people advocating military strikes against data centers. It’s no wonder people are confused.
Given the wild reporting, outrageous claims, and AI hustle bros trying to get you to subscribe to their channels, I was surprised that most people were pretty grounded. Many didn’t think AI would take their job or that the ChatGPT Plugin Store would be more impactful than the mobile App Store on humanity. I found this incredibly refreshing.
I suggested to the people I talked to that whenever you hear someone spouting outrageous claims, ask them why they think that. People making outrageous claims about LLMs often try to drive attention into their funnel. They want people subscribing to their Substack, YouTube, Mailing lists, etc. They can make these claims and never have to justify them, never have to give examples or show real-world impact. The rest of us have to live in a reality where our software has to work, scale, and be reliable. So, beware of people making claims without providing specific examples. Also, stories in the news often don’t reflect realities on the ground.
Fooling Ourselves Is Easy
The social contagion status of ChatGPT highlighted a vulnerability in humans, and that’s that we are very bad at creating tests and very good at filling in the blanks. The world is filled with experiments, and highly-cherry picked examples. We tend to see a future that isn’t there. We often forget that the world is filled with edge cases, which confuse many of these AI systems.
The social contagion status of ChatGPT highlighted a vulnerability in humans, and that’s that we are very bad at creating tests and very good at filling in the blanks.
Look at self-driving cars, for instance. We see a demo of a self-driving car properly navigating the roadway, and we assume that truck driving as a profession is doomed almost immediately. It seems like one of the easier problems, stay in the lane, obey the signs, and don’t hit things. Boom! But anyone who’s driven a car knows that edge cases are everywhere. Road construction, lighting conditions, snow, accidents, etc. Humans handle these conditions pretty well, by contrast.
Supercharged Attackers
LLMs won’t supercharge inexperienced attackers
One point I brought up in the meetup and during our panel, was that people made similar claims about Metasploit supercharging inexperienced attackers when it was launched over twenty years ago. People made claims that Metasploit was like giving nukes to script kiddies. Those comments didn’t age well, and I think the same is true about LLMs. You still have to know what you are doing when using LLMs to attack something. It’s not like point, click, own. Also, it’s not like LLMs are finding 0day or writing undetectable malware. I know. I’ve seen the research and reports. Neat research, but it’s not like it’s overly practical for attacks at scale.
People made claims that Metasploit was like giving nukes to script kiddies
Today, most malicious toolkits you hear about, like FraudGPT, WormGPT, and many others that have popped up, are primarily tools for phishing and social engineering attacks (despite having “worm” in the title.) This can certainly have an impact, but not on the apocalyptic levels that some would have you believe. All of this technology is indeed dual use, so something that’s helpful for security professionals will also be helpful for criminals. Just like we have people hyping AI on the clear web, you have people hyping AI on the dark web.
Losing Your Job To AI
Most people I talked to didn’t seem overly concerned about losing their job to AI, but I got the feeling that it was in people’s minds regardless. The recent sting of many layoffs is probably not helping the uncertainty. This was one of the points we tried to address from the stage at Black Hat. I used the example of AlphaGo. I asked the audience how many people had heard of AlphaGo beating Lee Sedol at Go. I was surprised that very few hands in the audience went up since it was big news at the time. I then asked how many people had heard of the research from Stewart Russell’s lab that allowed even average Go players to beat these superhuman Go AIs. No hands went up.
My point was that there is a lesson here for security professionals. These new technologies tend to have their own vulnerabilities and issues that also need to be addressed. In addition, all of these technologies have gaps, and the gaps will need to be filled. So, for the foreseeable future, your job is safe in the context of information security. We’d have a much different conversation if you were a freelance graphic artist.
Misinformation and Deepfakes
I was a bit surprised by the fact I didn’t hear any conversations about misinformation and deepfakes. I’m sure they happened, but not at any of the events or conversations I participated in. The only time it was brought up, it was brought up by myself in conversation. I have a rather spicy take on the 2024 US Election. I think misinformation and deepfakes will have a statistically insignificant effect on the 2024 election. I will address this in a future blog post, but in summary, people have already made up their minds and cemented their biases.
It’s not that these issues aren’t important or impactful, just in context, not significant. I wrote about this topic back in 2020 when I relaunched my blog. Interestingly, in that post, I also mentioned the people who should be most concerned about the technology powering deepfakes: actors and actresses. Very relevant now with the SAG AFTRA strike and AI being a big concern.
Social Impacts
There were virtually no conversations about the social impacts of Generative AI other than the conversations I initiated. This isn’t surprising since it’s a large focus of my blog, and I spend a lot of time thinking about these topics. Seems most people were focused on use cases and capabilities. My fellow tech people are often optimizers and look to optimize everything. They don’t realize that friction is the point in certain cases.
I think the chatbotification of everything is something humans are starting to tire of.
I think the chatbotification of everything is something humans are starting to tire of. When someone launches a new service, you have this quick uptake due to the novelty factor, followed by a steep drop-off. We are about to enter an era of celebrity and historical figure chatbots, I think the same curve applies.
We’ll see lots of press, rapid adoption, followed by a steep drop-off. This could be due to boredom, lack of true functionality, or even something more primal, which is the sort of “fake factor” of it all. We know we aren’t actually talking with Harriet Tubman when we use the chatbot. What seems kind of fun at first starts to take on a tarnish very quickly. As tech people, we get so caught up in the cool factor of the technology we build that we tend to forget the human factor in all of this. I think I’m on the right track here, but I realize I’m also old and have never played Minecraft, so I could be wrong.
Customer support chatbots, the ones that are directly customer-facing, have some promise, but only if they are empowered to take the action necessary to resolve the issues that customers are having. On the flip side, having an empowered chatbot also opens the door to manipulation. So this, too, has issues. My gut tells me that as organizations launch empowered bots for various things, there will be subreddits dedicated to manipulating them. This manipulation could be for fun, getting discounts, or stealing services. Time will tell.
There’s certainly some promise in hybrid workflows pairing humans and bots together, where the human is actually the one in first-party contact with the customer. This may be the ultimate path, but something tells me the replacement path will start first, and hybrid will be the fallback.
Prepare To Be Surprised
In my closing statement at Black Hat, I mainly told people to prepare to be surprised. There are lots of experiments and money pouring into the space. Anyone who thinks they have they can see the future here would be fooling themselves. The whole thing is simultaneously exciting and scary. The best thing people can do is remain grounded but also play with the technology. Don’t sit on the sidelines, generative models are pretty accessible. Play around and apply it to some of your use cases. Above all, have fun.
If we are not careful, we are about to enter an era of software development, where we replace known, reliable methods with less reliable probabilistic ones. Where methods such as prompting a model, even with context, can still lead to fragility causing unexpected and unreliable outputs. Where lack of visibility means you never really know why you receive the results you receive, and making requests over and over again becomes the norm. If we continue down this path, we are headed into a brave new world of degraded performance.
Scope
Before we begin, let’s set the perspective for this post. The generative AI I’m covering in this post is related to Large Language Models (LLMs) and not other types of generative AI. This post focuses on building software meant to be consumed by others. Products and applications deployed throughout an organization or to delivered to customers. I’m not referring to experiments, one-off tools, or prototypes. Although, buggy prototype code can have an odd habit of showing up in production because a function or feature just worked.
This post isn’t about AI destroying the world or people dying. It’s about the regular applications we use, even in a mundane context, just not being as good. The cost of failure doesn’t have to be high for the points in this post to apply. I’m saying this because, in many cases, the cost may be low. People probably won’t die if your ad-laden personalized horoscope application fails occasionally. But that doesn’t mean users won’t notice, and there won’t be impacts.
Our modern world runs on software, and we are training people that buggy software should be expected.
Our modern world runs on software, and we are training people that buggy software should be expected, and making requests repeatedly is the norm, setting the expectation that this is just the price paid in modern software development. This approach is bad, and the velocity at all costs mantra is misguided.
Let me be clear because I’m sure this will come up. I’m not anti-AI or anti-LLM or anything of the sort. These tools have their uses and can be incredibly beneficial in certain use cases. There are also some promising areas, such as the ability of LLMs to, generate, read and understand code and what that means for software development in the coming years. It’s still early. So in no way am I claiming that LLMs are useless. I’m trying to address the hype, staying in the realm of reality and not fantasy. The truth today is that maximizing these tools for functionality instead of being choosy is the problem and there are costs associated.
Software Development
Software development has never been perfect. It’s always been peppered with foot guns and other gotchas, be it performance or security issues, but what it lacked elegance, it made up in visibility and predictability. Developers had a level of proficiency with the code they wrote and an understanding of how the various components worked together to create a cohesive service, but this is changing.
Now, you can make a bunch of requests to a large language model and let it figure it out for you. No need to write the logic, perform data transformations, or format the output. You can have a conversation with your application before having it do something and assume the application understands when it gives you the output. What a time to be alive!
There’s no doubt that tools like ChatGPT increased accessibility to people who’ve never written code before. Mountains of people are creating content showing, “Look, Mom, I wrote some code,” bragging that they didn’t know what they were doing. I’ve seen videos of University Professors making the same claims. This has and will continue to lead to many misunderstandings about problems people are trying to solve and the data they are trying to analyze. Lack of domain expertise and lack of functional knowledge about how systems work is a major problem but not the focus of this post.
As a security professional, inexperienced people spreading buggy code makes me cringe (look at the Web3 space for examples), but It’s not all bad. In some ways, this accessibility is a benefit and may lead to people discovering new careers and gaining new opportunities. Also, small experiments, exploration, or playing around with the tools are absolutely fine. It’s how you discover new things. However, inefficiencies, errors, and lack of reliability aren’t dealbreakers in these cases. But what happens when this mindset is taken to heart and industrialized into applications and products that impact business processes and customers?
Degraded Performance
There’s a new approach in town. You no longer have to collect data, ensure it’s labeled properly, train a model, perform evaluations, and repeat. Now, in hours, you can throw both apps and caution to the wind as you deploy into production!
This above is a process outlined by Andrew Ng in his newsletter and parroted by countless content creators and AI hustle bros. It’s the kind of message you’d expect to resonate, I mean, who wouldn’t like to save months with the added benefit of removing a whole mountain of effort in the process? But, as with crypto bros and their Lambos, if it sounds too good to be true, it probably is.
Let’s look at a few facts. Compared to more traditional approaches:
LLMs are slow
LLMs are inefficient
LLMs are expensive ($)
LLMs have reliability issues
LLMs are finicky
LLMs can and do change (Instability)
LLMs lack visibility
Benchmarking? Measuring performance?
Pump the Brakes
Traditional machine learning approaches can have much better visibility into the entire end-to-end process. This visibility can even include how a decision or prediction was made. They can also be better approaches for specific problems in particular domains. These approaches also make it far easier to benchmark, create ensembles, perform cross-validation, and measure performance and accuracy. Everyone hates data wrangling, but you learn something about your data, given all that wrangling. This familiarity helps you identify when things aren’t right. Having visibility into the entire process means you can also identify potential issues like target leakage or when a model might give you the right answer but for the wrong reasons, helping avoid a catastrophe down the road.
The friction in more traditional machine learning is a feature, not a bug, making it much easier to spot potential issues and create more reliable systems.
The friction in more traditional machine learning is a feature, not a bug
Lazy Engineering
On the surface, letting an LLM figure everything out may seem easier. After all, Andrew Ng claims something similar. In his first course on Deeplearning.ai ChatGPT Prompt Engineering for Developers He mentions using LLMs to format your data as well as using triple backticks to avoid prompt injection attacks. Even the popular LangChain library instructs the LLM to format data in the same way. Countless others are creating similar tutorials flooding the web parroting this point. Andrew is a highly influential person who’s helped countless people with this training by making machine learning more accessible. With so many people telling others what they want to hear, as well as the accessibility of tools like LangChain, this will have an impact, and it’s not all positive.
One of the goals of software engineering should be to minimize the number of potential issues and unexpected behaviors an application exhibits when deployed in a production environment. Treating LLMs as some sort of all-capable oracle is a good way to get into trouble. This is for two primary reasons, lack of visibility and reliability.
Black Boxes
A big criticism of deep learning approaches has been their lack of transparency and visibility. Many tools have been developed to try and add some visibility to these approaches, but when maximized in an application, LLMs are a step backward. A major step backward if you count things like OpenAI’s Code Interpreter.
The more of your application’s functionality you outsource to an LLM, the less visibility you have into the process. This can make tracking down issues in your applications when they occur almost impossible. And when you can track problems down, assuming you can fix them, there will be no guarantee that they stay fixed. Squashing bugs in LLM-powered applications isn’t as simple as patching some buggy code.
Right, Probably
LLMs are being touted as a way to take on more and more functionality in the software being built, giving them an outsized role in an application’s architecture. Any time you replace a more reliable deterministic method with a probabilistic one, you may get the right answer much of the time, but there’s no guarantee you will. This means you could have intermittent failures that impact your application. In more extreme cases, these failures can cascade through a system affecting the functionality of other downstream components.
For example, anyone who has ever asked an LLM to return a single-word result will know that sometimes it doesn’t, and there’s no rhyme or reason why. It’s one of the classic blunders of LLMs.
So, you may construct a prompt stating only to return a single word, True or False, based on some request. Occasionally, without warning and even with the temperature set to 0, it will return something like the following:
The result is True
Not the end of the world, but now translate this seemingly insignificant quirk into something more impactful. Your application expected a result from an LLM formatted in a certain way. Let’s say you wanted the result formatted in JSON. Now, your application receives a result that isn’t JSON or maybe not properly formatted JSON, creating an unexpected condition in your application.
Suppose we combine this reliability issue with the lack of visibility. In that case, it can lead to some serious issues that may be intermittent, hard to troubleshoot, and almost impossible to fix without reengineering. In a more complex example, maybe you’ve sent a bunch of data to an LLM and asked it to perform a series of actions, some including math or counting, and return a result in a particular format. A whole mess of potential problems could result from this, all of which are outside your control and visibility.
Not to mention a big point many gloss over, deploying your application in production isn’t the end of your development journey. It may be the beginning. This means you will need to perform maintenance, troubleshooting, and improvements over time. All things LLMs can make much more difficult when functionality is maximized.
To summarize, outsourcing more and more application functionality to an LLM means that your application becomes less modular and more prone to unexpected errors and failures. These are issues that Matthew Honnibal also covers in his great article titled Against LLM Maximalism.
The Slow and Inefficient Slide
In some use cases, it may not matter if it takes seconds to return a result, but for many, this is unacceptable. Having multiple round trips and sending the same data back and forth may be necessary due to different use cases because a character changed or because of context window size, which also adds to the inefficiency. Even if the use case isn’t critical and inefficiencies can be tolerated, that’s not the end of the story.
There are still environmental impacts due to this inefficiency. It requires much more energy consumption to have an LLM perform tasks than more traditional methods. For example, searching for a condition with a RegEx vs. sending large chunks of data to an LLM and letting the LLM try and figure it out. The people ranting and raving constantly about the environmental impacts of PoW cryptocurrency mining are incredibly silent on the energy consumption of AI, even as former crypto miners turn their rigs toward AI. Think about that next time you want to replace a method like grep with ChatGPT or generate a continuous stream of cat photos with pizzas on their head.
LLMs Change and So Do You
Any check of social media will show that at the time of this writing, there have been quite a few people claiming that GPT-4 is getting worse. There’s also a paper that explores this.
There’s some debate over the paper and some of the tests chosen, but for the context we are discussing in this post, the why an LLM might change isn’t relevant. Whether changes are because of cost savings, issues with fine-tuning, upgrades, or some other factor aren’t relevant when you count on these technologies inside your application. This means your application’s performance can worsen for the same problems, and there isn’t much you can do about it but hope if you are consuming a provider’s model (OpenAI, Google, Microsoft, etc.) This can also lead to instability due to the provider requiring an upgrade to a newer version of the hosted model, which may lead to degraded performance in your application.
Demo Extrapolation
The problem is that none of the constraints and issues may surface for demos and cherry-picked examples. Actually, the results can look positive. Positive results in demos are a danger in and of themselves since this apparent working can mask larger issues in real-world scenarios. The world is filled with edge cases, and you may be running up a whole bunch of technical debt.
Hypetomisim and Sunken Cost
There’s a sense that technology and approaches always get better. Whether this is from Sci-fi movies or just because people get a new iPhone every year, maybe a combination of both. Approaches can be highly problem or domain-specific and not generalize to other problem areas or at least not generalize well. We don’t have an all-powerful single AI approach to everything. Almost nobody today would allow an LLM to drive their car. However, some have hooked them up to their bank accounts. Yikes!
But you can detect an underlying sense of give it time in people’s discussions on this topic. Whenever you point out issues you usually get, well GPT-5 is gonna… This goes without saying that ChatGPT is based on a large language model, and large language models are trained on what people write, not even what they actually think in certain cases. They perform best on generative tasks. On the other hand, tasks like operating a car have nothing to do with language. Sure, you could tell the car a destination, but every other operation has nothing to do with language. It’s true that LLMs can also generate code, but do you want your car to generate and compile code while driving it? Let me answer that. Hell no. Heed my words, maybe not this use case, but something in the same order of stupid is coming.
Developing buggy software in the hopes that improvements are on the way and outside your control is not a great strategy for reliable software development.
Developing buggy software in the hopes that improvements are on the way and outside your control is not a great strategy for reliable software development. I’ve heard multiple stories from dev teams that they continue to run buggy code with LLM functionality and make excuses for apparent failures because of sunken costs.
The hype has led to a new form of software development that appears to be more like casting a spell than developing software. The AI hustle bros want you to believe everything is so simple and money is just around the corner.
Now’s a good time to remind everyone that fantasy sells far better than reality. Lord of the Rings will always sell more books than one titled Eat Your Vegetables. Trust me, as most of my posts are along the lines of Eat Your Vegetables posts, I make no illusions that every AI hustler’s Substack making nonsensical and unfounded predictions is absolutely crushing me in page views.
Engineering Amnesia
In a development context, we may forget that better methods exist or allow ourselves to reintroduce known issues that cause cascading failures and catastrophic impacts on our applications. This isn’t without precedent.
The LAND attack came back in Windows XP after it was known and already mitigated in previous Windows OSs. ChatGPT plugins are allowed to execute in the context of each other’s current domains, even though we’ve seen time and time again how this violates security. The Corrupted Blood episode was a failure to understand how the containment of a feature could cause catastrophic damage to an application, so much so that it forced a reset. And, of course, don’t even get me started on the Web3 space. I mean, who wouldn’t want tons of newly minted developers creating high-risk financial products without knowledge of known security issues? It was fascinating to see security issues in high-impact products for which standard, boring, and known security controls would have prevented them. These are just a couple off the top of my head, and there are many more.
As new developers learn to use LLMs to perform common tasks for which we have better, more reliable methods, they may never become aware of these methods because their method just kind of works.
Avoiding Issues
The perplexing part of all of this is that these issues are pretty easy to avoid, mainly by thinking carefully about your application’s architecture and the features and components you are building. Let me also state that these issues won’t be solved by writing better prompts.
Reliability and visibility issues won’t be solved by writing better prompts
There’s the perception that using an LLM to figure everything out is easier than other methods. On the surface, it may appear that there’s some truth to that. It’s also easier to spend money on a credit card than to make the money to pay the bill. So, it’s the case that you may be kicking the can down the road. Avoiding these issues isn’t hard, and a bit of thought about your application and its features will go a long way.
Look at your application’s features. Break these features down into functional modules. The goal of breaking down these features into smaller components is to evaluate the intended functionality to determine the best approach for the given feature. At a high level, you could ask a few questions with the goal of determining the right tool for the processing task.
Does the function require a generative approach?
Are there existing, more reliable methods to solve the problem?
How was the problem solved before generative AI? (Potential focusing question if necessary)
Is there a specific right or wrong answer to the problem?
What happens if the component fails?
These questions are far from all-encompassing, but they are meant to be simple and provide some focus on individual component functionality and the use case. After all, LLMs are a form of generative AI, and therefore, they are best suited to generative tasks. Asking if there’s a specific right or wrong answer is meant to focus on the output of the function and consider if a supervised learning approach may be a better fit for the problem.
We have reliable ways of formatting data, so it’s perplexing to see people using LLMs to perform data formatting and transformations, especially since you’ll have to perform those transformations every time you call the LLM. Asking these questions can help avoid issues where improperly formatted data can cause a cascading issue.
Example
Let’s take a simple example. You want a system that parses a stream of text content looking for mentions of your company. If your company is mentioned, you want to evaluate the sentiment around the mention of your company. Based on that sentiment, you’d like to write some text addressing the comment and post that back to the system. We break this down into the following tasks below.
For parsing, analysis, and text generation steps, it would be tempting to collapse all of them together and send them to an LLM for processing and output. This would be maximizing the LLM functionality in your application. You could technically construct a prompt with context to try and perform these three activities in a single shot. That would look like the following example.
In this case, you have multiple points of failure that could easily be avoided. You’d also be sending a lot of potentially unnecessary data to the LLM in the parsing stage since all data, regardless of whether the company was mentioned, would be sent to the LLM. This can substantially increase costs and increase network traffic, assuming this was a hosted LLM.
You are also counting on the LLM to parse the content given properly, then properly analyze and then, based on the two previous steps, properly generate the output. All of these functions happen outside of your visibility, and when failures happen, they can be impossible to troubleshoot.
So, let’s apply the questions mentioned in the post to this functionality.
Parsing
Does the function require a generative approach? No
Are there existing, more reliable methods to solve the problem? Yes, more traditional NLP tools or even simple search features
Is there a specific right or wrong answer to the problem? Yes, we want to know for sure that our company is mentioned.
What happens if the component fails? In the current LLM use case, the failure feeds into the following components outside the visibility of the developer, and there’s no way to troubleshoot this condition reliably.
Analysis
Does the function require a generative approach? No
Are there existing, more reliable methods to solve the problem? Yes, more traditional and mature NLP tasks for sentiment analysis
Is there a specific right or wrong answer to the problem? Yes
What happens if the component fails? In the current LLM use case, the failure feeds into the following text generation component outside the developer’s visibility, and there’s no way to troubleshoot this condition reliably.
Text Generation
Does the function require a generative approach? Yes
Are there existing, more reliable methods to solve the problem? LLMs appear to be the best solution for this functionality.
Is there a specific right or wrong answer to the problem? No, since many different texts could satisfy the problem
What happens if the component fails? We get text output that we don’t like. However, since the previous steps happen beyond the developer’s visibility, there’s no way to troubleshoot failures reliably.
Revised Example
After asking a few simple questions, we ended up with a revised use case. This one uses the LLM functionality for the problem it’s best suited for.
In this use case, only the text generation phase uses an LLM. Only confirmed mentions of the company, along with the sentiment and the content necessary to write the comment, are sent to the LLM. Much less data flows to the LLM, lowering cost and overhead. By using more robust methods, much less can go wrong as well, and less likely to have cascading failures affecting downstream functions. When something does go wrong in the parsing or analysis stages, troubleshooting is much easier since you have more visibility into those functions. So, breaking down this functionality in such a way means that failures can be more easily isolated and addressed, and you can improve more reliably as the application matures.
Now, I’m not claiming that this is a development utopia. A lot can still go wrong, but it’s a far more consistent and reliable approach than the previous example.
After talking with developers about this, some of the questions I’ve received are along the lines of, “There are better methods for my task, so if we can’t cut corners, then why use an LLM at all?” Yes, that’s a good question, a very good question, and maybe you should reevaluate your choices. This is my surprised robot face when I hear that.
LLMs Aren’t Useless
Once again, I’m not saying that LLMs are useless or that you shouldn’t use them. LLMs fit specific use cases and classes of functionality that applications can take advantage of. For many tasks, there’s the right tool for the job or at least a righter tool for the job. However, this right tool for the right job approach isn’t what’s being proposed in countless online forums and tutorials. I’m concerned with a growing movement of using LLMs as some general-purpose application functionality for tasks that we already have much more reliable ways of performing.
Conclusion
Will we inhabit a sprawling landscape of digital decay where everything rests on crumbling foundations? Probably not. But there will be a noticeable shift in the applications we use on a daily basis. But it doesn’t have to be. By being choosy and analyzing functionality where LLMs are best suited, you can make more reliable and robust applications, and the environment will also thank you.
Seems everything is clickbait these days. News sources are struggling for the scarce resource of attention. In this environment, a simple task becomes a revolution, and a mundane story gets a new life as a groundbreaking advancement. These titles and the resulting amplification by AI hustle bros provide fuel for the AI hype train, which continues in a circle like an ouroboros. In this post, we’ll look at an example of one of these and talk about the issues and risks.
Taking Spins
I saw this article on Bloomberg that mentions the US Military taking generative AI for a spin. The mental image, along with the photo they used of military cyber operation, conjures thoughts of autonomous systems duking it out or missiles launching. This is by design. It’s meant to create this image for you, but nothing so sensational happened.
What really happened is they built a chatbot over their documents. Doesn’t sound as exciting when you put it that way. For those involved, I’m sure this approach, compared to looking over 13 different manuals trying to cross-reference data and find the right content, felt fast and effective. It may also be the right approach for the problem they are trying to solve. Generative AI isn’t some all-powerful technology. It’s good for some things and not so good for others. This is also something you don’t see covered in news stories.
The military article is far from the most sensational example out there. There’s this little gem.
I probably could have found an even more sensational example to make my point, but recency bias kicked in, and the military story was top of mind since I’d discussed it on social media.
Takeaways
There are several takeaways from these types of titles and stories. Below, I’ll hit a few highlights. Let me specify that what I’m talking about here is mostly related to LLMs. Generative AI related to images, audio, and even video is a different topic and something I’ve written about previously here and here. Success is a different story in use cases with graphics and image modeling. I may write more about this in a future post, but for now, let’s stick to LLMs.
Overhyping
Overhyping in reporting is the norm and not the exception. Most cases where there’s proof of LLM success in various industries essentially boil down to people creating a chatbot over documents, some knowledge base, or even log files. This can certainly be valuable and a productivity boost, but it also sounds incredibly boring, so you end up with titles like ChatGPT is revolutionizing the financial industry, are bankers now obsolete??? Most people will never read the article, just the headline.
Most cases where there’s proof of LLM success in various industries essentially boil down to people creating a chatbot over documents.
Accuracy and Reliability
Let’s punctuate the knowledge base chatbot approach by mentioning when dealing with chatbots over sources of information, there’s no guarantee that the bot will return the correct information. It’s not like creating embeddings and doing similarity searches is foolproof. For high-impact situations with a high cost of failure, this would need to be done incredibly well to avoid a catastrophe, even with a human in the loop. Extra steps to allow a human to verify the right data and data source, ensure the data is up to date, and other additional steps are key in doing this right.
Overconfidence and Extension
Finally, the real danger is looking at the apparent success of something like a bot over a data source and making the leap that the technology has capabilities it doesn’t have or the ability to do even more impactful things with an even higher cost of failure. More impactful things, such as suggesting whether to launch missiles or to drive a tank. These are extreme cases, but it proves a point.
Edge cases and complexity are AI’s worst enemies. You don’t see the edge cases in small experiments or super simple tasks, there may not be any, but as with many use cases with high impacts for failure, edge cases may be everywhere, lurking in the shadows waiting to strike when you least expect them.
You don’t see the edge cases in small experiments or super simple tasks.
This overconfidence and extension of generative AI into other areas where it’s not well-suited will cause damage. As this tech is put in more and more critical paths, it’s only a matter of time until there’s a catastrophic failure.
Conclusion
There are a lot of people experimenting and a lot of money flowing in the generative AI space, and as with any technological advancement, we should be prepared to be surprised. However, take the reporting on generative AI and any stories hyped up by the AI hustle crowd with a grain of salt. Perverse incentives are everywhere. Generative AI may be a good fit for your use case, but beware, this isn’t without pitfalls. Generative AI is far from some utopian technology, and given critical use cases with a high cost of failure, the only winning move is not to play.
In case you may have missed it on my social media, I made a few podcast appearances in the last month. I’ve been chatting with people about large language models, security, and the associated hype.
The Perfect Storm
I was on episode #39 the Perfect Storm Podcast talking about large language models and security issues, as well as other topics around AI. Of course, a healthy dose of ChatGPT as well. You can find that episode here: https://www.harbortg.com/the-perfect-storm
Down the Security Rabbithole
I was also on the Down the Security Rabbithole podcast talking about the current state of cyber hype and the overblown reporting of ChatGPT. You can listen to that here: https://www.buzzsprout.com/2153215/12675548
Up Next
I’ve been busy with work and putting conference content together. I have a backlog of blog posts to get out. Those are coming soon. Thank you.
By now, you’ve probably heard of the letter [https://futureoflife.org/open-letter/pause-giant-ai-experiments/] from the Future of Life Institute, signed by experts calling for a pause on AI experiments. Odds are, you’ve seen reporting on it but didn’t read it yourself. You should go read it before forming too much of an opinion. Some, like Eliezer Yudkowsky even argue that the letter didn’t go far enough and we should shut it all down.
The criticism surrounding the letter hasn’t been intellectually honest either. It seems everyone wants to get their hot takes in and is looking more for one-liners than solutions. Critics attack the institute or the individual authors without acknowledging the concerns outlined, committing a logical fallacy known as the genetic fallacy
The criticism seems to fall into one of three camps:
I don’t like “X” therefore, nothing is valid.
My concern wasn’t addressed. Therefore, nothing is valid
There are no dangers
If you don’t believe me, feel free to read the rebuke letter.
Welcome to the hot mess that is AI alignment.
We can be harmed by both good and bad AI
I’ve previously weighed in on the alignment topic and expressed my concerns about the current development trend. So if you are looking for definitions of alignment and the paperclip maximizer, you can see that here.
As a security researcher, I’m much more concerned with near-term AI risks. These are the risks typically caused by bad AI, for example, the velocity at which we see companies shoehorning chatbots into their application is concerning. But, there are risks with good AI as well. Good AI has the potential to displace many workers, negatively affect how humans communicate, and create other societal damage. Of course, both good AI and bad AI can have privacy issues as well.
Transformer-based Large Language Models (LLMs) won’t lead to AGI. They aren’t actually reasoning, and they don’t have specific goals to maximize, but what comes next just might.
My Problems with the Letter
My issues with the letter have to do with the optics of it, and I think it’s a bad look overall. The letter makes it far too easy to attack by making the signers look irrational or out of touch, something which, if you look at the names on the list, they are not.
The first issue I have is the mention of GPT-4, which I think is a mistake. This mention makes it seem like either GPT-4 itself is the problem or was some sort of catalyst for the letter. If you read the letter, you can tell that’s not the case, but if you look at the criticism lobbed at the letter, this is used because some of the signers have criticized the capabilities of GPT-4.
The second issue I have has to do with the six-month pause on training more capable AI systems. The duration of six months looks arbitrarily chosen. There is a list of what the hopes would be during these six months, but it seems far too heavy a lift. Companies could use the pause to gain a competitive advantage, or bad-faith actors like nation-states could continue their development.
Finally, the letter calls for an unrealistic recommendation that has no way of coming to fruition. These companies aren’t going to stop their work. There’s no incentive for them to do so. In fact, there is precisely the opposite. These companies are locked in an arms race.
This seems like an issue where we are going to have the build the airplane while we are flying it. However, it’s hard to deny that the letter is having a positive effect. The Biden administration discussed AI dangers with the President’s Council of Advisors on Science and Technology (PCAST) yesterday. It’s hard to believe that conversation happening when it did without the attention around the letter.
AI Alignment and Ethics
AI alignment and ethics are two different areas, and researchers focused on one aren’t researching the same things as the other. At a high level, alignment folks are focused on the intended goals of a system, and ethics folks on the harms from systems, intended or unintended.
However, when the topic of alignment comes up, ethics is always lumped in there. AI ethics organizations don’t have the best track record of proposing realistic approaches either. Recommendations are often overly academic, disconnected from the real world, and sometimes even illegal. That is unless you think the way to stop a ruthless dictator from using deepfakes is to publish a usage standard, or you think the way to make systems fairer is to collect even more sensitive personal data. These out-of-touch recommendations make it easier to discount the larger AI ethics community as well as the alignment community. This is a huge mistake because we need both communities for a successful future.
Additional Reading
Like everything else in our culture, AI Alignment has now become a spicy, politicized topic. In reality, we need to separate ourselves from the hype and arguments and focus on the very real problems.
For some additional reading on this topic, Dan Hendrycks published a paper called Natural Selection Favors AIs Over Humans. It’s a great read with some solid food for thought, especially the section on Value Erosion, which is something I’m incredibly concerned about and have covered in various aspects on my blog.
Chatbots have become the Bitcoin of 2023, but unlike the previous cryptocurrency craze, this feels like everyone is on board. Regardless of your professional background and expertise, it seems everyone is all in on chatbots. While we can debate the technical merits, capabilities, and future of these tools ad infinitum, one thing is for sure, everything is getting an assistant. In the mad rush to chatify products, differentiate, and get a return on investment, there are quite a few non-technical issues people just aren’t considering. One of these is the impact of the inevitable outsourcing of simulated emotional connections.
Chatification of Communication
Should we create a dystopian future where humans only communicate with each other through a layer of bots where all emotional connection is simulated?
Here’s a quick question. Should we create a dystopian future where humans only communicate with each other through a layer of bots where all emotional connection is simulated? Because this scenario is exactly what’s in our near-term future.
Regardless of the arguable capabilities of these tools, one thing is undeniable, they are good writing assistants. Since this seems to be what they excel at, they will be plastered in all kinds of products we use to communicate, such as word processors, email clients, messaging systems, social media, etc. We are headed for the near-term chatification of many products.
On the surface, this may not seem like a bad thing. Why wouldn’t you want help writing a cover letter for your resume or drafting repetitive emails for work? This seems like a win-win, but I’d argue it’s a trap. Lurking beneath the surface of this costless efficiency gain is a field filled with landmines that we casually stroll across.
In his song The Boss, James Brown repeats the line, “Paid the cost to be the boss.” At least James Brown knew there were costs to be paid. I think with frictionless access to these tools integrated into our communication apps, we may not realize we pay a cost.
Emotional Outsourcing
I did an experiment. I thought of a terrible situation in which a family member lost a child. It has to be one of the most terrible situations to be in. So, of course, I asked ChatGPT to draft a sympathy letter. The result is below.
Prompt: Write a heartfelt email that expresses condolences to a family member on the loss of their child.
Response:
Subject: Our Deepest Condolences for the Loss of Your Precious Child
Dear [Family Member's Name],
I hope this message finds you in the gentle embrace of love and support during this incredibly difficult time. It is with a heavy heart that I write to offer my sincerest condolences for the unimaginable loss of your beloved child. There are no words that can truly express the depth of sorrow I feel for you and your family.
Your child was a beautiful soul who brought light and joy to everyone they encountered. The time we spent together, though brief, left an indelible mark on my heart. I will always cherish the memories of their laughter, their kindness, and their unwavering spirit. Your child was truly a gift to us all, and their absence leaves a void that can never be filled.
As you navigate the turbulent sea of grief, please know that you are not alone. Our entire family stands beside you, offering our love, support, and strength. We may be separated by miles, but our hearts are united in the shared pain of losing someone so special. If you need a shoulder to cry on, someone to talk to, or help with any practical matters, please don't hesitate to reach out. We are here for you, now and always.
In the days, weeks, and months to come, I encourage you to lean on the love and compassion of those around you. Share stories of your child, reminisce about their life, and celebrate their memory. Grieving is a deeply personal journey, and while I cannot presume to understand the depths of your pain, I hope that you find solace in knowing that your child's life touched so many others in profound and meaningful ways.
Please accept my deepest condolences once again, and know that my thoughts and prayers are with you and your family during this heart-wrenching time. May your cherished memories provide you with comfort, and may your child's spirit guide you through the darkness and into the light of healing.
With all my love and sympathy,
[Your Name]
If you skimmed this letter and said, “Wow, ChatGPT did a really good job,” then you totally missed the point, but you landed where most people would, even people using the tool for this task. Read it again, then think about the situation. Think about being the recipient of this message.
There are many obvious problems with this letter, even in the wording itself. It seems to use the right words, but it’s completely over-baked and arguably too long. This message doesn’t have any emotional connection despite using emotional words. If feels informal, like a templated decline letter when applying for a job. A family member, hell, a human, would share some level of grief as well. As they wrote the letter, they’d be putting themselves in the recipient’s shoes, and that feeling would come out in the letter.
But whether tools like ChatGPT do a good job or not isn’t the issue. The issue is that we treated the family member on the other side as just another task to shuffle off of a pile and not the mourning family member they are.
The Larger Issue
There’s an old saying, “It’s the thought that counts.” There’s a lot of truth to this, but the “thought” is exactly what we are outsourcing here. A sympathy letter isn’t about the words you use, it’s letting someone know you are thinking about them.
The “thought” is exactly what we are outsourcing here
Why would anyone read your sympathy letter if they knew an AI wrote it all or in part? I know I wouldn’t. My response would be, “You narcissistic asshole, you couldn’t even be bothered for a couple of minutes out of your day to think about me and the tragedy that befell my family.” LLMs aren’t sorry. They don’t feel bad, they don’t feel anything.
You may think the sympathy letter was an extreme example, but I don’t think it is. If you remember, last month, Vanderbilt University had to issue an apology after using ChatGPT to draft an email about a shooting at another school. At the time, I wasn’t sure if I had a problem with it on the surface and that I’d have to give it some thought. I’ve thought about it, and I have a problem with it. Even though the shooting didn’t happen at Vanderbilt and it was a one-to-many communication, the email simulated human emotions and, in effect, was trying to manipulate humans. The bad thing about this is that if Vanderbilt hadn’t pointed out the fact that they used ChatGPT to write the message, it probably wouldn’t have been noticed. This teaches the wrong lesson because people learn not to reveal they used an assistant.
Even in more mundane and less emotional communication tasks with humans, there are still issues. We are headed for a near-term future where we treat humans as apps or API calls, with communication as just another task that needs to be checked off of a list. What does this say about us and where we are headed that we are so wrapped up in ourselves that we can’t spend any time out of our day to think about others? It’s not a good place.
Some Tasks SHOULD Have Friction
Not every part of human life should be a target for efficiency gain or friction reduction. I’m not sure when the appification of humans started, but I first recognized it with Uber. For example, request turning the driver “off,” aka telling them not to talk, and treating the human as a self-driving car.
We started treating people differently when we communicated with them online, via social networks, vs. in-person communication. This abstracted communication gave us a license to dehumanize them, justifying our actions in our heads.
Will this lead to writing assistant wars where people’s bots battle it out on social media? It’s always hard to tell with these things. The fact is, we really don’t know what the impact of this will be.
Some things aren’t meant to be frictionless or need an efficiency boost. Some tasks aren’t meant to be painless, but remember, it’s not about your pain. It’s about other people. Friction in human communication tasks forces you to think, consider, compromise and adjust.
The Impact
It’s early, and it’s always hard to predict how these things will play out because real life is far more complex than we give it credit for. From a psychological perspective, these tools will further accelerate our dehumanization of others, but there are more logistical issues as well.
Writing isn’t just an act of communication, it’s an act of discovery.
We rarely type without thinking. As I write, even in mundane replies to co-workers, I’m still thinking and sometimes, on the spot, come up with new ideas and new solutions as I write. One of the issues that seem to get lost in the LLM debates is about writing itself. Writing isn’t just an act of communication, it’s an act of discovery. It’s one of the main reasons I wouldn’t use ChatGPT to write blog posts, books, works of fiction, and a whole host of other writing tasks. Even if ChatGPT made me a better writer (something I highly doubt), it would make me a worse thinker, and that is not a good tradeoff.
Even if ChatGPT made me a better writer, it would make me a worse thinker.
How do we learn to cooperate with others, consider their positions, compromise, create consensus, and all of the other things we do as humans, if we are letting our writing assistants battle it out? Who is the one being convinced?
As we outsource more communication and emotional connections to intermediary assistants, we are in for more miscommunication and less understanding, consideration, compromise, etc. The list goes on. This begs the question, is it really reducing friction after all?
Conclusion
As humans, we need to decide if we want technology to manipulate us. I’m firmly in the camp that I don’t want this. Believe it or not, this is an unsettled issue that isn’t getting enough attention. But like it or not, this is happening, and there’s not really much we can do about it. It’s one thing to say don’t use these tools, but they may become so tightly integrated that it’s almost impossible not to use them. We need to go out of our way to think about the people on the other side of our communication, even if we don’t like them.