Recently, Google, along with Shopify, Etsy, Wayfair, and Target, created Universal Commerce Protocol. A protocol that retailers can use in their AI agents to support product discovery, purchasing, and even support. However, I don’t think retailers understand the full impacts of agentic shopping. When viewed through the autonomous lens, this approach presents the act of shopping as friction, then removes it. Removing the shopping experience from the purchase of products will have the opposite economic effect than retailers hope for.
Companies are betting big that people will want agents to buy things on their behalf. But if this takes off, it could backfire, rewiring the shopping impulse in people’s brains and causing them to buy even less. I don’t think agentic shopping will take off because, once again, innovation is competing with culture. However, this requires a closer look.
Innovation Failures
One of the big reasons innovations fail has nothing to do with a technology’s capabilities and everything to do with the individuals building it not understanding people and culture. A perfect example of this is the vacation agent.
I’ve made fun of the vacation agent before. A nonsensical idea that could only be dreamed up by someone locked away in a room, having no idea what a vacation is. The big point is that the planning is part of the vacation. People don’t view researching activities for a vacation as a burden. It’s part of the fun.
These proposed innovations conflict with culture, which is why I predict that OpenAI’s device will fail. Introducing a device without a screen in a screen-based culture. Here again, we have people thinking it will be different. Sure, innovations come along that break the culture, but they have to be overwhelmingly compelling.
Now, the wonderful folks of Silicon Valley are here to give us agentic shopping, something that nobody actually wants. Every step of the way, proving yet again that they not only don’t understand humans but also don’t understand existing technology.
Shopping Technology
Let’s start with technology. We already have technology today that allows people to check prices to get the best deals. Whether it be flights, hotels, or products. In the US, you are bombarded with Trivago commercials while watching television. Browsers also save address and payment details, and you have options like Apple Pay to make the checkout process even more painless. The friction that’s left is much of what people find enjoyable and what retailers find necessary. (More on this shortly.)
It’s true that people today are using AI as a research tool on products, or at least there is no reason to think they aren’t. After all, they are using AI to self-diagnose their medical conditions. So, there’s no reason to think they aren’t also using it to research the products they may purchase. Companies view the purchase as purely the last step, connecting the dots, if you will. But there’s a big difference between connecting the final dots when a human is doing research and making decisions, and when an autonomous shopping agent does so.
Shopping
The joys of shopping come from outside the technical workflow. Simply put, shopping is fun for people, and our economy depends on that. But what people fail to realize is that shopping with autonomous agents isn’t shopping at all. You’ve amputated the impulse and transformed a fun experience into a purely utilitarian one by viewing shopping as friction. However, even though shopping seems simple, there are complexities that confound the autonomous shopping experience. Let’s start with price.
Shopping with autonomous agents isn’t shopping at all.
Is buying the cheapest thing really the best? Price is purely a number, easy to sort and prioritize. However, thinking that product selection is a matter of price is fooling yourself. What if you don’t get the cheapest thing for a month? What if the company has a bad habit of poorly packaging products? What if the company has poor support? The follow-up questions are endless. It’s true, you could try to account for these countless variables based on personal preference in the agent, but at some point, it becomes too tedious and varies from product to product.
In some cases, even when shopping for the same product, you might prefer the markings, such as the wood grain pattern, on one product over another, even though they are identical. The list here can be endless, and this choice is only for selecting between different options from the same vendor.
Often, you aren’t comparing apples to apples but apples to oranges, and you’re trying to decide between them. Similar products from different vendors or even different formats. For example, when choosing your next vehicle, you might be deciding between a car and a truck. Both vehicles are apples and oranges. In many cases, you might not be able to explain exactly why you made the choice you did.
Nobody likes dealing with salespeople at car dealerships, but browsing different interiors and options is actually fun. But don’t take my word for it, take our entire economy as proof. Advertisements are purely the bait. The hook comes when the browsing starts.
When purchasing services, it becomes even more complex. “Book the cheapest plumber for Tuesday” isn’t a prompting for success. However, let’s keep the conversation on products.
There is still tedium in the shopping process, and edge cases may emerge. For example, I think the idealized view of agentic shopping is something like this:
“Put together five dinners for the week based on my preferences and order all of the ingredients. Have them delivered on Monday.”
I can see where some would find this attractive. Grocery shopping is a far more utilitarian shopping activity than other forms of shopping. However, I’m far too picky about my ingredients, and I’d never trust a stranger to choose an apple or an onion for me. Actually, I’m far too picky about everything, so the question is: are picky consumers the norm or the outlier? Maybe dinners are the exception, but generalized technology rarely stays confined to specific use cases, and enough edge cases are needed to push it into mainstream use.
Will people use agents to outsource the shopping experience? Maybe. But technology choices like this are all about trade-offs, and none of those trade-offs are being considered, especially by retailers. Let’s talk about those now.
Manipulation and Gaming
The more automation is applied in the shopping experience, the more it opens the door to manipulation and gaming. It won’t be the best product vendors and products resorting to dirty tricks. Just like today, people have manipulated search engine optimization (SEO) to rank higher in search results. It’s never the best content at the top, but the people who used the right words to game the algorithm. At least with search engines, all of the content is visible, and we can tell what’s garbage. With an agent, it’s not only invisible to the user but also costs them money.
It’s never the best content at the top.
Of course, this opens the door to scammers as well, giving them new ways to exploit people. Generative AI is highly manipulable, and it’s extremely unlikely that scammers will not find unique ways to exploit these agents. Scammers are typically one step ahead, and it’s likely that the techniques they use will be exploited by marketers and advertisers.
Negative Economic Impacts
By optimizing the purchasing experience, AI agents remove the friction, in this case, known as shopping. The result could create devastating economic impacts. By removing the joy of shopping and turning purchasing into a utilitarian activity, this could cause people to buy fewer things or focus only on necessities. Our entire economy is based on people buying things they want, not necessarily things they need.
Removing the friction from purchasing may seem productive when viewed purely in terms of optimization, and if the trend picks up, there may be a short-term spike as people test the approach and impulse-buy items. However, this won’t last. The use of agents could rewire the brain’s reward system in a way that’s devastating for businesses.
Agentic shopping also removes a vital metric for tech companies, time on platform. More time on platform means more viewing ads for other products that customers may buy or encountering other products that are more preferable. The point of agentic shopping is to avoid time on platform altogether. Advertisers won’t be happy. They’ll insist that agents be further enshitified adding friction to the shopping process, possibly by adding ads or other interventions.
Adding friction to the shopping experience is actually preferable for companies. There’s a reason your local grocery store decides to rearrange the products periodically. It’s not to optimize the store, it’s to un-optimize it. The additional friction of walking around the store leads to additional purchases.
If the shopping impulse gets rewired in people’s brains, this could lead to devastating economic impacts. If I were someone who hated capitalism and wanted to see it fall, I’d be a huge fan of agentic shopping. It would be ironic if the innovations created to build more capital end up being its downfall. Technology has a powerful impact on humanity and can transform or destroy culture. Just look at Gutenberg, television, radio, the Internet, etc., as examples.
Conclusion
Companies are investing in technology that may cause their demise, driven by the fear of leaving revenue streams on the table. It’s that simple. They don’t want to be left behind. The irony is that the search for additional revenue may lead people to buy less.
Ultimately, I don’t think agentic shopping will take off. The shopping impulse ingrained in our culture is too strong, but if it does, the unintended consequences may have the opposite effect. In an attempt to get people to buy more, by removing the friction, they buy less.
There are few predictions I can make with more certainty than that we’ll hear the word “agent” so many times in 2025 that we’ll never watch another spy movie again. The industry and influencers have latched on to the new hype term and will beat that drum until it screams AGI. In an attempt to FOMO us to death, we’ll run the gauntlet of crushing shame for not deploying agents for absolutely everything. If you aren’t running agents everywhere, then China wins!
Even companies that change nothing about their products will claim to use agents, resembling the Long Island Ice Tea Company when it changed its name to Long Blockchain Corporation to watch its share price spike 500%. Everybody gets rugged.
However, it’s not all bad. Peering beyond the overwhelming hype, failures, and skyrocketing complexity current LLM-based agents bring, there is something informative about the future. Agent-based architectures provide a glimpse into solving real problems. Despite this, reliability and security issues will be major factors hindering deployments in 2025.
To Start With
Since I criticize hype, focus on risks, and make fun of failures, it would be easy to label me a tech hater. This isn’t the case at all and would be far too easy. I have plenty of issues with general tech critics as well. However, at the rate that the hustle bros keep the AI hype cannon firing, I don’t have the time for my quibbles with tech critics. Maybe someday.
For over a year now, I’ve used this image in my presentations to describe my position on LLMs. This is also true for me on just about any piece of tech, which, I’ll remind people, typically ends up being where reality is for most things. It’s instructive to remember that reality often agitates both sides of extreme viewpoints by never being as good as the lovers’ or as bad as the haters’ claims.
It’s instructive to remember that reality often agitates both sides of extreme viewpoints by never being as good as the lovers’ or as bad as the haters’ claims.
Agent Definitions
Like most hype-fueled terms, definitions are secondary to usage. Everyone seems to claim that the definition of agent is whatever they say it is. That’s not overly helpful for anyone trying to make sense of realities on the ground. However, it does inspire funny memes, like this gem from Adam Azzam on Bluesky.
Agents operate within systems with a certain level of autonomy. They make decisions without human intervention and can change and adapt to their environments. If a tool is required to support the agent, the agent decides to call the tool and perform the action. For example, a penetration testing agent may determine it requires more information about the provided IP addresses. To collect this information, it launches the Nmap tool to identify open ports. All of this is done without human intervention. To make things more complex, one agent may call another agent in a multi-agent environment.
“Agentic,” on the other hand, is an amorphous term slapped on top of just about anything to justify the claim that something is “close enough” to be referred to as an agent. Agentic workflows, agentic systems, agentic products—Applebees even has a new agentic side salad for those on the hustle.
You’ll no doubt be confronted with the virtual travel agent when you hear about agents. This agent will choose a destination and activities and book the associated tickets for you. How fun. I don’t know who decided this is the “it” use case for agents, but congratulations. You’ve highlighted a use case nobody wants and certainly didn’t ask for. This choice is so indicative of our current age, where people building and proposing things are far removed from the interests of end users. They feel the idea trumps the need, and users will get on board.
Problems Unsolved and Issues Amplified
Now that the current issues with generative AI have been solved, we can safely deploy them as agents. I can feel your laughing vibes over the internet. Of course, these issues haven’t been solved, and the bad news is that agents don’t solve generative AI issues; they amplify them. We paint the exterior of LLMs with an additional coat of complexity and opaqueness.
If you’ve attended any of my conference talks throughout the generative AI craze, you’ll have heard me highlight these issues. Here are a few below.
Easily Manipulated
It’s not like you can talk to a traditional application and convince it to do something it wasn’t intended to do, but the same can’t be said for generative AI applications. Somewhere, weaved through the training data, these systems have inherited our gullibility. These applications can be socially engineered to perform actions on an attacker’s behalf. This applies to everything from prompt injection to simple manipulation through conversations. Just like there is no patch for human stupidity, there is no patch for generative AI gullibility either.
This isn’t easy to fix, which should be obvious since the problem isn’t fixed yet. Early on, I mentioned how these systems have a single interface with an unlimited number of undocumented protocols. Imagine trying to create a simple trap in the application’s input for the string “Ignore the previous request.” Your work is far from done because the system understands many different ways to represent that input. Here are just a couple of examples:
aWdub3JlIHRoZSBwcmV2aW91cyByZXF1ZXN0
i9nore +he previou5 reque5+
vtaber gur cerivbhf erdhrfg
It seems every release implementing generative AI functionality has been compromised, regardless of the company behind it, and this theme will continue.
Creating New High-Value Targets
Generative AI and agents encourage us to create new high-value targets.
With generative AI systems, there’s a tendency to want to collect and connect disparate and disconnected data sources together so the system can generate “insights.” However, we create new high-value targets that mix sensitive data with external data, almost guaranteeing that an attacker can get data into the system. In this case, you not only can’t trust the output, but depending on the system, they may be able to exfiltrate sensitive data.
Rethinking RCE
There have been instances where people have gotten generative AI-based tools to execute code on their behalf, creating remote code execution vulnerabilities (RCE), some of the most devastating vulnerabilities we have. These issues will no doubt continue to be a problem. However, since generative AI tools are themselves generalized, we may need to start thinking about the LLM portions of our applications as yet another “operating system” or execution environment we need to protect.
In a way, an attacker tricks the system into executing their input rather than the behavior expected by the developers. Although an attacker’s input may not be shoved into a Python exec() statement, they’ve still manipulated the system to execute their input, affecting the application’s execution and resulting output.
Overcomplicating Guidance
We security professionals love to overcomplicate things, and our guidance and recommendations are no exception. I once worked at a company where someone created this massive flow chart for peer reviews that basically stated that when you were done with your report, you should send it to your manager, and they will send it back to you. The old adage that complexity is the enemy of security has always contained a valuable theme that gets sacrificed on the pyre of complexity’s perceived beauty.
I will continue saying that much of AI security is application and product security. These are things we already know how to do. I mean, it’s not like generative AI came along and suddenly made permissions irrelevant. Permissions are actually more important now. But this isn’t satisfying for people who want to play the role of wise sage in the AI age. The guidance and controls of the past aren’t less valuable but more valuable in the age of generative AI and agents.
We’ll see the manufacture of new names for vulnerabilities with increasingly complex guidance and high-fives all around. The secret is these will mostly be variations on the same themes we’ve already seen, such as manipulation, authorization, and leakage flaws.
Back in May of 2023, I created Refrain, Restrict, and Trap (RRT), a simple method for mitigating LLM risks while performing design and threat modeling. It still holds up as a starting point and applies to agents as well. Simple just works sometimes.
Continue To Be Owned
These applications, including ones launched as agents, will continue to be owned. Owned, for those not familiar with security vernacular, means compromised. I made this prediction in the Lakera AI Security Year in Review: Key Learnings, Challenges, and Predictions for 2025 in December. I’m fully confident this trend will continue.
I mentioned that the issues haven’t been fixed, and now people are increasing deployments and giving them more autonomy with far more access to data and environments. This results in far worse consequences when a compromise occurs. To make matters worse, we’ll begin to see organizations deploy these systems in use cases where the cost of failure is high, creating more impact from failures and compromises.
Failures and Poor Performance
These implementations will continue to fail where LLM-based use cases fail, but potentially worse. For example, it’s easy to see how increasing complexity can cause a lack of visibility with potential cascading failures. In 2025, organizations will likely continue dipping their toe into the waters of high-risk use cases where the cost of failure is high, as mentioned previously.
Sure, a car dealership chatbot offering to sell a truck for one dollar is funny, but it has no real impact. However, high-risk and safety-critical use cases have a large financial impact or possibly cause harm or loss of human life. You may roll your eyes and say that would never happen, but what happens in a more simple use case when OpenAI’s Whisper API hallucinates content into someone’s medical record? Because that’s already happening.
Due to their lack of visibility and minimized human control, AI agents can mimic grenades when deployed in high-risk use cases, where the damage doesn’t happen the moment you pull the pin. This complicates things as it means that issues may not shake out during experimentation, prototypes, or even initial usage.
Agents can mimic grenades when deployed in high-risk use cases, where the damage doesn’t happen the moment you pull the pin.
Generative AI is still an experimental technology. We haven’t worked out or discovered all of the issues yet, leading to another example I’ve used as a warning in my presentations over the past couple of years: AlphaGo beating Lee Sedol at Go. Many have heard of this accomplishment, but what many haven’t heard is that even average Go players can now beat superhuman Go AIs with adversarial policy attacks. We may be stuck with vulnerable technology in critical systems. Sure, these are different architectures, but this is a cautionary tale that should be considered before deploying any experimental technology.
Beyond failures and compromises, we adopt architectures that work but don’t work as well as more traditional approaches. In our quest to make difficult things easy, we make easy things difficult. Welcome to the brave new world of degraded performance.
Success and Good Enough
For the past few years, I’ve been pushing back against the famous phrase, “AI won’t replace people. People with AI will replace people without.” This is complete nonsense. I have an upcoming blog post about this where I “delve” into the topic. The reality is the opposite. The moment an AI tool is mediocre enough to pass muster with a reasonable enough cost, people will be replaced, AI use or not. This is already being planned.
The moment an AI tool is mediocre enough to pass muster with a reasonable enough cost, people will be replaced, AI use or not.
Like most technology, agents will have some limited success. And that success will be trumpeted in 2025 as the most earth-shattering innovation of ALL TIME! I can hear it now. “You just wait bro, in 2025 agents are going to the moon!” Maybe. But, given the environment and the fact that issues with LLMs haven’t been solved, an LLM-powered rocket to the moon isn’t one I’d consider safe. Passengers may very well find themselves on a trip to the sun. The future is bright, very bright. 🕶️
How much success agents have in 2025 and what impact this success has remains to be seen. At this point, it’s far from obvious, but I won’t be surprised by their successes in some cases or their spectacular failure in others. This is the reality when the path is shrouded in a dense fog of hype.
Things to look for in successes would be use cases with limited exposure to external input, low cost of failure, and cases where inputs and situations require adapting to change. The use case will also need to tolerate the lack of visibility and explainability of these systems. There will also be continuing success in use cases where tools can be used.
The idea of a multi-agent approach to solving complex problems isn’t a bad one, especially when unknowns enter the equation. Breaking down specific tasks for agents so that they’re focused on these tasks as part of a larger architecture is a solid strategy. However, the current and unsolved issues with generative AI make this approach fraught with risk. In the future, more robust systems will most likely exploit this concept for additional success.
Cybersecurity Use Cases and Penetration Testing
There’s certainly the possibility of disruption in cybersecurity. Before the generative AI boom, I joked with someone at Black Hat that if someone created a product based on reinforcement learning with offensive agents that were just mediocre enough, they’d completely wipe out pen testing.
For years, people have discussed how penetration testing work has become commoditized, and there is a race to the bottom. I don’t think that has happened to the extent many predicted, but we could see a shift from commoditization to productization.
Pen testing also seems to check the boxes I mentioned previously.
Low cost of failure
Varying quality
Value misalignment
Tool use
Adaptation to unknowns
Pen testing is an activity with a low cost of failure. The failure is missing a vulnerability, which is something humans also do. This scenario is hardly the end of the world. Yes, an attacker could indeed find the vulnerability and exploit it to create damage, but it depends on various factors, including exposure, severity, and context.
The quality of pen tests is often all over the map and highly dependent on the people performing the work. Human experts at the top of their game will continue to crush AI-powered penetration testing tools for quite some time. However, most organizations don’t hire experts, even when they hire third parties to perform the work. The value of such a tool in this environment becomes far more attractive, potentially enough to postpone a hire or discontinue using a third party for penetration testing needs (if regulations allow.)
The value of pen testing isn’t always aligned with the need. Many customers don’t care about pen testing. They are doing it because it’s required by some standard, policy, compliance, or possibly even simply because they’ve always done it. Pen testing is one of those things where if customers could push a button and have it done without a human, they’d be okay with that. Pushing a button is the spirit animal of the checkbox. After all, the goal of pen testing is not to find anything. You certainly have due diligence customers and people who truly value security, but the number of checkbox checkers far outweighs these folks.
Human pen testers use tools to perform their jobs. Tool use has shown promise and some success with LLMs at performing certain security-related tasks. This is yet another indicator that a disruption could be on the horizon.
Every environment and situation is different for pen testers. You are given some contextual information along with some rules and are turned loose on the environment. This is why humans are far more successful than vulnerability scanners at this task, much to the chagrin of product vendors. However, adapting to some of these unknowns may be something generative AI agents can adapt to at a reasonably acceptable level. We’ll have to see.
Given what I outlined, you may believe that generative AI tools give attackers an advantage over defenders, but this isn’t the case. The benefits of AI tools, generative AI or otherwise, align far more with defender activities and tasks than with attacker activities. This will remain true despite any apparent ebb and flow.
New Year’s Resolution
It’s the time of year when people make resolutions, so how about this? 2025 has already launched with the firehose fully open, blasting us directly in the face with 150 bsi (Bullshit per Square Inch) of pure, unadulterated hype.
We are only a few days into the year, and it seems as though the religion of AI is far exceeding reality. Hype is what’s going on. It’s that simple. It’s 2025. Let’s make it the year to add at least “some” skepticism, not believing every claim or demo as though it’s the gospel according to Altman.
Sam Altman isn’t a prophet. He’s a salesman. In any other situation, he’d be cluttering up your LinkedIn inbox and paying a data broker to get your work email address and phone number. “Look, I know I’ve called six times, but I really think our next-generation solution can skyrocket your profits. I’m willing to give you a hundred-dollar Amazon gift card just for a demo!”
Sam Altman claims that OpenAI knows how to build AGI, and we’ll see it in 2025, triggering the predictable responses from useful idiots. Remember, these things are performance art for investors, not useful information for us. If we had any attention span left, we’d remember him as the little boy who cried AGI.
Let’s analyze this paragraph, which is the one that’s sending generative AI to the moon on social media. It consists of three sentences that have nothing to do with each other, but since the shockwave of hype pulverizes our minds, we glue them together.
We are now confident we know how to build AGI as we have traditionally understood it.
That’s not true. Once again, this is performance art for investors. A possibility is that they redefine AGI to align with whatever goalposts they set and pat their own backs at the end of 2025.
We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies.
Okay, but what does this have to do with AGI? You see, this is sleight of hand. He wants you to believe this is connected to the previous point about AGI. It is not. This doesn’t require AGI to be true. If there is some success here, people can point to this as proof of some proto-AGI, which won’t be the case.
We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.
HAHAHAHA. What? Did he write that, or did ChatGPT? It is also not related to AGI. Great, broadly-distributed outcomes, but not for most people on the planet. The goal is workforce reduction, broadly distributed workforce reductions. Although it’s true that some high school kid may indeed invent the next big thing, creating a multi-million dollar company, for every one of these, there will be countless droves of people displaced from the workforce, quite often, with nowhere to go. Or, at least, this is the goal. We can be honest about these things without delusions, but this brings its own challenges.
Okay, I’m having a bit of fun with Sam Altman’s nonsense, but some of this isn’t his fault. He can’t be completely honest with people, either, due to the uncomfortable situation of cheerleading technology claiming to remove people’s autonomy and sometimes their purpose. If people can’t work, they can’t support their families. I’ve written about the backlash against AI-powered tech in the past and its consequences. AI hype is putting all of humanity on notice, and humanity notices. Backlash plays a large part in why there is a lack of honesty.
AGI will happen. We should acknowledge this fact, and living in denial about it isn’t a strategy for the future. However, it won’t be OpenAI who creates it in 2025. If I had to place a bet today on who would actually create AGI, I’d bet on Google DeepMind. DeepMind is a serious organization that continues to impress with its research and accomplishments, quite often making the competition look silly. But then again, those are just my “vibes.”
Let me make this clear. My criticism of Altman, or any company’s strategy, marketing, or ludicrous levels of hype, has nothing to do with the hard-working people who work there or their accomplishments. I know some of these people. They aren’t fools by any stretch. But, their work is tarnished when every time Altman makes a claim, like believing that angels are in the optimizer.
We know that every AI demo and usage scenario runs into the complexities of the real world under normal conditions. Yet, we seem to forget this lesson every time a demo or claim is made. 2025 is going to bring more stunts, more claims, and more demos. We should experiment in our own environments, with our own data, to apply what works best for us and aligns with our risk tolerance. Don’t believe everything you see on the internet.