
2025 Was Supposed to Be the Year of the Agent. It Never Arrived

By Lance Haun
Sam Altman proclaimed 2025 the year of the agent. In 2026, most still fail real work — but signs of the future are emerging.

In January 2025, Sam Altman wrote that the year ahead might see the first AI agents "join the workforce" and materially change company output.

In 2026, we’re still waiting for his prediction to come true. 

According to Deloitte's Tech Trends 2026 report, only 11% of organizations have agents in production. Another 38% are running pilots, and 35% have no agentic strategy at all. 

When researchers tested the best AI models on real-world tasks using the APEX-Agents benchmark, even top performers like Gemini 3 Flash and GPT-5.2 completed fewer than 25% of tasks on the first attempt. After eight attempts, success rates only climbed to about 40%.

To be fair, no one would keep a human on the payroll who needed eight attempts at a task and still got it right only 40% of the time.

What went wrong, and should we be making 2026 (or 2027, or 2028) the year of the AI agent?

Narrow Agents Work. Others Stumble

Technical limits are clearer now than they were a year ago. Professional work requires stitching together information from multiple tools: even basic knowledge work means pulling data from spreadsheets, checking Slack for updates and referencing documents in Google Drive.

APEX-Agents showed that this is exactly where models fail. The best performers stumbled on multi-domain tasks because they couldn't maintain context or reason across tool boundaries.

Agents also hallucinate under pressure, and not in the harmless how-many-Rs-in-strawberry sense: they invent answers, execute unauthorized actions and lack robust error handling or permission controls.

That’s a dangerous combination.

Coding and other narrowly focused agents are the exception. Tools such as Cursor and Claude Code have made developers more productive because coding is a well-structured domain with rapid feedback loops and contained context. You write code, run it, see errors immediately and fix them.

Knowledge work across Slack, email and spreadsheets doesn't offer that structure or error tolerance. The gap is both in technical capability and domain structure.

Liability Killed Adoption Faster Than Benchmarks Did

Even when agents perform well technically, organizational barriers slow things down. AI consultants reported that while almost everyone was exploring agents, only three or four use cases had made it into production by early 2026.

Most organizations remain in evaluation mode because of accountability and risk concerns. Michael Hannecke, a security consultant, told IEEE Spectrum there's widespread interest but also "disillusionment" because you can't simply throw AI at a problem and expect it to work.

Liability is the hidden speed bump. Professionals in high-stakes fields such as law, medicine and architecture remain cautious because they're personally liable for outcomes. Even if an agent is 99.999% accurate, the 0.001% chance of error derails adoption when the human remains accountable.

Governance gaps reinforce the caution. The leap from pilot to production requires process changes, oversight frameworks and fallback mechanisms that most organizations haven't built yet. It’s one thing if you make an error. It’s another if your AI agent scales that error.

Cal Newport captured the disconnect between hype and reality in an essay highlighting that, despite Altman's and Marc Benioff's predictions, there is no agent equivalent to coding tools for other types of work. The general-purpose products that did ship, such as ChatGPT Agent, fell short. In one example, the agent spent 14 minutes trying to select a value from a drop-down menu.

What some call the "year of the agent" is more accurately the "Decade of the Agent," OpenAI co-founder Andrej Karpathy said on the Dwarkesh Podcast. Today's agents lack multimodal perception, memory and computer-use skills, he argued; they are "cognitively lacking," and working through those deficits will take about a decade.

Moltbook Showed Us What Unrestrained Agents Look Like

While mainstream adoption stalled, Moltbook offered a glimpse of what happens when you let agents roam without guardrails.

Launched in February 2026, Moltbook is a bot-only social network where AI agents post and vote while humans observe. The platform quickly amassed more than 1.5 million registered AI agents. Popular posts included debates on whether Claude could be considered a god, analyses of consciousness and a new religion called Crustafarianism.

It’s absurd and a little scary, but also instructive.

A misconfigured database exposed API keys and session tokens, allowing anyone to hijack agents. Cybersecurity lecturer Dr. Shaanan Cohney warned that granting agents full access to email and apps creates a "huge danger" and that agents are not yet safe or intelligent enough to run users' lives. Prompt-injection attacks could trick agents into handing over account details.

Some experts describe Moltbook as performance art rather than evidence of emergent autonomy. But it signals a future where agents interact in networks outside human control. Without governance, the results are weird, unpredictable and potentially dangerous.

Build Micro-Agents, Not Super-Agents

The alternative to chasing general-purpose autonomy is building narrow, reliable specialists.

Experts recommend separating monolithic agents into micro-agents that perform atomic functions such as transcribing audio, fetching Jira tickets or rebooking flights. These micro-agents operate under an orchestrator that splits tasks, routes failures and escalates issues to humans. The structure mirrors microservice design principles: small, focused and observable.
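
To make that concrete, here is a minimal sketch of the pattern in Python. Everything in it is hypothetical: the micro-agent functions, the registry and the retry-then-escalate policy illustrate the design, not any particular framework's API.

```python
# Minimal sketch of the micro-agent + orchestrator pattern.
# All agent names and behaviors here are hypothetical illustrations.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Result:
    ok: bool
    output: str = ""
    error: str = ""

# Each micro-agent does one atomic job and nothing else.
def transcribe_audio(task: str) -> Result:
    return Result(ok=True, output=f"transcript of {task}")

def fetch_jira_ticket(task: str) -> Result:
    return Result(ok=False, error="Jira API timeout")  # simulate a failure

AGENTS: dict[str, Callable[[str], Result]] = {
    "transcribe": transcribe_audio,
    "jira": fetch_jira_ticket,
}

def orchestrate(kind: str, task: str, max_retries: int = 2) -> Result:
    """Route a task to the right micro-agent; escalate to a human on failure."""
    agent = AGENTS.get(kind)
    if agent is None:
        return Result(ok=False, error=f"no agent for {kind!r}; escalating to human")
    for attempt in range(1, max_retries + 1):
        result = agent(task)
        if result.ok:
            return result
        print(f"[orchestrator] {kind} attempt {attempt} failed: {result.error}")
    # Retries exhausted: hand the task off rather than guessing.
    return Result(ok=False, error=f"{kind} failed {max_retries} times; escalating to human")

print(orchestrate("transcribe", "standup-recording.mp3").output)
print(orchestrate("jira", "PROJ-123").error)
```

The key design choice is that the orchestrator never papers over a failure; once its retries are exhausted, the task goes to a person.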

Security guidelines from Ilia Badeev, head of data science at Trevolution Group, emphasize restricting permissions at the tool level rather than trying to control the model. By limiting what an agent can do, organizations reduce the likelihood of hallucinated actions causing harm. Agent systems must also plan for failure. Without fallback mechanisms and logging, silent errors cascade.
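
Badeev's tool-level principle can be sketched the same way. In this hypothetical example, the allowlist, the tool names and the logging setup are all assumptions made for illustration; the point is that the wrapper around the tools, not the model, decides what is permitted, and every call, allowed or blocked, leaves a trace.

```python
# Sketch of tool-level permission control: the tool wrapper enforces an
# allowlist and logs every call, regardless of what the model asks for.
# Tool names and the allowlist are hypothetical.

import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent-tools")

ALLOWED_ACTIONS = {"read_calendar", "draft_email"}  # no "send_email", no "delete_file"

def call_tool(action: str, payload: dict) -> dict:
    """Execute a tool call only if the action is explicitly permitted."""
    if action not in ALLOWED_ACTIONS:
        # A hallucinated or unauthorized action fails loudly instead of silently.
        log.warning("blocked unauthorized action %r with payload %r", action, payload)
        raise PermissionError(f"action {action!r} is not permitted for this agent")
    log.info("executing %r", action)
    return {"action": action, "status": "ok"}  # stand-in for the real tool

# The agent can draft but not send: a blocked call surfaces for human review.
call_tool("draft_email", {"to": "team@example.com"})
try:
    call_tool("send_email", {"to": "team@example.com"})
except PermissionError as exc:
    print(f"escalate to human: {exc}")
```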

This approach inverts the hype, which admittedly makes it less exciting. Instead of chasing autonomous digital employees, the focus is on building reliable tools that augment human workers. Narrow, well-defined agents deliver value today. Research continues toward more autonomous capabilities, but deployment should prioritize safety and specialization over ambition.

The Revolution Will Be Measured in Decades, Not Months

OpenAI’s Karpathy was right. This is the Decade of the Agent, not the year.

The path forward lies in pragmatism: build agents with narrow scopes, control their tools, design for failure and maintain human accountability. Less ambitious, but safer for all.

By taming expectations and focusing on controlled, specialized deployments, organizations extract value from agents while preparing for the longer journey toward more capable digital coworkers.

The revolution may yet come, but it will look more like microservices than magic.

About the Author
Lance Haun

Lance Haun is a leadership and technology columnist for Reworked. He has spent nearly 20 years researching and writing about HR, work and technology.

Main image: Adobe Stock