
AI Is Turning Work Into a Performance Metric

By David Barry
From token counts to meeting summaries, enterprise AI is reshaping incentives, rewarding what’s measurable, not what actually moves work forward.

Enterprise AI was supposed to make workers more productive. In some cases, it has. But it has also created new incentives to look productive rather than create value.

Productivity theater takes many forms. As workers quickly learned, generative AI excels at exactly what its name promises: generating. Employees can easily produce presentations, reports, drafts and updates in visible, measurable volume. Whether that volume translates into value is a different question. Once people understand which metrics AI is tracking, such as ticket completion time, code commits or document production, they can easily game the system, again with questionable value created.

Meeting transcripts create their own incentives. When AI-generated summaries become the record of contribution, employees learn to make visible, quotable interventions that will appear in the recap, whether or not those interventions meaningfully advance the discussion. Tokenmaxxing is probably the most visible (and expensive) form of productivity theater: engineers maximize their use of AI tools knowing that AI consumption, not the end product, is what gets monitored. Managers see only the throughput.

This recently came up at Amazon, where staff were running the company’s internal AI tool on trivial tasks to inflate their token counts. One employee told the Financial Times: “There is just so much pressure to use these tools.” Amazon has since confirmed that the internal leaderboard ranking those counts is gone, describing it as a “beta dashboard” that “was not a formal or approved tool.”

You do not deprecate something that was working.

At Meta, an employee-built leaderboard called “Claudeonomics” ranked the company’s roughly 85,000 workers by token consumption and logged more than 60 trillion tokens in a month before being taken down.

Analysts have called it a textbook case of Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure. When companies measure AI adoption by output volume rather than outcome quality, they teach workers to optimize for the metric rather than the work.

Roughly three-quarters (74%) of U.S. employers now use online tracking tools, and the global employee monitoring market is projected to nearly triple by 2034. Adoption of these tools is outpacing any consensus on what they should measure.


When Metrics Become the Job 

“The deeper risk is not that employees game the system,” said Louisa Loran, executive advisor and author of "Leadership Anatomy in Motion," who has led AI transformation at Google and Maersk. “Humans have always adapted to incentives. The deeper risk is that organizations quietly redefine what ‘good work’ looks like around what AI systems can easily capture, summarize and reward. That changes behavior faster than most leadership teams realize.”

In the case of meeting transcripts, contributions become cleaner and more strategically timed. Exploratory thinking, which rarely lands well in a transcript summary, gets edited out. “Employees learn to speak in quotable fragments rather than unfinished ideas,” Loran said. “The organization still feels productive. Dashboards move. Summaries populate. Adoption metrics rise. But beneath it, independent reasoning, constructive disagreement and intellectual risk-taking may quietly be thinning.”

“People restate decisions for the recorder, claim credit so the notetaker attributes it to them and stop disagreeing out loud because pushback now gets summarized to leadership,” agreed Louis Beaumont, founder of Screenpipe (YC S26), an open-source platform building context-aware AI agents. “The honest part of the meeting moves to DMs and hallways the AI never sees. Most managers don’t notice the candor left. They just see a cleaner summary and trust it, which is exactly backwards.”

That dynamic has a structural explanation. New data from Cornerstone OnDemand, drawn from a survey of 2,000 workers in the U.S. and U.K., found that 47% are learning AI through trial and error, 36% limit their use to avoid making mistakes and 17% say they sometimes pretend to use AI. Workers are not gaming the system out of laziness. They are optimizing for safety under unclear expectations.

AI Accelerates an Old Management Mistake

Vendors aren’t helping. “More prompts does not equal more value,” said Laura Stash, executive vice president at iTech AG and a senior voice in enterprise AI adoption. “More AI-generated output does not mean better outcomes. We learned the same lesson with emails, meetings, PowerPoints, Jira tickets and every other productivity trend before this.”

What has changed is the speed and resolution of the feedback loop. “What’s interesting is how quickly behavior adapts once employees realize the AI summary is what executives actually read,” Stash said. “People start speaking for the transcript instead of the room. You can already see meetings shifting toward ‘How do I sound in the recap?’ instead of ‘Did we solve the problem?’”

The tools being sold to address this problem measure the wrong thing. “Hours captured, meetings summarized, percent of time in approved apps — none of that is an outcome,” Beaumont said. “The real study — did the team ship more, did cycle time move, did the monitored group beat the unmonitored one — is slow and risks proving the tool did nothing, so almost nobody runs it. Output is a screenshot. Outcome is a study. The category sells the screenshot.”

Documents tell the same story. A Smallpdf study of professional workflows found that 16% of all professional documents are abandoned before they are used, shared or finished, costing professionals an average of 5.7 hours a month in wasted effort. AI has not closed that gap. In many deployments it has papered over it with well-formatted output that looks like progress and goes nowhere.

This persists because the people best positioned to flag it are the people with the most to lose by doing so. “A green dashboard that goes up is safer than an honest one that goes flat,” Beaumont said. “You don’t audit your own purchase order.”

Goodhart's Law Comes for AI

The token leaderboard is this year's version of the mouse-jiggler. When remote monitoring tools first tied cursor activity to performance, workers responded with devices that simulated movement. The vendor response was escalation: keystroke logging, screenshots, webcam attention tracking and now AI activity classification. Each iteration assumed the problem was an insufficiently sophisticated proxy.

“AI monitoring is mouse-jiggling 2.0 with a nicer dashboard,” Beaumont said. “Same mistake, higher resolution.”

Organizations have always struggled to separate visible work from valuable work, Stash said. The likable employee who manages perception gets promoted while the quieter one producing work gets overlooked. AI did not create that dynamic, but it made the signals digital, faster and harder to argue with. A faked cursor is easier to spot than a well-optimized transcript summary.

Stop Scoring Workers – Measure Value  

None of this means organizations should stop measuring AI adoption. Leaders need to know whether employees are using these expensive tools and whether the tools improve performance. The problem begins when usage becomes the outcome.

Companies navigating this well measure differently. The strongest enterprises are becoming sharper about what uniquely creates value for them, using AI to reinforce and scale that advantage rather than replacing it with generic output, Loran said. “The challenge is not simply deploying AI,” she said. “It is ensuring technology strengthens human judgment rather than gradually teaching people to perform contribution instead of creating it.”


AI usage needs to evolve toward something closer to operational accounting, tied to projects, outcomes and measurable business value rather than activity dashboards, Stash said. “Companies that get this right will stop measuring AI activity in isolation and start tying AI-assisted work to actual operational outcomes. Did it speed up delivery? Improve customer experience? Reduce rework? Help teams make decisions faster? That’s the part that matters.”

Even the configuration of individual tools changes the behavioral equation, Stash said. “If summaries reward who talked the most, people will optimize for visibility,” she said. “If summaries focus on decisions made, action items, blockers and ownership, people optimize for moving work forward.”

Stop scoring the worker and start measuring the work, Beaumont recommended. Capture the context of how work happens, not output volume, to find the bottleneck, document the process, automate the toil and calculate the time. “The second people think capture is being used against them, you’re back to jigglers within a week,” he warned.

The Cornerstone data suggests we’re running out of time to get this right. When 17% of workers already say they sometimes pretend to use AI, performance theater has become part of the workflow. Once organizations define "good work" by what AI can easily count, summarize and rank, employees adapt accordingly. The question is no longer whether workers will optimize for the metric. The question is whether leaders still know how to recognize value when it doesn't show up on a dashboard.

Editor's Note: How else are organizations encouraging employees to create real value with AI?

About the Author
David Barry

David is a Europe-based journalist with 35 years of experience who has spent the last 15 following the development of workplace technologies, from the early days of document management through enterprise content management and content services. Now, with the rise of remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communications and work, and has recently spent a great deal of time exploring the far reaches of AI, generative AI and General AI.

Main image: Adobe Stock