
Why Nobody Agrees on AI Productivity Metrics

By David Barry
As AI reshapes work, companies want new productivity metrics. The problem? Nobody can clearly define or fairly measure the alternatives.

A consensus has emerged among workplace consultants and HR technology vendors: Traditional productivity metrics are broken, AI has rendered them obsolete and companies need to replace measurable outputs with something more meaningful.

The only problem is nobody can explain what that "something" should be or how to measure it.

Before Measurement Fails, Meaning Does

"AI didn't break traditional performance metrics; it simply exposed how outdated and intellectually lazy they were," said Luke Tobin, chief executive at Unusual Group. But if measuring hours worked and output volume is intellectually lazy, what makes measuring "decision quality" or "AI supervision scores" any more rigorous?

The argument against traditional metrics follows a familiar pattern. Traditional performance metrics center on individual output, but with AI in the creative mix, output volume becomes virtually unlimited, Trailhead Communications founder Barbara Roos told Reworked. Anyone can drop an assignment into ChatGPT and get instant results. "But is that result good? Is it unique? Will it have the desired impact?" she asked.

If output is abundant, output volume loses meaning. But this is where the first problem emerges: the proposed replacements are conceptually unclear.

Companies must foster a culture of experimentation, sharing and collective growth — measuring and rewarding creativity, innovation, collaboration and sharing rather than traditional productivity, Roos said. Managers must critique thinking rather than output, teaching employees to think like managers themselves. It's an appealing vision of workplace transformation, but it raises the question of how to measure "collective growth" in a performance review.

Before performance can be measured differently, it must be described precisely. Terms like judgment, creativity and collaboration sound meaningful, but without shared definitions, they resist consistent interpretation.

When Performance Measurement Can't Scale

Even if we could clearly define these concepts, a second problem quickly appears: measurement.

Traditional performance measurements such as hours worked or output volume don't reflect value when AI handles transactional tasks, said Rosalie Hawley, chief people officer at Medius. "What matters now is how employees leverage AI to make better decisions, solve problems and collaborate effectively," she said. 

Yet when pressed on specifics, the alternatives remain frustratingly vague. Medius measures engagement and performance every three to four months through surveys and feedback loops, focusing on meaningful contribution rather than task completion. But meaningful contribution, like beauty, lives in the eye of the beholder.

Productivity shouldn't be measured by hours logged but by business outcomes, such as streamlining processes, driving revenue and using resources better, said Sam DeMase, career expert at ZipRecruiter. "In an AI-driven workplace, traditional performance metrics don't apply and can actually penalize the best workers," she said. But "better resource utilization" is the sort of management-speak that can mean almost anything, from genuine process efficiency to simply squeezing more work out of fewer people.

Measurement systems only function when they can be applied consistently across teams, managers and time. Metrics that rely on interpretation rather than definition cannot be reliably compared or defended.

Human-AI collaboration should be evaluated by how much better outcomes are with human oversight, DeMase said. The most meaningful metrics now track decision quality and risk reduction. "The most valuable employees are the ones stopping AI from screwing things up, and almost no companies are measuring that," she said. The observation is insightful, but it reveals the core problem: If almost no companies are measuring it, perhaps that's because preventing AI mistakes is genuinely difficult to quantify.

When Performance Metrics Create the Wrong Incentives

When metrics fail, they don't just stop measuring; they start rewarding the wrong behavior.

Most companies still track output like it's 2010 — hours logged, tasks completed, boxes ticked — but these numbers lose meaning when AI does the heavy lifting, Bondex CEO and co-founder Ignacio Palomera told Reworked. "You're measuring busyness, not impact," he said. His proposed alternatives include prompt quality and iteration value, decision impact vs. effort and AI supervision scores. The future of measurement is about judgment, collaboration and meaningful impact, not how fast you ship a draft, he said.

But "prompt quality" is at least as subjective as "output quality." An AI supervision score sounds scientific until you try to define what it measures. Does it track how many AI-generated errors an employee catches? How much they improve AI outputs? How efficiently they delegate to AI vs. doing work themselves? Different definitions create perverse incentives.

Hawley identified adaptability, influence and storytelling as the metrics that matter now, describing them as translating insights into value-driving decisions. It's an inspiring vision. It's also the sort of framework that could easily reward the articulate over the effective, the politically savvy over the productive.

When performance criteria cannot be verified, employees are incentivized to prioritize visibility over outcomes.

When Performance Metrics Become Managerial Discretion

The risks of maintaining traditional performance metrics, according to these experts, are severe. Roos warned of shadow AI usage, talent erosion as creativity and critical thinking decline, bad scaling decisions and cultural mistrust. Hawley suggested that teams will feel frustrated or disengaged if their strategic, creative and human-centered work isn't recognized. Tobin argued that legacy performance metrics reward effort over results, making people who use AI most intelligently look less productive — a fast way to lose top talent.

These concerns are real. But they assume that the alternative — measuring vague concepts such as "influence" and "storytelling" — will be both feasible and fair.

Companies should focus on how well employees train AI using institutional knowledge, time to completion, process improvement via strategic AI usage and increased time spent on high-value human-centered work, DeMase said.

This sounds comprehensive until you consider the measurement challenges. How do you quantify "how well" someone trains AI? What constitutes institutional knowledge vs. experience? When does mentoring others count as high-value work vs. not doing one's own job?


At this point the issue is no longer metrics. It's governance.

When Metrics Become Aspirations

Roos offered perhaps the most concrete implementation suggestion: hold sharing huddles, give shout-outs to people who tried something new and shared what they learned when it failed, and reward how well and how quickly employees learn from AI trial and error. These are reasonable cultural interventions, but they're not metrics.

Perhaps the entire framing is backwards. Traditional productivity metrics weren't broken so much as incomplete. Hours worked and output volume were never meant to capture the whole of someone's contribution; they were meant to provide baseline accountability.

Once AI automates output, measuring individual contribution at task level becomes pointless, Tobin said. If your definition of productivity still fits in a spreadsheet, you're already behind, he continued. The future of performance management isn't about productivity but responsibility. It's a clever rhetorical move, but responsibility without measurable outcomes is just another word for trust — and trust doesn't scale in organizations of thousands.

The risk isn't that companies will stick with outdated metrics, but that they'll abandon imperfect but measurable standards for aspirational concepts that sound progressive but prove impossible to implement fairly. "Decision quality" and "adaptability" are real attributes that matter. But in the absence of clear measurement frameworks, they become vehicles for bias, favoritism and subjective judgment masquerading as sophisticated evaluation.

Leaders should assess adaptability, resilience, judgment and collaboration with AI, measuring success by the higher-value work AI frees people to achieve, Hawley said. But who decides what counts as higher-value? Who determines whether someone's judgment is sound? 

These aren't trivial questions — they're the difference between performance management and managerial discretion dressed up as data-driven evaluation.


About the Author
David Barry

David is a Europe-based journalist of 35 years who has spent the last 15 following the development of workplace technologies, from the early days of document management through enterprise content management and content services. Now, with the rise of remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communication and work, and has recently spent a great deal of time exploring the far reaches of AI, generative AI and general AI.

Main image: Unsplash