Microsoft Theater in front of the Staples Center in downtown Los Angeles
News Analysis

Microsoft Copilot Tasks Stops Talking and Starts Doing

By David Barry
Microsoft's Copilot Tasks can browse, email, schedule, cancel subscriptions. But who's responsible when it gets things wrong?

Microsoft is no longer asking whether AI acts autonomously on a user's behalf. With the launch of a research preview of Copilot Tasks, it is finding out whether users will let it.

The feature accepts instructions in plain language, carries out multi-step jobs across websites, inboxes and third-party services and reports back when done. It asks for approval before taking consequential actions, such as sending messages or processing payments. For now, it is available to a limited test group, with a broader rollout planned in the coming weeks.

Autonomous AI systems are moving from research into product, and questions of reliability, privacy, safety and accountability that once seemed theoretical are now attached to something tangible. For co-founder and CEO Dominik Tomicevic of graph technology company Memgraph, the distinction between what came before and what Copilot Tasks represents is fundamental. "Most chatbots only describe what they would do," he said. "Systems like Copilot Tasks actually log into a browser and do it."


A New Execution Layer

The technical foundation that makes autonomous action possible is the cloud-based browser environment in which Copilot Tasks runs. Rather than running on the user's own machine, the system works within a sandboxed cloud infrastructure that gives it persistent access to the web, regardless of whether the user's device is active.

This architectural choice is not incidental, said Milan Parikh, lead enterprise data architect at Cytel.

"Without this separation, autonomous execution of tasks at this level is simply impractical," Parikh said. The controlled environment lets Microsoft log every action, apply layered security measures and monitor behavior in ways that would be impossible on a standard consumer device.

That shift in architecture represents a boundary between two generations of AI, said Kseniya Kobryn, chief executive officer of Symphony Solutions. Systems that generate and suggest are one thing. Systems that operate and execute are another. Copilot Tasks delegates real-world digital tasks to an agent that acts, rather than advises.

The range of tasks the system performs reflects that ambition, Kobryn said. It monitors websites for new listings, scans and triages email, drafts and schedules replies, cancels unwanted subscriptions, compiles briefings from calendar and travel data and builds structured documents from raw notes, she said. Tasks can be set as one-off jobs or scheduled to recur.

The Unpredictable Web

The bigger question is how well the system works outside controlled conditions. The digital environments Copilot Tasks operates in are not standardized. Websites update their layouts, subscription services change their confirmation flows and email threading varies across clients. These environments weren't designed with autonomous agents in mind.

These variations complicate things unpredictably, Tomicevic said. "Letting an AI rerun natural-language tasks sounds great," he said. "But websites change, contents of inboxes change and little errors can quietly snowball into big mistakes if you don't have strong guardrails, context and continuous checks in place."

GraphRAG, retrieval-augmented generation built on graph structures, could help address this problem: it lets AI systems retrieve verified context about a user's digital world rather than operate on inference alone.
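The idea of grounding an agent in verified context can be sketched in a few lines. This is a minimal, hypothetical illustration of graph-based retrieval, not Copilot Tasks internals or Memgraph's API; the graph contents, node names and task are invented for the example.

```python
# Minimal sketch of GraphRAG-style grounding: before acting, the agent
# retrieves verified facts from a graph instead of inferring them.
from collections import defaultdict

class ContextGraph:
    """Tiny in-memory graph of (subject, relation, object) triples."""
    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def facts_about(self, subject):
        """Return verified facts to ground an agent's next step."""
        return [f"{subject} --{rel}--> {obj}" for rel, obj in self.edges[subject]]

graph = ContextGraph()
graph.add("user", "subscribes_to", "StreamingServiceX")
graph.add("StreamingServiceX", "billing_email", "billing@streamingx.example")
graph.add("StreamingServiceX", "cancel_url", "https://streamingx.example/cancel")

# For "cancel my streaming subscription", the agent looks up the service's
# verified cancellation URL rather than guessing it from training data.
context = graph.facts_about("StreamingServiceX")
```

The point of the design is that the facts the agent acts on come from an auditable store the user controls, so a stale inference can be caught before it becomes a wrong action.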

The bigger problem is error detection, not task execution, Parikh said. A task may appear to succeed that actually failed because a checkout page changed or a confirmation step was skipped. "The difficulty is knowing when to stop and detect an error, versus silently performing a task incorrectly," he said. "Most systems currently cannot do that well."
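The stop-and-detect behavior Parikh describes amounts to checking an explicit post-condition after each step. The sketch below is a hypothetical illustration; the page text and confirmation phrase are invented, and a real agent would check service-specific signals such as confirmation IDs, emails or API state.

```python
# Hedged sketch: verify a task's outcome instead of assuming success.
class SilentFailure(Exception):
    """Raised when a step finished without evidence it actually succeeded."""

def cancel_subscription(page_text: str) -> str:
    # An agent that returned "done" here without checking would fail
    # silently whenever the site changed its confirmation flow.
    if "your subscription has been cancelled" not in page_text.lower():
        raise SilentFailure("No cancellation confirmation found; stopping for review.")
    return "cancelled"

# Happy path: the expected confirmation text is present.
assert cancel_subscription("Your subscription has been cancelled.") == "cancelled"

# Changed checkout flow: no confirmation, so the agent escalates
# instead of reporting success.
try:
    cancel_subscription("Manage your plan")
    outcome = "assumed success"
except SilentFailure:
    outcome = "escalated to user"
```

The failure branch is the interesting one: raising and escalating converts a silent wrong action into a visible, recoverable one.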

For low-stakes tasks such as tracking apartment listings or unsubscribing from newsletters, this is a manageable risk. For anything involving communications, financial transactions or commitments made on a user's behalf, the consequences of silent failure are considerably more serious.

The Agent Never Logs Off

Unlike a search query or a one-off prompt, Copilot Tasks stays connected to inboxes, files, browsing activity and services. It is a standing arrangement that gives the system continuous visibility into a user's digital life, and the risks that come with it do not end when the task does.

The danger extends beyond data breach to scope creep, where access expands beyond what the user originally intended, Parikh warned.

"Users need fine-grained access controls, not a simple on/off switch," Parikh said. Most users will not read the privacy documentation carefully enough to understand what they are granting, he said.

Granting an AI agent permanent access to email, documents and accounts is, from a security standpoint, like hiring a staff member with keys to everything, Kobryn warned. And the governance, role-based access control and continuous auditing features this requires aren't yet standardized.
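What "fine-grained access controls, not a simple on/off switch" means in practice can be sketched as per-scope grants checked on every action. The scope names and policy check below are hypothetical illustrations, not Microsoft's permission model.

```python
# Illustrative sketch: explicit, auditable per-scope grants for an agent,
# rather than a single all-or-nothing toggle.
ALLOWED_SCOPES = {
    "mail.read",             # triage the inbox
    "calendar.read",         # compile briefings
    "subscriptions.cancel",  # cancel newsletters
    # deliberately NOT granted: "mail.send", "payments.execute"
}

def authorize(action_scope: str) -> bool:
    """Gate every agent action on an explicit grant; deny by default."""
    return action_scope in ALLOWED_SCOPES

assert authorize("mail.read")
assert not authorize("payments.execute")  # needs explicit user approval
```

A deny-by-default check like this is also what makes continuous auditing tractable: every denied call is a concrete record of the agent attempting to exceed its mandate.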

The Audit Trail

The answer to many of these concerns lies in the quality of the reporting that accompanies autonomous action. Copilot Tasks produces an activity summary after each completed job, and the value of that summary extends beyond reassurance.

This transparency lets users verify what happened, identify mistakes and oversee actions. "Handing control to a black box is, outside low-stakes tasks, pretty risky," he said. Reports that capture not just what the agent did but why it did it provide the decision traces trustworthy autonomy requires.


For example, a report stating only "subscription managed" is insufficient, Parikh said. Users need enough detail to understand what was done and how to reverse it. That combination of legibility and reversibility changes an audit trail from a compliance feature into trust infrastructure.
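The legibility-plus-reversibility standard Parikh describes can be made concrete as a structured audit record. The field names and example values below are illustrative assumptions, not the format Copilot Tasks actually emits.

```python
# Sketch of an audit record that is legible and reversible, rather than
# a bare "subscription managed". Field names are illustrative.
import json
from datetime import datetime, timezone

def audit_record(action, target, reason, reversal):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,      # what the agent did
        "target": target,      # where it did it
        "reason": reason,      # why it decided to act
        "reversal": reversal,  # how the user can undo it
    }

record = audit_record(
    action="cancel_subscription",
    target="StreamingServiceX monthly plan",
    reason="User rule: cancel services unused for 90+ days.",
    reversal="Re-subscribe at streamingx.example/plans before the billing date.",
)
print(json.dumps(record, indent=2))
```

Capturing the `reason` field is what turns the log from a compliance artifact into a decision trace: the user can audit not just the action but the inference behind it.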

The broader picture that emerges is of a product that is technically serious, directionally significant and operating at the edge of what current AI systems reliably deliver. Copilot Tasks represents an advance in consumer AI. Whether it earns the trust required to act unsupervised on what matters most depends on what the research preview shows when users stress-test it.

Editor's Note: How else are technology vendors putting agents into action?

About the Author
David Barry

David is a European-based journalist with 35 years' experience who has spent the last 15 following the development of workplace technologies, from the early days of document management through enterprise content management and content services. Now, with the rise of remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communication and work, and has recently spent a great deal of time exploring the far reaches of AI, generative AI and artificial general intelligence.

Main image: Adobe Stock