
Why Digital Provenance Should Be on Every CTO's Radar

By David Barry
A global provenance crackdown is forming. Gartner warns billions in potential sanctions ahead — yet most companies can’t verify their digital assets.

Here's a question that should keep every CTO awake at night: Can you verify where your digital assets came from? For most organizations, the answer is no. Third-party software, open-source libraries and AI-generated content now form the backbone of enterprise operations, yet their origins have become increasingly opaque. It's a blind spot that's about to get expensive.

Gartner is predicting that by 2029, organizations that fail to invest in digital provenance capabilities will face sanction risks potentially reaching billions of dollars. This isn't fearmongering but rather the inevitable result of a regulatory wave already crashing over the industry.

Open-source software now makes up 70% to 90% of any given software package, and supply chain attacks have been surging at an average rate of 742% per year.

Meanwhile, the White House's 2021 Executive Order 14028 already calls for Software Bills of Materials (SBOMs) in software sold to the federal government, and the EU's Cyber Resilience Act requires manufacturers to produce SBOMs for their products.

Solutions are emerging. Technologies such as SBOMs, attestation databases and digital watermarking help organizations track and validate components across their supply chains.
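As a rough illustration of what "tracking and validating components" can mean in practice, the sketch below reads a CycloneDX-style SBOM (a widely used JSON SBOM format) and flags components that lack a version or a package URL (purl), since those entries can't be tied to a specific upstream release. The sample SBOM, the `unverifiable_components` helper and the rule for what counts as unverifiable are assumptions made for illustration, not a compliance standard.

```python
import json

# Hypothetical minimal CycloneDX-style SBOM; real SBOMs are generated by build tooling.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "requests", "version": "2.31.0",
     "purl": "pkg:pypi/requests@2.31.0"},
    {"type": "library", "name": "left-pad", "version": "1.3.0"},
    {"type": "library", "name": "internal-utils"}
  ]
}
"""


def unverifiable_components(sbom: dict) -> list[tuple[str, list[str]]]:
    """Return (component name, missing fields) for components lacking provenance data.

    Assumption: without both a version and a purl, a component can't be traced
    to a specific upstream release, so we treat it as unverifiable.
    """
    required = ("version", "purl")
    findings = []
    for component in sbom.get("components", []):
        missing = [field for field in required if not component.get(field)]
        if missing:
            findings.append((component.get("name", "<unnamed>"), missing))
    return findings


if __name__ == "__main__":
    sbom = json.loads(SAMPLE_SBOM)
    for name, missing in unverifiable_components(sbom):
        print(f"{name}: missing {', '.join(missing)}")
```

Run against the sample, this reports `left-pad` (no purl) and `internal-utils` (no version or purl), which are exactly the components a reviewer could not trace back to a source.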

But this won't be cheap or easy. It requires new infrastructure, overhauled processes and a culture shift. With billion-dollar sanctions on the horizon, the question isn't whether your organization can afford to implement digital provenance. It's whether you can afford not to.

Metadata Is the New Foundation

The path to provenance starts with something that seems simple: tracking metadata. 

"The foundation of digital provenance is data governance, and the first step in governance is tracking metadata," said Russell Fishman, global head of solutions product management for AI, virtualization and modern workloads at NetApp. "In a world where businesses generate and process increasingly massive amounts of information, data about the data is becoming more important than the data itself."

But not all metadata is created equal. "Metadata enriches data, but how it is enriched makes a big difference in the outcome," Fishman added.

Enterprises need systems that create and track active metadata throughout the data's lifecycle. Metadata infrastructure helps enterprises understand when data was retrieved, trace its lineage from source to output, block sensitive data from unauthorized uses and keep AI outputs compliant.
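Lineage tracking is easier to picture with a concrete structure. The sketch below is a minimal in-memory model, not any vendor's product: each hypothetical `DataAsset` records when it was retrieved, whether it is sensitive and which assets it was derived from, so an output can be traced back to its sources and blocked from unapproved uses. A real system would persist this in a metadata catalog; the names and the approval rule are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataAsset:
    """A dataset plus 'active' metadata captured at the moment it is created."""
    name: str
    sensitive: bool = False  # e.g. contains personal or regulated data
    parents: list["DataAsset"] = field(default_factory=list)  # direct sources
    retrieved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def lineage(asset: DataAsset) -> list[DataAsset]:
    """Walk parent links from an output back to every upstream source (depth-first)."""
    visited: list[DataAsset] = []
    seen_ids: set[int] = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        if id(current) in seen_ids:
            continue  # shared ancestors are reported once
        seen_ids.add(id(current))
        visited.append(current)
        stack.extend(current.parents)
    return visited


def is_use_allowed(asset: DataAsset, use: str, approved_uses: set[str]) -> bool:
    """Block a use if any upstream source is sensitive and the use isn't approved."""
    touches_sensitive = any(a.sensitive for a in lineage(asset))
    return not touches_sensitive or use in approved_uses


if __name__ == "__main__":
    crm = DataAsset("crm_export", sensitive=True)
    web = DataAsset("web_analytics")
    features = DataAsset("churn_features", parents=[crm, web])
    scores = DataAsset("churn_scores", parents=[features])

    print([a.name for a in lineage(scores)])
    # ['churn_scores', 'churn_features', 'web_analytics', 'crm_export']
    print(is_use_allowed(scores, "marketing", approved_uses={"fraud_review"}))  # False
    print(is_use_allowed(scores, "fraud_review", approved_uses={"fraud_review"}))  # True
```

The point of the design is that sensitivity is inherited through lineage: `churn_scores` was never marked sensitive, yet it is blocked from marketing use because one of its upstream sources is.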

The catch? This only works if enterprises have built their data infrastructure with governance in mind from the start, Fishman said. Gartner predicts that 60% of AI projects will fail if they aren't supported by "AI-ready data," which requires governance and active metadata.

"Collecting data without metadata makes it impossible to track the origins and changes along the way," Fishman warned. "Enterprises can't rely on a data broker to catalog and track data after the fact."

The Biggest Provenance Blind Spots

Organizations are also drowning in data, and most have no idea where it came from. "Most teams don't realize how much of their digital environment rests on endlessly copied assets and datasets with no clear history," said Neil Ormesher, CEO of Accounts and Legal. "We've tolerated that uncertainty; regulators and users won't."

Organizations in healthcare, finance and retail create and store thousands of records and digital assets daily, making metadata recording and tracking a monumental challenge. But the most dangerous blind spots are newer.

"AI-generated content and third-party productivity platforms are one of the biggest blind spots in digital provenance today," according to Ormesher. Many organizations generate assets using large language models trained on external data and use software built on open-source code.

"The pressure is highest on code paths and model outputs, because those are the pieces that affect customers when something breaks," said Nuha Hashem, CTO and co-founder of Cozmo AI.

AI’s Provenance Challenge

AI-generated content doesn't just complicate provenance; it can break it. AI platforms generate articles, code, artwork and music using models trained on existing, human-created works.

The output appears original, yet it may bear identifiable similarities to the works the model was trained on. The result is a growing number of alleged copyright infringement cases. Businesses using AI platforms often mistakenly assume the content is original, and they could face sanctions for failing to provide the relevant metadata once digital provenance becomes mandatory.

"Provenance is simple: Prove where something came from and what's happened to it since," said Ormesher. "Not with guesswork, but with verifiable evidence." Visual media represents the most urgent example.

"Whilst we once trusted that a photo captured a real event, with synthetic images now indistinguishable from reality, we must prove the origin point of the images that we see," Ormesher said.

But there's a misconception: "I've seen the same pattern across newsrooms, enterprises and regulators: People assume metadata equals provenance," Ormesher said. "However, a bad actor can easily strip or forge metadata."
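Ormesher's distinction between metadata and provenance can be made concrete. The sketch below uses only Python's standard library and a hypothetical shared signing key: it hashes the content and signs that hash together with its metadata. Forged or edited metadata then fails the signature check, altered content fails the hash check, and stripped metadata leaves nothing to verify. This illustrates the principle only and is not how any particular product works; open standards such as C2PA use certificate-based public-key signatures rather than a shared secret.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret for the demo only. Real provenance systems use
# public-key signatures so verifiers never need to hold the signing secret.
SIGNING_KEY = b"demo-key-not-for-production"


def _sign(claim: dict) -> str:
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding of the claim."""
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def make_manifest(content: bytes, metadata: dict) -> dict:
    """Bind metadata to the exact content bytes via a hash, then sign both together."""
    claim = {**metadata, "content_sha256": hashlib.sha256(content).hexdigest()}
    return {"claim": claim, "signature": _sign(claim)}


def verify(content: bytes, manifest: Optional[dict]) -> bool:
    """Trust content only if a manifest exists, is unaltered, and matches the bytes."""
    if manifest is None:
        return False  # metadata was stripped: nothing to prove origin with
    if not hmac.compare_digest(_sign(manifest["claim"]), manifest["signature"]):
        return False  # metadata was forged or edited after signing
    return manifest["claim"]["content_sha256"] == hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    photo = b"raw image bytes"
    manifest = make_manifest(photo, {"creator": "camera-07", "captured": "2025-01-15T09:30:00Z"})
    forged = {"claim": {**manifest["claim"], "creator": "someone-else"},
              "signature": manifest["signature"]}
    print(verify(photo, manifest))            # True
    print(verify(photo, forged))              # False: forged metadata
    print(verify(photo, None))                # False: stripped metadata
    print(verify(b"edited bytes", manifest))  # False: content changed
```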

The Provenance Cultural Shift

Mandatory digital provenance will change how organizations operate. Documenting metadata for every digital asset created and managed could slow productivity and hamper growth. Smaller businesses may lack the resources to implement compliant tracking software.


Regulatory drivers are already in motion. Governments worldwide are introducing frameworks such as the European Union's Digital Product Passport, which creates a digital record of a product's lifecycle, alongside rules requiring greater transparency about the datasets used to train AI. Failure to prove the integrity and origins of data could result in billion-dollar regulatory sanctions.

Misused data in industries such as healthcare, finance and manufacturing could harm people's health and livelihoods, potentially resulting in costly legal cases.

But the deeper challenge is cultural. "Organizations may require a significant cultural shift beyond technology to ensure compliance with any mandatory digital provenance frameworks," warned Ormesher.

One shift could be toward verified authenticity, with supervisors checking all data and assets for validity. Anything created or modified will need to carry timestamps, digital signatures, permissions and ownership information.

"Provenance only works when it becomes a habit, not a cleanup step," Hashem said. "People need to assume that anything created inside the company might be reviewed later. Once that expectation is clear, the work becomes easier to explain and the decisions behind it become easier to trust."

The technology itself is catching up but remains fragmented. Distributed ledger technologies such as blockchain record metadata changes in entries that are difficult to alter once written. However, the inflexibility of these technologies makes them challenging to implement broadly.
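The core idea behind ledger-style tamper evidence can be shown without a blockchain. In the minimal sketch below, each entry in a hypothetical `ProvenanceLog` includes the hash of the entry before it, so quietly editing an old record invalidates every later hash. This is a simplified, single-machine stand-in: distributed ledgers add replication and consensus so that no single party can rewrite and re-hash the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


class ProvenanceLog:
    """Append-only log in which each entry commits to everything before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = entry_hash(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry_hash(prev, entry["record"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = ProvenanceLog()
    log.append({"asset": "report.pdf", "action": "created", "by": "alice"})
    log.append({"asset": "report.pdf", "action": "edited", "by": "bob"})
    print(log.verify())  # True
    log.entries[0]["record"]["by"] = "mallory"  # quietly rewrite history
    print(log.verify())  # False
```

The same property explains the inflexibility the article mentions: a genuine correction can't be made in place, only appended as a new entry that references the old one.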

"The most helpful tools are the ones that record information at the moment work happens," noted Hashem. "They cut down on guesswork because the system captures the step before anyone has to write it down later. Older systems struggle with this because they were not built to hold a detailed path from input to result."
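Hashem's point about capturing steps "at the moment work happens" maps naturally onto instrumenting the code that does the work. In the hedged sketch below, a hypothetical `recorded_step` decorator logs a timestamp and content hashes each time a pipeline step runs, so the path from input to result is recorded as a side effect rather than written up afterward. The in-memory event list and the assumption that inputs are JSON-serializable are simplifications.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory sink; a real system would write to durable, access-controlled storage.
PROVENANCE_EVENTS: list[dict] = []


def _fingerprint(value) -> str:
    """Short content hash (assumes JSON-serializable values; falls back to str())."""
    encoded = json.dumps(value, sort_keys=True, default=str).encode()
    return hashlib.sha256(encoded).hexdigest()[:12]


def recorded_step(func):
    """Log what went in, what came out and when, every time the step runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started_at = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        PROVENANCE_EVENTS.append({
            "step": func.__name__,
            "started_at": started_at,
            "input_hashes": [_fingerprint(a) for a in args],
            "kwarg_hashes": {k: _fingerprint(v) for k, v in kwargs.items()},
            "output_hash": _fingerprint(result),
        })
        return result
    return wrapper


@recorded_step
def clean(rows):
    return [r.strip().lower() for r in rows]


@recorded_step
def dedupe(rows):
    return sorted(set(rows))


if __name__ == "__main__":
    print(dedupe(clean(["  Alice", "alice ", "Bob"])))  # ['alice', 'bob']
    for event in PROVENANCE_EVENTS:
        print(event["step"], event["input_hashes"], "->", event["output_hash"])
```

Because the output hash of `clean` matches the first input hash of `dedupe`, the log links each result back to the exact data it was derived from, without anyone documenting it by hand.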

Most solutions cover only pieces of the problem and don't integrate with workflows. That's why Ormesher built OpenOrigins: provenance that embeds into existing systems and tracks content across its lifecycle.

"Once teams default to transparency, provenance stops being a burden and becomes an advantage," Ormesher said. "You move faster because you actually trust the information in front of you."

What Organizations Should Do Now

The writing is on the wall: Regulations for digital provenance are on the way, and sanctions for non-compliance will likely be severe. Whether they arrive on Gartner's 2029 timeline or earlier, organizations need to start building data infrastructure with governance integrated from the ground up.

The cultural transformation is equally important. Teams must adopt provenance as a default practice rather than a compliance checkbox. Assets need traceability from the moment of creation, and teams need tools that make this happen automatically.

When mandatory digital provenance regulations take effect, organizations may find it difficult to prove their data is authentic if that data is alleged to breach those regulations.

The stakes are too high to treat this as a future problem. Digital provenance isn't just about avoiding potential sanctions. It's about building systems and cultures where trust is verifiable, not assumed.


About the Author
David Barry

David is a Europe-based journalist with 35 years of experience who has spent the last 15 following the development of workplace technologies, from the early days of document management through enterprise content management and content services. Now, with the development of new remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communications and work, and has recently spent a great deal of time exploring the far reaches of AI, generative AI and General AI.

Main image: Wesley Tingey | Unsplash