
Don’t Be Afraid of the Dark: Bring Dark Data Into the Light

December 17, 2020 Information Management
By Andrea Malick

While the phrase dark data may sound like a special ops clandestine operation, the truth is, we all have dark data. A lot of it is left in the dark simply because it’s unstructured, which leaves a lot of untapped value hidden away. As content managers, we are stewards of this information lurking in every organization and are on call to bring it into the light.

We Are Dark Data Machines

According to one estimate, over 80% of all data is dark and unstructured. This figure is projected to grow to 93% by 2022, with the arrival of more devices generating data, such as internet of things (IoT) sensors and next-generation cars.

Dark data, both structured and unstructured, is created in the course of everyday business processes by machines and people. It’s data that users either ignore as unimportant or aren’t even aware is there. And you can’t turn it off. Without realizing it, we’re all perpetual motion machines hard at work growing a dark data vortex.

Here are some examples of everyday dark data:

  • Spreadsheets (there are millions of spreadsheets in a typical enterprise).
  • Multiple old versions of documents (some systems retain hundreds of versions of one document over the course of its lifecycle).
  • Email attachments.
  • Text documents.
  • Social media posts.
  • Images, videos and rich media (e.g., surveillance video data).
  • Records of conversations with clients or business partners.
  • Metadata that travels with a piece of information as context (e.g., logs in a website, time of day that a tweet was sent, geolocation data).

So, dark data is in that company folder named ‘George’ that everyone scrolls past every day, not knowing its contents (even though George left the company three years ago). And it’s in the system logs quietly recording activity year after year.


Dark Data: Untapped Benefits and Lurking Risks

There are several compelling arguments for analyzing and using dark data. The International Data Corporation (IDC) estimated that organizations could recover $430 billion in productivity gains over their competitors if they extracted the insights within dark data. Dark data also holds many opportunities to understand your customers and how they’re experiencing your service.

By extracting key information from dark data (i.e., adding structure to unstructured data), an automated routine can recognize and process customer transactions and requests. By applying analytics to dark data, businesses can uncover patterns in users’ behaviors that will improve allocation of resources.
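To make "adding structure to unstructured data" concrete, here is a minimal Python sketch that pulls a few fields out of a free-text customer message so an automated routine could act on them. The order-number format, the keyword list and every name in it (`ExtractedRequest`, `extract_request`) are assumptions made for illustration. A production system would more likely use trained language models than hand-written patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical formats: an order ID like "AB-10432" and a dollar amount.
ORDER_RE = re.compile(r"\border\s*#?\s*([A-Z]{2}-\d+)", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\$(\d[\d,]*(?:\.\d{2})?)")

# Hypothetical keywords that hint at what the customer is asking for.
KEYWORDS = {"refund": "refund", "cancel": "cancellation", "invoice": "invoice"}


@dataclass
class ExtractedRequest:
    order_id: Optional[str]
    amount: Optional[float]
    request_type: str


def extract_request(text: str) -> ExtractedRequest:
    """Turn a free-text message into a small structured record."""
    order = ORDER_RE.search(text)
    amount = AMOUNT_RE.search(text)
    lowered = text.lower()
    # The first keyword found decides the request type; otherwise "unknown".
    request_type = next(
        (label for keyword, label in KEYWORDS.items() if keyword in lowered),
        "unknown",
    )
    return ExtractedRequest(
        order_id=order.group(1).upper() if order else None,
        amount=float(amount.group(1).replace(",", "")) if amount else None,
        request_type=request_type,
    )


message = "Hi, I'd like a refund for order #AB-10432. I was charged $1,249.50 on March 3."
print(extract_request(message))
```

Once a message is reduced to a record like this, it can be routed, counted and analyzed like any other structured data.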

Dark data also presents risk. If you don’t know what you have, you can’t protect it and manage it. All data, dark or otherwise, is subject to privacy laws (like the GDPR and CCPA) and therefore must be governed. How would you even know if sensitive information in your dark data has been breached, or if keeping it beyond its retention period is exposing your organization to costly liabilities?

Information management specialists are already kept busy bringing relief to records handling and legal discovery practices for the data they know about. However, organizations are going to need a bigger project for the dark data they don’t know about. But the effort is worth it, and more solutions are emerging to allow organizations to explore this complex and vast frontier.


Your Dark Data Walking Papers

Here’s how to begin to tackle the dark data residing within your organization:

  • Include dark data in a holistic data strategy. Dark data should be a part of the enterprise information audit and classified to recognize its importance. Not all data is useful or needs to be governed. Scanning and classifying it will help sift potentially valuable data from the rest.
  • Manage how data is created and ingested. Certain types of dark data cannot be turned off, but organizations can examine how data is brought into their systems and databases and apply data management best practices to ensure it is tagged, tracked and handled throughout its lifecycle. 
  • The governance process should establish rules for what data is useful and important. It should also establish appropriate ways to protect it, manage it and monitor it for compliance.
  • Apply analytics tools and machine learning to recognize patterns and calls to action in unstructured and dark data. Technologies are emerging that can extract keywords and meaning from unstructured content and automate tasks such as invoicing or order processing.
  • Open some of the dark data to everyone to analyze. Open Data and Open Government movements mean recruiting the broader communities of citizen data scientists to contribute to the analysis work and find new stories locked within the data.
  • Invest in data literacy. Data literacy is often overlooked, but it is a critical step in building a data-driven culture, where everyone in the enterprise recognizes dark data, its potential, and their responsibilities for proper data management. Clients often tell me they want to do the right thing but don’t know what that is for their organization. Through thoughtful change management and clear, realistic policies and guidelines, you will build a community of information stewards.
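As a starting point for the scanning and classifying step above, a first pass can be as simple as walking a shared drive and recording what is there. The Python sketch below is one way to do that under stated assumptions: the extension-to-category mapping, the three-year staleness threshold and the function names `scan` and `summarize` are all hypothetical, and a real audit would also inspect file contents and ownership.

```python
import time
from collections import Counter
from pathlib import Path

# Hypothetical mapping of file extensions to broad content categories.
CATEGORIES = {
    ".xlsx": "spreadsheet", ".csv": "spreadsheet",
    ".docx": "document", ".txt": "document", ".pdf": "document",
    ".log": "log",
    ".jpg": "media", ".png": "media", ".mp4": "media",
}


def scan(root, stale_after_days=3 * 365, now=None):
    """Return one record per file under root, with a category and a staleness flag."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # seconds per day
    records = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        stat = path.stat()
        records.append({
            "path": str(path),
            "category": CATEGORIES.get(path.suffix.lower(), "other"),
            "size_bytes": stat.st_size,
            # Stale means not modified since the cutoff (an assumed proxy for "forgotten").
            "stale": stat.st_mtime < cutoff,
        })
    return records


def summarize(records):
    """Count files per (category, stale) pair as a rough audit summary."""
    return Counter((r["category"], r["stale"]) for r in records)


if __name__ == "__main__":
    for (category, stale), count in sorted(summarize(scan(".")).items()):
        print(f"{category:12} stale={stale!s:5} files={count}")
```

A summary like this will not say which data is valuable, but it shows where the untouched folders and long-running logs are, which helps decide where classification effort should go first.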

There are surprises in the dark. Looking squarely at what’s there and applying good information management, with the help of some exciting new technologies, may yield an incredible amount of value for the organization.

Maybe I should change the tagline on my business profile to Dark Data Manager …

About the Author

Andrea Malick is a Research Director in the Data and Analytics practice at Info-Tech, focused on building best practices knowledge in the Enterprise Information Management domain, with corporate and consulting leadership in content management (ECM) and governance.

Andrea has been launching and leading information management and governance practices for 15 years, in multinational organizations and medium-sized businesses.

