
Time to Do a Spring Clean of Your Digital Document Silos

By David Barry
One of the easiest ways to get your document silos up-to-date is dumping what you don't and won't need. Here's why.

Generative AI bots can find and retrieve documents, saving time and headaches. Or so says a recent Amazon blog post.

“Enterprise document management systems (EDMS) manage the lifecycle and distribution of documents,” the blog reads. “They often rely on keyword-based search functionality. However, it increasingly becomes hard to discover documents as such repositories grow to tens of thousands of items.”

The blog was written to highlight the intelligent search bot Amazon Web Services (AWS) built on top of the document repository of a global life sciences company. What the blog didn't suggest was tackling the root problem — namely properly managing the tens of thousands of items that can be found in any enterprise, in many cases documents that are out of date, no longer needed and are eating up resources, both in storage costs and in worker time spent searching. 

A Band-Aid Solution to an Old Problem

While AWS's solution may be new, the challenge is old. Businesses have long struggled to find the right documents for the right processes, in spite of having some kind of document management system in place.

However, using bots to retrieve content is a tacit admission that either the document management system is not being used correctly, or that documents are being thrown into silos where they have no business in the first place.

While generative AI-driven bots may act as a band-aid solution, a better and more effective method must surely lie in improving document management strategies in the first place. If documents are placed where they should be and disposed of in a timely manner, then workers should in theory find them with little difficulty.

There is obviously quite a discussion to be had about the failure of business technology strategies to address this, but the reality for many companies is that their document silos are disorganized and badly managed, with poor cataloguing that makes it close to impossible to access the documents needed to get work done.

Related Article: The State of Document Management in the Workplace

Too Many Content Systems Falling Short

You do not have to go far to see where the problem lies. Last year's Association for Intelligent Information Management (AIIM) 2023 State of the Intelligent Information Management Industry report more than adequately demonstrates what the problem is and where it sits.

The report combined critical benchmark data from 2013 and 2018 with data from 2023 and found that many organizations are still tech-heavy, using an average of 4.95 content systems now as opposed to 3.14 in 2013. Nearly 15% are using between seven and 10 content systems and, in 75% of cases, these systems were not integrated.

The problem is aggravated by the fact that, for 80% of the organizations, enterprise content management or document management systems are key technologies at the heart of the digital workplace. Other tools cited include content services, process automation and intelligent document processing and capture.

Our own State of the Digital Workplace report confirmed the integral role document management plays in an organization, with document management rated the most important technology in the workplace by 61% of respondents. Yet only 35% reported their document management system "worked well."

Related Article: AI Is Changing How We Handle Documents

Clean Out Your Document Silos!

At no point in researching this article did any of the reports suggest cleaning out document silos and getting rid of out-of-date, irrelevant or unnecessary documents. Given the potential security implications of keeping such data and the sheer weight this digital workplace flotsam represents, it is unclear why organizations do not conduct a regular spring clean.

Freeing up space and getting rid of unwanted content should be part of an organization’s business strategy.

"When developing a strategy to maintain document relevancy, successful organizations have specific policies and processes in place for content management," Kathy Rudy, chief data and analytics officer with technology research and advisory firm ISG, told Reworked. "These policies set the groundwork for document governance, retention, access and security."

The first task, she said, is defining what content is relevant and for what application/purpose. Next up is a review of content locations across the organization – where document silos are located, where and how they interact with other systems, where they interact with records management, if at all.

After that, you need to establish data currency and retention by document and content type. Create a methodology to classify documents, often referred to as document tagging, which AI can help enable.

"There is the notion that tagging is not required with the advent of generative AI, which can be true in some cases, but if you have millions of documents, tagging will speed up the process of finding relevant content and keep costs down," she added.

Related Article: Can Auto-Tagging Save Us From Metadata?

Do You Know Why You Stored the Documents in the First Place? 

Any strategy that depends on an organization's staff to succeed is going to fail, Laurence Hart, director of consulting services at CGI Federal, told Reworked. 

The most you can ask for is getting people to put content into the right repository. Tagging and consistent naming are not going to happen at scale, he said. That is where machine learning (ML) comes into play.


Tools are available to analyze document stores to weed out ROT — redundant, obsolete and trivial information. Most come with some default learning, and organizations can train them to fully understand the content in their ecosystem.
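As a rough illustration of what a ROT pass looks for, here is a minimal Python sketch. The tools described above rely on trained models; this sketch swaps in three crude heuristics instead: identical content hashes for "redundant," last-modified age for "obsolete" and very small size for "trivial." The Doc record, the find_rot function and both thresholds are hypothetical choices made for the example, not settings taken from any product.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Doc:
    """Hypothetical in-memory record for one stored document."""
    path: str
    content: bytes
    modified: datetime


def find_rot(
    docs: list[Doc],
    now: datetime,
    max_age: timedelta = timedelta(days=3 * 365),  # assumed cutoff for "obsolete"
    min_bytes: int = 16,                           # assumed cutoff for "trivial"
) -> dict[str, list[str]]:
    """Flag candidate ROT by simple heuristics; nothing is deleted here."""
    first_seen: dict[str, str] = {}
    report: dict[str, list[str]] = {"redundant": [], "obsolete": [], "trivial": []}
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in first_seen:
            # Same bytes as an earlier document: treat as a duplicate.
            report["redundant"].append(doc.path)
        else:
            first_seen[digest] = doc.path
        if now - doc.modified > max_age:
            report["obsolete"].append(doc.path)
        if len(doc.content) < min_bytes:
            report["trivial"].append(doc.path)
    return report
```

The function only produces a report. That keeps a human review step between detection and removal, which matters because heuristics like these will misidentify some documents.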

With the initial cleaning under your belt, it's time to set up a scheduled routine for maintenance. Communication is key here to prevent the process from happening in a silo. Tell people what is happening and give them the opportunity to prevent the accidental removal of misidentified information.

Mulytic Labs CTO David Johnson said organizations need to understand first and foremost why a document is even in a silo — or still in a silo — in the first place.

He advises users not to include documents for documents' sake. He said organizations need to collect documents and the data they contain for a reason. It is important to understand and write down that reason. "Be sure you are answering a question of the business," he said.

All document and data collection must point back to a business question being answered. Writing it down is critical because data and systems will change, but the business problems will not. So, when new systems are introduced, organizations can figure out what to collect and what to avoid. When business objectives or goals change, you can go back and remove anything that does not fit the new goals.

Related Article: The Secret Sauce Behind Project Cortex: Good Metadata

Building Your Document Strategy

Organizations can take several practical steps to help themselves, said Adhiran Thirmal of Security Compass. Proper maintenance of documents is crucial for maintaining efficiency, security and compliance. Developing a strategy to achieve this involves several key elements, including:

Data Mapping and Inventory

The first thing is to understand what the organization has, Thirmal said. Conduct a comprehensive data mapping exercise to identify all document repositories (cloud storage, shared drives, local servers, etc.). Then, create an inventory of the content within them, categorizing it by sensitivity, department and purpose.
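For a shared drive or local server, a first-pass inventory could be sketched in Python as follows. This is only an illustration under stated assumptions: it treats the top-level folder as the owning department, which is a guess about how a repository might be laid out, and it cannot infer sensitivity or purpose, which would need to come from people or from a classification tool. The function names build_inventory and count_by are made up for the example.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def build_inventory(root: Path) -> list[dict]:
    """Walk a repository root and record one row per file."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        rows.append({
            "path": rel.as_posix(),
            # Assumption: the first folder under the root names the department.
            "department": rel.parts[0] if len(rel.parts) > 1 else "unassigned",
            "type": path.suffix.lower() or "(none)",
            "bytes": path.stat().st_size,
        })
    return rows


def count_by(rows: list[dict], key: str) -> Counter:
    """Summarize the inventory, e.g. number of files per department or type."""
    return Counter(row[key] for row in rows)
```

Calling count_by(rows, "department") or count_by(rows, "type") on the result gives a quick picture of where content has piled up before any cleanup decisions are made.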

Information Governance Framework

Next up is to develop a clear information governance framework, he continued. The framework defines policies for data retention, access control, classification and disposal. It ensures sensitive documents are properly secured, while streamlining access to non-sensitive ones.
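One way to make retention and disposal rules concrete is to write them down as data that software can check. The Python sketch below is illustrative only: the document types, retention periods and actions are invented for the example and are not recommendations, and the names RetentionRule, POLICY and disposal_action are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    keep_for: timedelta
    action: str  # what to do once the period has passed: "archive" or "delete"


# Hypothetical policy table; real periods depend on legal and business needs.
POLICY = {
    "invoice": RetentionRule(timedelta(days=7 * 365), "archive"),
    "meeting_notes": RetentionRule(timedelta(days=365), "delete"),
}


def disposal_action(doc_type: str, created: date, today: date) -> str | None:
    """Return the action that is due for a document, or None if it should be kept."""
    rule = POLICY.get(doc_type)
    if rule is None:
        # Document types with no rule go to a person instead of being guessed at.
        return "review"
    if today - created >= rule.keep_for:
        return rule.action
    return None
```

Sending unknown document types to "review" is a deliberate choice in this sketch: a gap in the policy surfaces as work for a person instead of as a silent deletion.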

Content Lifecycle Management

After that, organizations need to implement a content lifecycle management (CLM) system to automate tasks like version control, document review and archiving based on pre-defined rules. CLM systems reduce human error and ensure documents do not linger in insecure locations.

Thirmal suggested further areas for consideration when building out a document strategy, including:

User Training and Awareness

Educate employees on the importance of information security. Train them on data classification protocols, secure storage practices and the dangers of shadow IT (unauthorized cloud storage).

Access Controls and Permissions

Implement granular access controls. Users should only have access to documents relevant to their job function, he said. Regularly review and update access permissions to prevent unauthorized access.
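A periodic permissions review can be as simple as comparing what each user has been granted against what their job function allows. The Python sketch below shows the idea under stated assumptions: the role names, collection names and the excess_grants function are hypothetical, and a real review would pull both mappings from the document system and the HR directory.

```python
from __future__ import annotations

# Hypothetical mapping of job function to the collections it may access.
ROLE_ALLOWED = {
    "finance": {"invoices", "budgets"},
    "hr": {"contracts"},
}


def excess_grants(
    user_roles: dict[str, str],
    grants: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per user, any granted collections their role does not justify."""
    findings = {}
    for user, granted in grants.items():
        # Users with no known role are allowed nothing, so all grants are flagged.
        allowed = ROLE_ALLOWED.get(user_roles.get(user, ""), set())
        extra = granted - allowed
        if extra:
            findings[user] = extra
    return findings
```

Users who no longer appear in the role mapping, for example people who have left or changed teams, have every remaining grant flagged, which is usually where stale access hides.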

Data Encryption

Encrypt sensitive documents at rest and in transit. This adds an extra layer of protection in case of breaches.

When it comes to the technology that supports all of this, Thirmal recommended a combination of solutions including:

  • Content Management Systems (CMS): Systems such as SharePoint, Documentum or Alfresco provide robust features for content organization, version control and access management, although their use will depend on the needs of the workplace.
  • Data Loss Prevention (DLP) Solutions: DLP solutions help identify, monitor and protect sensitive information from unauthorized disclosure or exfiltration.
  • Data Classification Tools: Automated data classification tools such as Titus, Varonis or Boldon James assist in classifying and tagging content based on predefined policies and rules.
  • Blockchain Technology: Blockchain-based solutions offer immutable and tamper-proof record-keeping capabilities, ensuring the integrity and authenticity of critical documents, he said.

“By implementing a comprehensive strategy with the right technology and a focus on continuous improvement, organizations can turn their document silos from security vulnerabilities into valuable assets that drive secure and efficient collaboration,” Thirmal said.

About the Author
David Barry

David is a European-based journalist of 35 years who has spent the last 15 following the development of workplace technologies, from the early days of document management, enterprise content management and content services. Now, with the development of new remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communications and work and has recently spent a great deal of time exploring the far reaches of AI, generative AI and General AI.

Main image: JeShoots | unsplash