Do You Know What Data You Have and Where It’s Hiding?
Do you understand your organization's datasphere? The topic comes up repeatedly as an important first step in understanding the current state of your organization before progressing with a project or an initiative.
What Is a Datasphere?
But what do we mean by datasphere? Datasphere is a term coined to quantify the amount of data created across the world in a given year. Clearly your organization will not have to tackle the amounts of data created, copied, modified, transferred and stored across the whole globe, so for the purposes of this article let's constrain the scope to refer to all the data owned by an organization. An organization's datasphere will contain structured data, that which we normally consider to be managed in the rows and columns of databases, or apps that sit on top of them; and unstructured data, or what we often refer to as information, which comes in the form of documents, videos, audio files, CAD drawings or other files stored in file shares, content management or document management systems.
Whether your organization talks about data, information or even knowledge assets, all those electronic files, including the databases (data warehouses and data lakes) are encompassed by the datasphere.
I am not sure whether an academic has come up with a law to describe the ever-expanding datasphere, but most of us understand that the creation of new data far outstrips our ability to delete old, unused data from our systems.
If an organization has an information governance team or a records manager, chances are they've developed a policy and procedure that encourages the organization to delete information on a regular, scheduled basis. Such a policy can be based on business or regulatory risk factors and on the need for defensible disposition, which is the requirement to remove information from our systems at the first reasonable opportunity to reduce legal liability in the event of litigation. This must be balanced against the potential business value of keeping data for longer. For example, how might the long tail of data help identify useful business trends?
And of course, there will always be those stakeholders who demand to keep everything forever, just in case ….
So, we have a datasphere that is growing exponentially and, generally speaking, outstripping our ability to delete or remove data from our local datasphere. This leads us to ask: do we have a good understanding of how much stuff we have, how old it is, where it lives and who can access it?
Why Does Knowing Your Data Matter?
In my experience, everyone (the business consultants, information governance professionals, IT project managers, et al.) is clear on the need to start a project or initiative with a current state assessment. If you understand your current state, you are better able to carry out a gap analysis to help you devise strategies and plans that enable you to achieve your vision. Depending on what you are doing, a complete and accurate inventory of your datasphere is a potentially major element of your current state assessment. However, the devil is always in the details.
So you get some software that can connect to all your different repositories, and you build your data and information inventory — what might it tell you?
Perhaps that you have read-only file shares that have not been accessed for years? Or files that are open to everyone, which people are still using even when policy says not to? Perhaps it will point out that all those Microsoft Teams sites you stood up as part of your pandemic remote working initiative now have a couple of terabytes of documents uploaded to their underlying SharePoint libraries, and you are unsure what all that information is?
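To make the idea of an inventory concrete, here is a minimal sketch in Python of what the simplest possible version might look like for a single file share. It is an illustration only, not how any particular inventory product works: the function name summarize_share, the InventorySummary fields and the three-year "stale" threshold are all assumptions made for this example. It answers just two of the questions above: how much stuff is there, and how much of it has not been touched in years.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class InventorySummary:
    """Hypothetical summary record for one scanned location."""
    file_count: int = 0
    total_bytes: int = 0
    stale_count: int = 0   # files not touched since the cutoff
    stale_bytes: int = 0


def summarize_share(root, stale_after_days=3 * 365, now=None):
    """Walk a directory tree and summarize file counts, sizes and staleness.

    The three-year default is an assumption for illustration; a real
    threshold would come from your own retention policy.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    summary = InventorySummary()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # Unreadable or vanished files are skipped, not fatal.
                continue
            summary.file_count += 1
            summary.total_bytes += info.st_size
            # Access times are not reliably updated on every file system,
            # so use the later of access time and modification time.
            last_touched = max(info.st_atime, info.st_mtime)
            if last_touched < cutoff:
                summary.stale_count += 1
                summary.stale_bytes += info.st_size
    return summary
```

Calling summarize_share on the root of a share returns the counts and byte totals in one record. A real inventory would also need to capture who can access each file and would have to connect to repositories such as SharePoint through their own interfaces, which this sketch does not attempt.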
You may discover all sorts of valuable insights from a complete inventory. What you do with them, and how you use them to inform your decision making, will depend on what your organization's or business unit’s priorities are. For example, a business leader sponsoring a project to automate a high-value business process might now have concerns that can be summarized as “garbage in, garbage out”: when an inventory dashboard shows that unofficial shares or SharePoint sites are the most often used systems, what is the confidence level that the proper data is being used to feed the business process?
The Information Security team might be very interested in legacy repositories that might not be as well protected as they should be. With a better understanding of what information still resides on them, the team can make a fully informed decision to delete it or to migrate it to a more secure and up-to-date system. And of course our Information Governance colleagues might be very interested in what can be reviewed for defensible disposition or migrated to the official system of record.
Find It, Measure It, Manage It
A quote management guru Peter Drucker may or may not have said, but often gets credit for is: “if you cannot measure it, you cannot manage it.” I like to extend it: if you cannot find it, you cannot measure or monitor it, and if you cannot measure it, you cannot manage or control it. The “it” in question might be data, information or knowledge — whatever your organization or business unit uses or refers to. This applies to our creation of an inventory as part of a current state assessment.
If you don’t know exactly how much stuff you are sitting on, how can you make an informed decision as to whether there is risk to your organization, or the potential to enable transformation or to create business value? The outcomes don’t have to be negative, but if you don’t find it and measure it, you will never know!
About the Author
Jed Cawthorne is principal evangelist at Shinydocs, focusing on spreading the message of the benefits of good data and information management. Jed has over 20 years' experience in information and knowledge management, and over 25 years in IT.