Data Mesh or Data Fabric as a Foundation for Data Management Strategy
To manage the growing volumes of data entering their organizations, IT leaders are turning to two newer approaches: data fabric and data mesh.
The two are similar in their attempts to unify enterprise data and improve access, but there are important differences. Understanding how each is governed and for what purpose is key to determining which is best for your data management strategy.
What Is Data Fabric?
A data fabric integrates data across enterprise infrastructures, including all cloud, hybrid and on-premises deployments. It connects distributed network environments and puts their data into a common format, which makes it a good starting place for organizations that want to integrate their data first and build on it from there. It’s also a way to add machine learning and AI functionality to a distributed system without ripping and replacing much of the system.
According to Dan Kirsch, managing director at Boca Raton, Fla.-based Techstrong Research, the value of a data fabric approach is that unlike traditional data management architectures like a data warehouse or data lake, the technology eliminates the need for bulk transfers of data.
Instead, the technology abstracts data from the location where it is created or stored and uses metadata to connect disparate data within the same environment.
A data fabric approach also automatically monitors and adapts the layout of the data architecture to support business units and their access patterns, consolidate data, and resize certain systems. In Kirsch’s view, there are three primary benefits to this approach:
- Simplification of data management.
- Increased security and compliance because data is never moved.
- More trusted data insights because the underlying data is always up to date and complete.
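As a rough illustration of the metadata-driven abstraction described above, the following Python sketch shows a toy catalog that lets callers find and read datasets by logical name or tag while the data stays where it lives. The `DatasetEntry` and `Catalog` names, the tags and the sample rows are all hypothetical and do not come from any particular data fabric product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata describing a dataset (hypothetical schema for illustration)."""
    name: str
    location: str    # e.g. "cloud" or "on_prem"
    fmt: str         # native storage format
    tags: frozenset  # business terms used to relate datasets


class Catalog:
    """Toy metadata layer: callers ask by name or tag, never by location."""

    def __init__(self):
        self._entries = {}
        self._readers = {}

    def register(self, entry, reader):
        # `reader` is a zero-argument callable that fetches rows in place.
        self._entries[entry.name] = entry
        self._readers[entry.name] = reader

    def find(self, tag):
        # Use metadata alone to discover related datasets.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def read(self, name):
        # Rows are fetched from the source on demand; nothing is bulk-copied.
        return self._readers[name]()


catalog = Catalog()
catalog.register(
    DatasetEntry("crm_customers", "cloud", "json", frozenset({"customer"})),
    lambda: [{"customer_id": 1, "name": "Acme"}],
)
catalog.register(
    DatasetEntry("erp_orders", "on_prem", "csv", frozenset({"customer", "orders"})),
    lambda: [{"customer_id": 1, "total": 250.0}],
)

related = catalog.find("customer")
print(related)
```

A real data fabric would add format translation, access control and automated optimization on top of this lookup; the sketch only shows the idea of connecting data through metadata rather than moving it.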
Related Article: Data Fabric Might Be the Answer to Data Management Struggles
What Is Data Mesh?
Data mesh is a modern approach designed to enable scaling by focusing on decentralized, functional teams. Pods of experts in certain business domains are in charge of specific data sets, governance, management and processes. They host and serve the data at their discretion.
With a data mesh architecture, data is seen as a product, meaning that its storage, the pipelines that access it, the metadata attached to it, its trustworthiness, quality, reliability, security, governance and service level agreements are all seen as components of its value. Data, then, is more than a piece of information. Its value as defined by the domain team that manages it comes with a range of other attributes.
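One way to picture "data as a product" is as a descriptor that bundles the data's location with the other attributes listed above. The Python sketch below is only an assumption about what such a descriptor might look like; the `DataProduct` class, its field names, the placeholder storage URI and the two quality checks are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team might publish with its data."""
    name: str
    owner_domain: str
    storage_uri: str
    schema: dict
    freshness_sla_hours: int
    quality_checks: list = field(default_factory=list)
    allowed_roles: set = field(default_factory=set)

    def validate(self, rows):
        """Return the names of the quality checks that fail for `rows`."""
        return [check.__name__ for check in self.quality_checks if not check(rows)]


def no_missing_ids(rows):
    return all(r.get("order_id") is not None for r in rows)


def non_negative_totals(rows):
    return all(r.get("total", 0) >= 0 for r in rows)


orders = DataProduct(
    name="orders",
    owner_domain="sales",
    storage_uri="warehouse://sales/orders",  # placeholder, not a real URI scheme
    schema={"order_id": "int", "total": "float"},
    freshness_sla_hours=24,
    quality_checks=[no_missing_ids, non_negative_totals],
    allowed_roles={"analyst", "finance"},
)

failed = orders.validate(
    [{"order_id": 1, "total": 10.0}, {"order_id": None, "total": -5.0}]
)
print(failed)
```

The point of the sketch is that quality rules, service levels and access policy travel with the data and are owned by the same domain team, rather than being defined elsewhere by a central group.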
Data mesh is a practical model for modernizing data architecture that, in principle, cuts through the fog of data hype and gives executives a rational starting point for empowering teams with data and scaling data organizations. Like a data fabric, a data mesh aims to avoid the pitfalls of centralized data repositories, but it goes about it in a different way. A data mesh gives discrete teams control over how their data is stored and managed. Then, to enable centralized analytics, reporting, BI and ML efforts, APIs are used to connect to the different data sources.
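The arrangement described above, in which domain teams keep control of their data and central analytics reaches it through APIs, might be sketched in Python as follows. This is a simplified, in-process stand-in: the `Mesh` class, the product names and the sample rows are hypothetical, and in practice each endpoint would be a network API owned by its domain team.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Mesh:
    """Toy registry of domain-owned endpoints (stand-ins for real APIs)."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, Callable[[], List[Row]]] = {}

    def publish(self, product: str, endpoint: Callable[[], List[Row]]) -> None:
        # Each domain decides what it serves and how it is stored behind this.
        self._endpoints[product] = endpoint

    def query(self, product: str) -> List[Row]:
        return self._endpoints[product]()


def sales_orders() -> List[Row]:
    # Owned and served by the sales domain.
    return [
        {"customer_id": 1, "total": 250.0},
        {"customer_id": 2, "total": 75.0},
        {"customer_id": 1, "total": 50.0},
    ]


def support_tickets() -> List[Row]:
    # Owned and served by the support domain.
    return [{"customer_id": 1, "open_tickets": 3}]


mesh = Mesh()
mesh.publish("sales.orders", sales_orders)
mesh.publish("support.tickets", support_tickets)


def revenue_with_tickets(m: Mesh) -> Dict[int, Row]:
    """Central report that combines two domains without owning either."""
    revenue: Dict[int, float] = {}
    for order in m.query("sales.orders"):
        cid = order["customer_id"]
        revenue[cid] = revenue.get(cid, 0.0) + order["total"]
    tickets = {t["customer_id"]: t["open_tickets"] for t in m.query("support.tickets")}
    return {
        cid: {"revenue": rev, "open_tickets": tickets.get(cid, 0)}
        for cid, rev in revenue.items()
    }


report = revenue_with_tickets(mesh)
print(report)
```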
Decentralized or Centralized Data?
Perhaps the most significant difference is that a data fabric effort is administered and managed by a centralized body, while the data mesh supports a distributed — even decentralized — architecture in which functional data teams within decentralized domains host and serve data as a product.
More specifically, data fabric is an architecture that enables end-to-end integration of data pipelines and cloud environments using automated systems. It is intended to make it easier for users to access and share information in a distributed data environment. Data mesh, on the other hand, is a "decentralized organizational and architectural approach for efficient data management and consumption at scale," said Ben Hudson, product manager at Santa Clara, Calif.-based Dremio. It enables end users to easily access and query data where it lives without first transporting it to a data lake or data warehouse.
In a data mesh, each business unit is responsible for creating and maintaining its own data and sharing it with others — and treating it like a product.
This decentralized approach, Hudson continued, aims to overcome the pitfalls of centralized data management, in which a central data team in the IT department is responsible for preparing and managing all the data in the company. That arrangement often results in a bottleneck that inhibits agility and leads to friction between business units and IT.
“Simply put, a data fabric is a data integration solution, while data mesh is an approach to data management, not a specific technology architecture or platform,” said Hudson.
Related Article: Is a Single Source of Data the Way Forward for Data Governance
The Security Challenges of Both Data Approaches
In both data fabric and data mesh approaches, data privacy and security should be top of mind. Unfortunately, they are almost always an afterthought, said Darshan Joshi, chief technology officer at Boston-based CYTRIO.
Data mesh holds onto data ownership a little longer than data fabric and theoretically solves the data governance problem to a certain extent, since data does not move to a data warehouse or data lake. But neither approach addresses someone who, for instance, creates a PDF report using a physical or virtual data warehouse. That report may contain sensitive data, for which role-based access control (RBAC) mechanisms would prevent unauthorized access. But once that data is in the PDF report, it can be shared freely with anyone.
Neither approach addresses someone exporting data from a source of record system using valid APIs, either. For example, a salesperson may export customer lists that include confidential information into an Excel spreadsheet. That spreadsheet can be shared with anyone without any regard for RBAC or governance.
“In summary, both data fabric and data mesh architectures try to simplify data architecture, but they do not address data privacy or data security concerns — especially when data is out of source of record systems,” Joshi said.
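The gap Joshi describes can be shown with a small Python sketch. It assumes a hypothetical `SourceSystem` class that enforces role-based access on reads, plus an `export_csv` helper standing in for a report or spreadsheet export; neither represents a real product's API, and the customer record is made up.

```python
import csv
import io


class AccessDenied(Exception):
    pass


class SourceSystem:
    """Toy source of record that enforces RBAC on every read."""

    def __init__(self, rows, allowed_roles):
        self._rows = rows
        self._allowed = set(allowed_roles)

    def read(self, role):
        if role not in self._allowed:
            raise AccessDenied(role)
        return [dict(r) for r in self._rows]


def export_csv(rows):
    """Write rows to CSV text, as a report or spreadsheet export would."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


crm = SourceSystem(
    [{"name": "Acme", "contact": "ceo@acme.example"}],
    allowed_roles={"sales"},
)

# An authorized salesperson exports the customer list through a valid read.
exported = export_csv(crm.read("sales"))

# Inside the system, RBAC still blocks an unauthorized role.
try:
    crm.read("contractor")
    contractor_blocked = False
except AccessDenied:
    contractor_blocked = True

# The exported text is a plain string with no access check attached to it.
leaked = "ceo@acme.example" in exported
print(contractor_blocked, leaked)
```

Once the rows leave `SourceSystem`, nothing in either architecture travels with them to enforce the original policy, which is the concern raised above.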
Related Article: Enterprise Data Security Still Has a Long Way to Go
Making the Right Choice
Hudson doesn't see the choice between data mesh and data fabric as a simple either-or decision. In the end, he said it "depends on what the organization is trying to achieve. If an organization has many business units that benefit from sharing or combining data, or if it is often bottlenecked by data engineering when trying to do so, then it might make sense to investigate a data mesh approach to better structure the organization.
"If an organization wants to understand what data exists across the company, how datasets across business units are related and how to best optimize the architecture to support data access patterns, then it might make sense to investigate a data fabric solution."
Data fabric and data mesh are both high-level concepts, and they are not mutually exclusive. The two can be used within the same organization to support the data needs of diverse business units.
"It is often not about whether you need a data fabric or a data mesh, instead you can create a data pipeline to, in near real time, move data from existing systems to a cloud-based database that is designed for analytics," Kirsch said.
About the Author
Mike Prokopeak is editor in chief at Reworked, the premier publication covering the r/evolution of work, where he leads content development focused on the transformation of the workplace.