What is a data fabric?
A data fabric is a unified data integration and management layer that serves as the connective tissue between cloud and on-premises data endpoints. Its purpose is to eliminate standalone data silos by bringing all data together and giving its users consistent distributed access, plus a full range of discovery, integration, orchestration, and governance capabilities.
Data fabrics are especially important now that analytics ecosystems are distributed almost by definition, in large part because of cloud trends. Organizations are spreading their data across locations as diverse as data warehouse-powered business intelligence platforms and Hadoop-based data lakes. A data fabric serves as the thread running through them all, connecting data sources to data consumers with consistent, reliable, and flexible querying.
A data fabric is also highly automated, and it can stitch together existing data integration and delivery styles, such as bulk/batch integration and data virtualization. These can be orchestrated, i.e., automatically configured, managed, and coordinated. Gartner, which pioneered the data fabric concept, holds that modern data fabrics must also incorporate recent technologies and practices such as:
- Embedded artificial intelligence (AI) and machine learning (ML), for purposes including the activation and active management of metadata.
- Semantic knowledge graphs that make it easier to materialize new nodes and support use cases like natural language processing.
- DataOps, the agile-inspired methodology for shortening analytics cycles through practices like automated testing, monitoring, and statistical process control.
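The following sketch makes the DataOps practice of statistical process control more concrete. It flags a pipeline run whose row count falls outside control limits computed from earlier runs. The three-sigma rule, the function names `control_limits` and `out_of_control`, and the sample row counts are illustrative assumptions. They are not part of Gartner's definition or any specific product.

```python
from statistics import mean, stdev


def control_limits(history: list[float], sigmas: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) control limits from past observations."""
    centre = mean(history)
    spread = stdev(history)  # sample standard deviation; needs at least 2 points
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(history: list[float], latest: float, sigmas: float = 3.0) -> bool:
    """True if the latest observation falls outside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return not (lower <= latest <= upper)


# Hypothetical daily row counts loaded by one pipeline.
row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_090, 10_000]
print(out_of_control(row_counts, 10_070))  # False: within the limits
print(out_of_control(row_counts, 4_500))   # True: a sudden drop is flagged
```

In a real DataOps setup, a check like this would typically run automatically after each load. An alert would replace the `print` call.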
There is no all-in-one data fabric software capable of weaving a complete data fabric. Instead, each enterprise will need to combine built and bought infrastructure to create a data fabric that meets its specific needs.
The 5 essential capabilities of a data fabric
When sewing together a data fabric, enterprises must ensure it has the following capabilities:
1. Consistent querying from anywhere
A data fabric should ultimately abstract away the underlying complexity of the heterogeneous systems that it interconnects, so that end users like data scientists can initiate their queries from anywhere. Such anytime/anywhere convenience is the fundamental benefit of a data fabric. Whether users start from a Hadoop data lake or a data warehouse, they should be able to complete the queries they need through bidirectional data access and high-quality connectors that operate in parallel.
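The following sketch shows the idea of one query interface over several systems, with connectors running in parallel. Two in-memory SQLite databases stand in for a data lake and a warehouse. It assumes both expose the same `orders` table. The names `SOURCES`, `make_source`, and `federated_query` are hypothetical. A production fabric would also translate SQL dialects, map schemas, and push work down to each engine. This is not how QueryGrid or any particular product is implemented.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_source(rows):
    """Create an in-memory SQLite database standing in for one remote system."""
    # check_same_thread=False lets a worker thread use a connection created here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Hypothetical stand-ins for two heterogeneous platforms.
SOURCES = {
    "data_lake": make_source([("EMEA", 120.0), ("APAC", 80.0)]),
    "warehouse": make_source([("EMEA", 200.0), ("AMER", 50.0)]),
}


def federated_query(sql, params=()):
    """Run the same query on every source in parallel; tag each row with its origin."""
    def run(item):
        name, conn = item
        return [(name, *row) for row in conn.execute(sql, params).fetchall()]

    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        batches = pool.map(run, SOURCES.items())  # map preserves source order
    return [row for batch in batches for row in batch]


rows = federated_query("SELECT region, amount FROM orders WHERE region = ?", ("EMEA",))
print(rows)  # [('data_lake', 'EMEA', 120.0), ('warehouse', 'EMEA', 200.0)]
```

The caller writes one query and never needs to know which system holds which rows. That is the abstraction the paragraph above describes.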
2. Continuous data discovery, integration, and cataloging
The inherent automation of a data fabric architecture enables it to actively find data from any source, and then integrate those sources into a knowledge graph that exposes key relationships. The data catalog is also an important component of a data fabric, as it combines the metadata and search tools that help users retrieve what they’re looking for, whether it resides in a data lake, data warehouse, or other design pattern.
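The following sketch shows automated discovery feeding a catalog and a simple relationship graph. It assumes SQLite sources, which can be introspected through `sqlite_master` and `PRAGMA table_info`. It uses a deliberately naive heuristic: two tables that share a column name are linked, as a guess at a join key. Real fabrics use much richer signals, often ML-based. The names `discover`, `build_catalog`, and `search` are hypothetical.

```python
import sqlite3
from collections import defaultdict


def discover(name, conn):
    """Yield (source, table, column) triples by introspecting a SQLite database."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            yield name, table, column


def build_catalog(sources):
    """Return a column catalog per table plus inferred (table, table, column) edges."""
    columns_by_table = defaultdict(set)
    for name, conn in sources.items():
        for source, table, column in discover(name, conn):
            columns_by_table[f"{source}.{table}"].add(column)
    # Naive relationship inference: tables sharing a column name are linked.
    edges = set()
    nodes = sorted(columns_by_table)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            for column in sorted(columns_by_table[a] & columns_by_table[b]):
                edges.add((a, b, column))
    return columns_by_table, edges


def search(columns_by_table, keyword):
    """Find tables whose name or column names contain the keyword."""
    keyword = keyword.lower()
    return sorted(t for t, cols in columns_by_table.items()
                  if keyword in t.lower() or any(keyword in c.lower() for c in cols))


lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE clickstream (customer_id INTEGER, url TEXT)")
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT);"
    "CREATE TABLE invoices (invoice_id INTEGER, customer_id INTEGER, total REAL);")

catalog, graph = build_catalog({"lake": lake, "warehouse": warehouse})
print(sorted(graph))             # three edges, all via customer_id
print(search(catalog, "total"))  # ['warehouse.invoices']
```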
3. Democratized self-service
Like the cloud infrastructure itself, a data fabric architecture is meant to streamline access for its users, via secure self-service interfaces. In a recent report on data fabric use among enterprises, Forrester has highlighted two important enablers of this self-service:
- AI and ML that automate the functions mentioned above, from discovery and classification all the way through ingestion and transformation.
- Zero-code and low-code deployment options that make even a highly complex data fabric architecture easy to deploy.
Overall, self-service lets business users gain control over their data preparation workflows. They can operate within sandbox environments, accessing data from any source and using their preferred tools to manipulate it and possibly even send it to production.
4. Passive-to-active metadata conversion
Gartner has highlighted this capability as foundational to the data fabric concept.
Passive metadata is static. It is usually created at design time and maintained as documentation of items such as data schemas and business definitions. Active metadata is dynamic and provides continuously updated insights into parameters such as access frequency and data quality.
Within a data fabric, AI and ML convert passive metadata to active metadata by continuously analyzing metadata and then building a graph model that's easy for users to understand. These AI and ML algorithms then use the results of this analysis to optimize how they automatically manage data across the enterprise ecosystem. In this way, active metadata helps to reduce the need for manual activities when preparing and exploring data.
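The following sketch shows the passive/active distinction. Design-time fields such as the schema and business definition sit alongside fields recomputed from observed behavior. Simple counting stands in here for the AI and ML analysis the text describes. The access log, the sample rows, the `DatasetMetadata` and `activate` names, and the 0.25 review threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    # Passive metadata: written once at design time.
    schema: dict[str, str]
    definition: str
    # Active metadata: recomputed from observed usage and content.
    access_count: int = 0
    null_rate: dict[str, float] = field(default_factory=dict)


def activate(meta, access_log, sample_rows):
    """Refresh the active metadata fields from an access log and a data sample."""
    meta.access_count = Counter(access_log)[meta.name]
    for column in meta.schema:
        values = [row.get(column) for row in sample_rows]
        nulls = sum(v is None for v in values)
        meta.null_rate[column] = nulls / len(values) if values else 0.0
    return meta


orders = DatasetMetadata("orders", {"order_id": "INTEGER", "region": "TEXT"},
                         "One row per customer order.")
log = ["orders", "customers", "orders", "orders"]
sample = [{"order_id": 1, "region": "EMEA"}, {"order_id": 2, "region": None},
          {"order_id": 3, "region": "APAC"}, {"order_id": 4, "region": None}]

activate(orders, log, sample)
print(orders.access_count)  # 3
print(orders.null_rate)     # {'order_id': 0.0, 'region': 0.5}

# A simple automated action driven by active metadata: flag columns for review.
needs_review = [c for c, r in orders.null_rate.items() if r > 0.25]
print(needs_review)         # ['region']
```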
5. Scalability and flexibility
Like literal fabric, a data fabric should be flexible enough to accommodate change; it should never be a barrier to data access. Important functionalities for accomplishing this goal include:
- Parallel and cluster-aware data transfer
- Automatic data format conversion and type management
- Ability to use platform-specific/platform-native features
- Policy-based security that stretches across platforms
- Logging and monitoring of local and remote systems
- Pushdown processing for optimal workload performance
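The following sketch shows why pushdown processing matters. It computes the same answer two ways and counts the rows that cross the boundary between the source and the fabric. An in-memory SQLite database stands in for a remote platform. Rows transferred are used as a rough proxy for network and processing cost. The table, the data, and the function names are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (region TEXT, bytes INTEGER)")
source.executemany("INSERT INTO events VALUES (?, ?)",
                   [("EMEA" if i % 4 == 0 else "AMER", i) for i in range(1000)])


def without_pushdown(conn):
    """Pull every row out of the source, then filter and aggregate locally."""
    rows = conn.execute("SELECT region, bytes FROM events").fetchall()
    total = sum(b for region, b in rows if region == "EMEA")
    return total, len(rows)  # (answer, rows transferred)


def with_pushdown(conn):
    """Send the filter and aggregate to the source; only the result comes back."""
    rows = conn.execute(
        "SELECT SUM(bytes) FROM events WHERE region = ?", ("EMEA",)).fetchall()
    return rows[0][0], len(rows)  # (answer, rows transferred)


print(without_pushdown(source))  # (124500, 1000)
print(with_pushdown(source))     # (124500, 1)
```

Both approaches return the same answer. The pushed-down version moves one row instead of a thousand, and that gap widens as the data grows.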
What are the biggest benefits of a data fabric?
A main benefit of a data fabric is greater ease of use through consistent, distributed access to data. This breaks down into three sub-benefits:
1. Accelerated data delivery, without compromising quality
Data fabric technology shortens the time from data discovery and data ingestion to delivery and consumption. Moreover, data quality is continuously refined through AI and ML algorithms that use active metadata to integrate and manage enterprise data.
2. Self-service consumption and collaboration
Using a data fabric solution, both business and technical users can quickly and consistently find what they’re seeking. This is vital at a time when organizations are dealing with a proliferating number of data sources and silos, plus big data use cases. A data fabric weaves them all together with an easy-to-follow thread.
3. Automated integration, management, and data governance
Because data fabric architectures are highly automated, they can perform tasks that once required significant manual effort, such as integrating data sources and analyzing the quality of their data. Automation saves time and also reduces the risks of errors and compliance issues.
Embrace the possibilities of the data fabric
Teradata QueryGrid, our high-speed, parallel data fabric system, delivers the sort of scalability, flexibility, integration, comprehensive management, and thorough governance enterprises need for their data. Users can access and work with data using their chosen tools, across a multi-cloud, hybrid cloud, or on-premises environment.
To learn more about how you can start weaving a data fabric, check out our webinar on how QueryGrid works in tandem with Teradata Vantage and Starburst Enterprise Presto to modernize analytics environments and accelerate insights.