A data fabric provides a virtual layer that connects the entirety of an organization’s data, along with all of the processes and platforms connected to it. Just as the ‘Web’ refers not to a single software platform or piece of hardware but rather to a layer of connectivity, so too the data ‘fabric’ refers to the weaving of many pieces of data-related software and hardware into a unified system.
Though the data fabric is a relatively new concept compared to a data lake or warehouse, it too is evolving quickly. So, what may have been considered a ‘data fabric’ a few years ago may actually be behind the curve and insufficient for your company’s needs.
The traditional approach to a data fabric involves stitching an assortment of data tools and sources together -- an expensive system integration project. But technology and understanding have progressed tremendously, and the modern data fabric is far more streamlined, smart, and elegant. Here are a few things to think about before you implement your data fabric solution.
Predictive capabilities and advanced learning
It’s a given that a data fabric leverages AI and machine learning. However, a next-gen data fabric continually learns both from your existing data and from new data as it is added to your system. Over time it makes better predictions about potential relationships and points of integration between data owned by different departments, such as data held in a CRM system and data held in a supply chain system.
Data is defined over time not just by the metadata its creator assigned to it, but by what people are actually doing with it, and this informs how the system makes its recommendations over the long term. The fabric continually tracks data use cases, treating each one as a contribution to its refined understanding of the data.
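The learning loop described above can be sketched in miniature: track which columns users actually join, then rank candidate relationships by observed usage. This is a minimal illustration, not a real fabric component; the column names and the usage log are entirely hypothetical.

```python
from collections import Counter

# Hypothetical usage log: each entry records a pair of columns that a user
# joined in a past query. All names here are illustrative.
usage_log = [
    ("crm.customer_id", "supply.customer_id"),
    ("crm.customer_id", "supply.customer_id"),
    ("crm.region", "supply.warehouse_region"),
]

def rank_candidate_joins(log):
    """Rank candidate join relationships by how often users actually joined them."""
    counts = Counter(log)
    total = sum(counts.values())
    # Score each candidate pair by its share of all observed joins,
    # highest-scoring (most-used) pair first.
    return sorted(
        ((pair, count / total) for pair, count in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )

for pair, score in rank_candidate_joins(usage_log):
    print(pair, round(score, 2))
```

A production system would of course weight many more signals (recency, query success, who ran the query), but the principle is the same: recommendations improve as real usage accumulates.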
Ability to visualize data relationships
A data fabric might provide access to the data, but if you can’t use it to identify relationships, it’s of limited use. For instance, the advertising department might want to understand the relationship between a product line’s sales and demand. They’ll have ready access to the data on product purchases, but be unaware that the supply chain division has data on demand for that line which could be joined to their data to answer the question. Furthermore, even if they heard that such data existed, conceptualizing the relationship on their own would take extraordinary visuospatial skills.
The next-gen data fabric makes these relationships accessible to both technical and non-technical users through ‘knowledge graphs’, which visually map out the data sources and their relationships. These user-friendly charts -- rather like flowcharts -- are overlaid with helpful descriptive information that lets users easily identify the data relationships that will answer their analytics questions. A user can quickly see not only what information is in their own department, but what is in other departments too, and how it all connects together.
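Under the hood, a knowledge graph of this kind is just data sources as nodes and documented relationships as labeled edges, which is what makes "how does my data connect to theirs?" answerable by a simple traversal. The sketch below assumes a toy catalog with invented source names; a breadth-first search surfaces the chain of sources linking two datasets.

```python
from collections import deque

# Toy knowledge graph: keys are data sources, values map each connected
# source to a relationship label. All names are illustrative.
graph = {
    "crm.sales": {"crm.customers": "sold_to", "supply.demand": "same_product_line"},
    "crm.customers": {"crm.sales": "sold_to"},
    "supply.demand": {"crm.sales": "same_product_line", "supply.inventory": "stocks"},
    "supply.inventory": {"supply.demand": "stocks"},
}

def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of sources linking two datasets."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no documented relationship chain exists

print(find_path(graph, "crm.sales", "supply.inventory"))
```

The advertising scenario above maps directly onto this: the path from a sales table to a supply-chain table is exactly what the visual graph renders for the user, so no one needs to hold the whole topology in their head.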
Semantic search and assembly
Without effective discovery, the data needed for a particular purpose is lost among petabytes of other data. A next-gen data fabric allows users to search for or browse through all sources to better understand whether a given source will help in their analysis.
SQL--while vital--is a major obstacle not just to accessing the data, but to assembling it. The problem isn’t limited to people who can’t code: even those with serious SQL skills may have difficulty spinning up a federated query that joins data from multiple repositories. A next-gen data fabric instead leverages natural language processing (NLP) for searching and querying data, making the process practical for everyone. It takes regular sentences or phrases and intelligently parses the words to derive search terms. For example, a sales manager can enter “What are product sales by month?”. The data fabric then scans its metadata to find relevant data elements, and allows the user to assemble the data without coding.
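The parsing step can be illustrated with a deliberately simple sketch: lower-case the question, drop stop words, and match the remaining terms against a metadata catalog. Real fabrics use far richer NLP (synonyms, entity recognition, learned rankings); the catalog entries and tags below are invented for the example.

```python
import re

# Common words that carry no search signal in a question.
STOP_WORDS = {"what", "are", "is", "the", "by", "of", "for", "a", "an", "in"}

# Toy metadata catalog: data element -> descriptive tags. All names invented.
metadata = {
    "sales.orders.amount": {"product", "sales", "revenue"},
    "sales.orders.order_date": {"sales", "month", "date"},
    "hr.staff.salary": {"payroll", "salary"},
}

def parse_terms(question):
    """Lower-case the question, strip punctuation, and drop stop words."""
    words = re.findall(r"[a-z]+", question.lower())
    return {w for w in words if w not in STOP_WORDS}

def search(question, metadata):
    """Return data elements whose tags overlap the derived terms, best match first."""
    terms = parse_terms(question)
    hits = [(name, len(terms & tags)) for name, tags in metadata.items()]
    return [name for name, score in sorted(hits, key=lambda h: -h[1]) if score > 0]

print(search("What are product sales by month?", metadata))
```

For the sales manager’s question, the derived terms are `product`, `sales`, and `month`, which match the two order-related elements and skip the payroll table entirely -- the user never writes a line of SQL.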
Automating time-consuming processes to speed analytics
A next-gen data fabric uses augmented data management and automated orchestration to minimize the need for human intervention. As AI and ML algorithms continue to learn from your data sources, they progress from merely identifying relationships to automating time-consuming processes that would otherwise be performed manually. For example, the fabric may catalog previous queries, or data analytics questions asked by users, so that those processes don’t have to be re-engineered each time the same or even a similar question is asked. This not only makes data-derived insight available to non-analysts, but also frees up analysts and data scientists to deal with more complex problems.
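The query-cataloging idea can be sketched as a simple lookup keyed on a normalized form of the question, so a repeat question (regardless of capitalization or trailing punctuation) reuses the stored query instead of being re-engineered. The class, the question, and the SQL string are all hypothetical.

```python
class QueryCatalog:
    """Toy catalog mapping normalized questions to previously assembled queries."""

    def __init__(self):
        self._catalog = {}

    @staticmethod
    def _normalize(question):
        # Collapse whitespace, lower-case, and drop a trailing question mark
        # so trivially re-phrased questions hit the same catalog entry.
        return " ".join(question.lower().split()).rstrip("?")

    def remember(self, question, query):
        self._catalog[self._normalize(question)] = query

    def lookup(self, question):
        return self._catalog.get(self._normalize(question))

query_catalog = QueryCatalog()
query_catalog.remember(
    "What are product sales by month?",
    "SELECT month, SUM(amount) FROM orders GROUP BY month",
)
# The same question, differently capitalized, reuses the stored query.
print(query_catalog.lookup("what are product sales by month"))
```

A real fabric would match *similar* questions too (semantic similarity rather than exact normalized strings), but even this crude cache shows why repeat questions stop costing analyst time.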