Data analytics has the potential to richly reward your company with better decision making. If you can foresee market trends, discover hidden patterns, uncover untapped efficiencies, or make your products more appealing, you can surpass the competition.
This is the promise of data, but the current reality is that for many organizations it’s a major money pit. IBM estimates that poor data quality may cost $3.1 trillion annually in the U.S. alone, and Gartner estimates that the average company loses $15 million per year due to these issues.
There are a number of reasons for this, but they tend to come back to one thing: the data analytics lifecycle (the time it takes to get from question to answer) is too long.
Why the data analytics lifecycle is too long
Here’s a typical scenario. Someone in the sales department of a consumer goods company poses a question to the central data team. Though it’s mission-critical, it’s not a particularly difficult question, and it doesn’t require sophisticated statistical or machine learning techniques. But it does require:
Hunting down the right data, and determining which datasets, out of several that appear to offer similar information, offer the most complete and accurate data
Writing a complicated SQL query that joins columns from datasets that exist in different departments to produce a table with all the variables needed to answer the question
Cleaning and prepping the data: addressing inconsistencies and missing values
Extracting, Transforming, and Loading (ETL) the data so that it can be opened in an analytics platform
Creating a report or a dashboard that explains the output of the analytics, that is, the answer to the original question
By the time it’s done, weeks, perhaps even months, may have passed. The needs of the business department that asked the question have likely changed, so what you’ve given them isn’t really that helpful.
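To make the join and cleaning steps above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are hypothetical stand-ins for datasets owned by two departments, and treating missing unit counts as zero is just one possible cleaning rule chosen for illustration.

```python
import sqlite3

# Hypothetical tables standing in for datasets owned by two departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_orders (order_id INTEGER, region TEXT, units INTEGER);
    CREATE TABLE marketing_regions (region TEXT, campaign TEXT);

    INSERT INTO sales_orders VALUES
        (1, 'north', 120),
        (2, 'North ', 80),   -- inconsistent casing and stray whitespace
        (3, 'south', NULL),  -- missing value
        (4, 'south', 50);
    INSERT INTO marketing_regions VALUES
        ('north', 'spring-promo'),
        ('south', 'none');
""")

# Join the two datasets, normalizing region names and counting missing
# unit values as zero (an illustrative cleaning rule, not a recommendation).
query = """
    SELECT m.region, m.campaign, SUM(COALESCE(s.units, 0)) AS total_units
    FROM sales_orders AS s
    JOIN marketing_regions AS m
      ON LOWER(TRIM(s.region)) = m.region
    GROUP BY m.region, m.campaign
    ORDER BY m.region
"""
rows = conn.execute(query).fetchall()
conn.close()

for region, campaign, total_units in rows:
    print(region, campaign, total_units)
```

Even in this toy case, the query has to encode decisions about casing, whitespace, and missing values. In a real organization, each of those decisions typically requires a conversation with the data owner, which is where much of the elapsed time goes.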
A long data analytics lifecycle costs money
The end result: you’ve spent a lot of time and resources with no ROI. Consider the quantifiable costs associated with this:
10 hours for a typical data engineer to identify the right data, obtain the permissions to share the information between departments, and write the SQL query to produce the table. At approximately $80/hr (based on Builtin.com’s salary estimate), you’ve spent about $800.
10 hours for the data scientist to prep/clean the data, port it into a new platform, and create a report. At approximately $70/hour (based on Builtin.com’s salary estimate), you’ve spent at least $700.
$800-8000 per month for the infrastructure required to ETL (Extract, Transform, Load) the data so that it can be analyzed. On a related note, if this includes the monthly cost of maintaining a data warehouse where data can be analyzed, expect this figure to be much higher.
Enterprise-grade prices for analytics and data visualization software, which typically start somewhere around $100 per user per month.
At this point you may have already spent well in excess of $1,000 to answer a simple question. Assuming that your line-of-business teams ask dozens of questions a month, this cost could climb to $100K or more per month.
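The arithmetic behind these estimates can be sketched in a few lines of Python. The hours, hourly rates, and per-seat price come from the figures above. The question counts, infrastructure spend, and seat counts passed to `monthly_cost` are hypothetical values picked only to show how the total scales.

```python
# Hourly figures come from the estimates above. The question counts,
# infrastructure spend, and seat counts used below are hypothetical.
ENGINEER_HOURS = 10
ENGINEER_RATE = 80    # USD per hour
SCIENTIST_HOURS = 10
SCIENTIST_RATE = 70   # USD per hour
SEAT_PRICE = 100      # USD per user per month for analytics software


def labor_cost_per_question() -> int:
    """Labor cost in USD to answer a single question."""
    return ENGINEER_HOURS * ENGINEER_RATE + SCIENTIST_HOURS * SCIENTIST_RATE


def monthly_cost(questions: int, infrastructure: int, seats: int) -> int:
    """Monthly total in USD: labor plus ETL infrastructure plus software seats."""
    return questions * labor_cost_per_question() + infrastructure + seats * SEAT_PRICE


per_question = labor_cost_per_question()
light_month = monthly_cost(questions=36, infrastructure=800, seats=5)
heavy_month = monthly_cost(questions=72, infrastructure=8000, seats=20)

print(per_question)   # 1500
print(light_month)    # 55300
print(heavy_month)    # 118000
```

Under these assumptions, labor dominates the bill: the number of questions asked each month moves the total far more than infrastructure or licensing does.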
And of course that’s assuming that the data is analyzed fast enough for the answers to be actionable. If it’s not, you should also assume that you are losing money in less quantifiable ways, such as:
The opportunity cost associated with a late answer to a question. For example, you may have missed an opportunity to hold a sale in a certain region that would have capitalized on heightened demand, potentially costing millions of dollars in missed revenue
The opportunity cost associated with the fact that your data scientist or engineer could have done something more productive with their time than trying to answer this very simple question. They might have been focused on tougher questions, or pursuing innovation for the company
And of course, there are the salaries of the many other people who have been inconvenienced by the process of chasing down and pulling a dataset from multiple departments, including the department leaders and data owners, as well as your compliance officer
At this typical organization, the total cost of ownership (TCO) for data analytics is clearly too high for it to be a profitable endeavor. But if it could be reduced to a fraction of the cost, the tables would be turned. Here are some things to keep in mind as you look to reduce TCO for your analytics capabilities.
Reduce the number of tools and infrastructure
Whether we’re talking about cloud based apps or legacy infrastructure, a complicated data architecture can cost your company big time. A data fabric allows you to streamline the number of tools and amount of infrastructure required to manage and analyze data. It does so by creating an abstraction layer that eliminates the complexity of working with distributed data and disparate formats. It provides seamless access to enterprise data, which helps data teams keep up with the exploding volume of user requests, while making it easier to enforce compliance without blocking appropriate users from data access.
Reduce the cost associated with moving data
ETL (Extract, Transform, Load), the process of moving data so that you can analyze it, is expensive in more than one way. First, it’s a painstaking process that takes a lot of time and expertise. A survey of 502 IT professionals by IDC, for instance, found that ETL is significantly slowing down companies’ ability to get answers from data, with most data being at least five days old by the time the process is complete.
Second, it’s risky. You can lose data in transit, and if you’re making copies, you risk ending up with multiple versions of the truth. Furthermore, Change Data Capture (CDC) technology, which is designed to address this problem, was also found by IDC to have a slowing effect on data analytics.
A data fabric essentially removes the need for ETL. The abstracted ‘fabric’ means that you don’t have to move data in order to access it or integrate it with the system. This means significant reductions in the cost and risk associated with moving or copying data into warehouses and other repositories.
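As a toy illustration of querying data where it lives rather than copying it first, the sketch below uses SQLite's `ATTACH DATABASE` to join tables from two separate database files in a single query. This is only an analogy for the idea, not a data fabric implementation: a real fabric spans heterogeneous systems and formats. The file names, tables, and rows are all hypothetical.

```python
import os
import sqlite3
import tempfile


def create_db(path: str, script: str) -> None:
    """Create a standalone SQLite file, as if owned by one department."""
    conn = sqlite3.connect(path)
    conn.executescript(script)
    conn.commit()
    conn.close()


def query_in_place(sales_path: str, inventory_path: str) -> list:
    """Join across both files without copying either dataset."""
    conn = sqlite3.connect(sales_path)
    try:
        # Attach the second file under the schema name "inventory".
        conn.execute("ATTACH DATABASE ? AS inventory", (inventory_path,))
        return conn.execute("""
            SELECT o.sku, o.units_sold, i.units_in_stock
            FROM main.orders AS o
            JOIN inventory.stock AS i ON i.sku = o.sku
            ORDER BY o.sku
        """).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    sales_path = os.path.join(tmp, "sales.db")
    inventory_path = os.path.join(tmp, "inventory.db")

    create_db(sales_path, """
        CREATE TABLE orders (sku TEXT, units_sold INTEGER);
        INSERT INTO orders VALUES ('A1', 30), ('B2', 12);
    """)
    create_db(inventory_path, """
        CREATE TABLE stock (sku TEXT, units_in_stock INTEGER);
        INSERT INTO stock VALUES ('A1', 5), ('B2', 40);
    """)

    rows = query_in_place(sales_path, inventory_path)

for sku, units_sold, units_in_stock in rows:
    print(sku, units_sold, units_in_stock)
```

Neither file is exported, transformed, or loaded into a third store. The join runs against both sources as they are, which is the property the fabric approach aims to provide at enterprise scale.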
In general, the more you can shorten the analytics lifecycle, the lower your TCO. In fact, 76 percent of respondents in an IDC survey said that not being able to analyze current data makes it difficult to take advantage of business opportunities. Time to value can be shortened by:
Empowering non-technical users to locate the right dataset with search
Empowering non-technical users to assemble datasets without SQL
Removing the need for ETL by analyzing the data where it resides
Analyzing the data without having to open up a new software platform