September 23, 1999 is a day that will live in data accuracy infamy. It was on that date that the $125M NASA Mars Climate Orbiter lost communication with mission control as it approached its operating orbit around the red planet. Engineers quickly surmised that the spacecraft burned up in the Martian atmosphere because it descended too close to the planet’s surface.
An investigation revealed that two separate software applications controlling the spacecraft’s thrusters miscommunicated on the amount of force needed to reach the right altitude. One piece of software used the imperial system, calculating the force needed in pounds, while the other piece of software assumed the data it was taking in was in the metric unit, newtons.
Bad data doomed the mission.
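A minimal sketch of how this kind of mismatch arises, and of one way to guard against it, is to make every value carry its unit. The `Force` type and its method are hypothetical names chosen for illustration; they are not taken from the actual flight software.

```python
from dataclasses import dataclass

# 1 pound-force expressed in newtons (exact by definition).
LBF_TO_N = 4.4482216152605


@dataclass(frozen=True)
class Force:
    """A force value that always travels with its unit (hypothetical type)."""
    value: float
    unit: str  # "N" for newtons, "lbf" for pounds-force

    def to_newtons(self) -> float:
        """Convert to newtons, refusing to guess when the unit is unknown."""
        if self.unit == "N":
            return self.value
        if self.unit == "lbf":
            return self.value * LBF_TO_N
        raise ValueError(f"unknown unit: {self.unit!r}")


# One program reports a force in pounds-force.
reported = Force(10.0, "lbf")

# A consumer that reads only the bare number silently treats it as newtons.
naive_newtons = reported.value

# A consumer that checks the unit gets the value it actually needs.
correct_newtons = reported.to_newtons()

print(naive_newtons, correct_newtons)
```

The bare number understates the real force by a factor of roughly 4.45, and nothing in the number itself signals the problem.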
Thankfully, no one was harmed because of this mistake. However, it does highlight the impact that seemingly small data accuracy issues can have. IBM has estimated that data quality issues cost $3.1 trillion per year. Because knowledge workers must process so much data in the course of their work, they often spend a lot of time accommodating and fixing inaccurate data. And because this work is often done under deadline, these workers rarely follow up with data owners to make permanent fixes to the data, meaning the errors will usually crop up again and again.
To properly address the challenges posed by poor data accuracy, companies need a deeper understanding of what data accuracy is and the impact it can have on their business.
What is Data Accuracy?
Defining data accuracy starts with understanding what the terms “data” and “accuracy” mean. Data is “a collection of facts (numbers, words, measurements, observations, etc) that has been translated into a form that computers can process.” This definition is most relevant to records of historical events stored on digital media that computers can access and business users can harness for business advantage.
For data to be “accurate,” it must meet two criteria: form and content. In his book Data Quality: The Accuracy Dimension, author Jack Olson writes, “Form is important because it eliminates ambiguities about the content.” Content, in turn, must correctly capture the historical event it records.
For example, consider how dates stored in different formats can cause problems. “November 18, 2018” could be stored in the common U.S. format of “11/18/2018.” However, many other countries write the day first, as in “18/11/2018.” A U.S. user who receives the data in that format may misread it or be unable to process it at all. As Olson says, “A value is not accurate if the user cannot tell what it is.”
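The date problem can be sketched in a few lines of Python using the standard library's `datetime.strptime`. The helper `parse_date` and the two format constants are illustrative names, and the sample value “03/04/2018” is an assumed example added to show a case that parses under both conventions.

```python
from datetime import date, datetime

US_FORMAT = "%m/%d/%Y"         # month first, e.g. 11/18/2018
DAY_FIRST_FORMAT = "%d/%m/%Y"  # day first, e.g. 18/11/2018


def parse_date(raw: str, fmt: str) -> date:
    """Parse a date string using an explicitly stated format."""
    return datetime.strptime(raw, fmt).date()


# The same calendar day, written under two conventions.
us_value = parse_date("11/18/2018", US_FORMAT)
day_first_value = parse_date("18/11/2018", DAY_FIRST_FORMAT)

# A day-first value read with the U.S. format fails loudly,
# because there is no month 18.
try:
    parse_date("18/11/2018", US_FORMAT)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# The dangerous case: a value that parses under both conventions
# but means two different days.
ambiguous_us = parse_date("03/04/2018", US_FORMAT)
ambiguous_day_first = parse_date("03/04/2018", DAY_FIRST_FORMAT)

# Storing dates in ISO 8601 form removes the ambiguity.
iso_text = us_value.isoformat()
```

The failure in the first mismatch is the lucky outcome. The ambiguous value is worse, because it is accepted without complaint and silently records the wrong day.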
The second criterion is that the data content must be consistent. Consider the example of the medical specialty “Obstetrics-Gynecology,” which could be recorded as “OB-GYN,” “OBGYN,” “OB GYNE,” or a number of other ways. City names can also be represented with correct but inconsistent values. For instance, “New York City” could be captured as “NYC,” “New York,” or “NY NY.” When values for the same data point are inconsistent, analysts cannot reliably group and summarize the data. Because so much data analysis involves aggregation, consistent values are needed to enable accurate data use.
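To illustrate why inconsistent values break aggregation, here is a small sketch that counts records before and after mapping each variant to one canonical label. The `CANONICAL` lookup table, the `normalize` helper, and the sample records (including “Cardiology”) are assumptions made for this example; a real mapping would come from the data owners.

```python
from collections import Counter

# Hypothetical lookup from known variants (lowercased) to one canonical label.
CANONICAL = {
    "obstetrics-gynecology": "Obstetrics-Gynecology",
    "ob-gyn": "Obstetrics-Gynecology",
    "obgyn": "Obstetrics-Gynecology",
    "ob gyne": "Obstetrics-Gynecology",
}


def normalize(raw: str) -> str:
    """Return the canonical label for a known variant, else the trimmed input."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return CANONICAL.get(key, raw.strip())


records = ["OB-GYN", "OBGYN", "OB GYNE", "Obstetrics-Gynecology", "Cardiology"]

# Grouping the raw values treats every spelling as a separate specialty.
raw_counts = Counter(records)

# Grouping after normalization gives the summary an analyst actually wants.
clean_counts = Counter(normalize(r) for r in records)

print(len(raw_counts), dict(clean_counts))
```

A lookup table only covers the variants someone has already seen, so unknown values are passed through unchanged here rather than guessed at.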
In a nutshell, “data accuracy” could be summarized as correct and consistent information stored in digital assets meant for business use. With this understanding in mind, let’s dig deeper into the importance of data accuracy.
Data Accuracy Sets the Stage for Good Business Decisions
As its definition suggests, data is a representation of reality. For example, a football scoreboard shows the points for each team. These numbers are just a representation of the actual touchdowns, extra points, field goals and safeties scored by the players on the field.
Data also needs to be a correct representation of reality. The old axiom of “garbage in, garbage out” illustrates that reliable and accurate data is an absolute prerequisite for business analyses based on that data.
Accurate data brings many business benefits. Businesses can:
- Increase revenue. Reliable and cleansed data supports effective decisions that help drive sales.
- Save money. Up-to-date and accurate data can help prevent wasting money on ineffective tactics, such as sending mailers to non-existent addresses.
- Improve customer satisfaction. Accurate and current data about your customers will help your marketers deliver the right messages at the right time and in the right place to move potential buyers to the next step in their customer journey.
- Save time. Properly governed data should require less time and money to remediate.
- Improve ROI. The aforementioned reduction in data remediation costs will result in a greater return on investment on data assets.
As the list above shows, data accuracy is a foundational building block for many common business analytics. However, there is a cost for providing reliable and timely data for these analyses: a rigorous, persistent and executive-sponsored culture of data governance. This effort will require dedicated resources at all levels of the organization to help ensure data accuracy, among other dimensions of data quality. But it’s worth the investment to not only enable reliable business analyses, but also to increase stakeholder acceptance of the data.
Data Accuracy Builds Trust in the Data
Stakeholder acceptance and use of analytics depend on trust in the data. End users and executives don’t always think about the behind-the-scenes work and preparation that went into a dashboard. But they do have an intuitive grasp of and even an emotional connection to accurate data, making obviously inaccurate data a real challenge to overcome.
According to a recent report by KPMG, only 45% of 2,165 data and analytics decision-makers “consistently use rigorous quality checks to ensure the accuracy of data and analytics models and outputs.” The same study stated that:
- 60% of organizations are not confident in their data and analytics
- 16% believe they perform well in ensuring the accuracy of analytical models
- 10% believe they excel in managing the quality of their data and analytics
Clearly, there is an opportunity to bridge this trust gap, which will bring about greater acceptance and use of data as an asset that can generate business value.
As with ensuring good business decisions, building trust in the data requires a purposeful and persistent culture of data governance that actively cultivates data accuracy. Aspects of a trusted data governance approach include:
- Quality. Are the data building blocks reliable enough to build trusted analytics?
- Effectiveness. Do the analytics work as intended?
- Integrity. Are analytics being used in an acceptable way?
- Resilience. Are long-term operations optimized?
As Sanjay Krishnamurthi, Chief Architect at Microsoft, notes, “Unless you put the governance in the processes that brought the data into the data warehouse, which is downstream from an analytical tool, there was no guarantee what you were seeing was correct.”
Data Accuracy is a Prerequisite for Artificial Intelligence
Data accuracy takes on even more importance as businesses pursue strategies using Artificial Intelligence (AI). A recent MIT/Google joint study found dramatic adoption of Machine Learning (ML) in the marketplace. According to the study:
- 60% of respondent companies have implemented ML initiatives
- 50% are using ML to better understand customers
- 48% expect ML to help make them more competitive
- 22% say the C-Suite will have primary responsibility for their company’s ML initiatives
The benefits these organizations expect to gain from AI and ML are premised on high-quality data. At the heart of AI technologies are algorithms that use data to make predictions and iteratively refine these models as more data is collected. AI models need little human intervention after being deployed, which puts a premium on accurate data. While AI continues learning on its own, it cannot tell if it is using inaccurate data. This means that the predictions made by AI models could be flawed or incomplete, which could impact customer relationships, competitiveness, and revenue growth.
Data accuracy is the hidden pillar of the digital enterprise. The industry trade press is replete with dazzling success stories of up-and-coming technologies like AI, Customer Relationship Management, Supply Chain Management, Digital Marketing, and more. But what is often left unsaid – and is poorly understood – is the criticality of feeding accurate data into these technologies, and the hard work of data governance to ensure that accuracy.
With data-driven initiatives becoming ever more prevalent and critical to a company’s success, building trust in the data requires an enterprise-wide commitment to seeing that data as an asset – and effectively managing that asset daily.