Originally published on Logistics Viewpoints.
In a prior post, I wrote about the various ways data is transforming global supply chains. Data is the raw fuel of digital transformation and the linchpin to accelerating industry collaboration, automation, predictive insights and so many more cutting-edge capabilities (including those yet to be invented). Today, I’m going to add a qualifier — it has to be quality data.
It’s a critical point, because there are so many ways that data can, well, “go bad”. The analogies to food and cooking are unavoidable, but they’re apt. Just as any chef will tell you that anything less than the best quality ingredients can render a dish inedible, so too, bad data can turn any data-driven IT effort sour.
So, what is quality data? What makes good data go bad? And what can you do to stop it before it’s too late?
I like the simplicity and clarity of Informatica’s definition of data quality:
Data quality refers to the overall utility of a dataset(s) as a function of its ability to be easily processed and analyzed for other uses.
Those “other uses” are many, and growing every day. For my company FourKites, quality data is core to everything we offer customers, from providing real-time transportation visibility to a host of more advanced data science-based capabilities, such as our patented methods for predicting freight ETAs with incredible accuracy. FourKites’ CTO Vivek Vaid sums it up this way: “We care about data quality because it leads to better data science. And better data science leads to better business outcomes for customers.”
But data can and does go bad in many different ways. When you’re ingesting and synthesizing data from dozens or hundreds or millions of sources (as most sophisticated platforms do nowadays), you will see duplicated records, incomplete fields, formatting inconsistencies, different languages and different units of measurement, plus plenty of human error wherever humans are still doing the inputting. The list goes on. The point is, any one of these issues can gum up the works in the applications that depend on that data, just like the proverbial “wrench in the works” that brings mechanical machinery to a clanging halt.
So, ensuring data quality is critical. But it’s also complex and requires real organizational focus. Here’s how the best companies tackle the data quality issue.
Inspect before entry. The most effective organizations have multiple layers of checks on data as it’s coming into the organization, because the first order of business is stopping bad data from coming in. For example, my company relies heavily on “lat-long” data, or data that provides latitude and longitude coordinates for freight. We have rigorous checks in place to ensure that data arrives in the specific, numeric form our systems need. Data should be inspected at the unit level as well as at the aggregate level, as the latter can illuminate bigger-picture trends.
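To make that concrete, here’s a minimal sketch of what unit- and aggregate-level inspection of incoming lat-long records might look like. The field names, thresholds and helper functions are illustrative assumptions for this post, not FourKites’ actual schema or checks.

```python
# Unit-level check: is a single record usable?
def is_valid_ping(record: dict) -> bool:
    """Coordinates must be numeric and within valid ranges."""
    try:
        lat = float(record["lat"])
        lon = float(record["lon"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Aggregate-level check: does the batch as a whole look healthy?
def inspect_batch(records: list[dict], max_reject_rate: float = 0.05) -> list[dict]:
    accepted = [r for r in records if is_valid_ping(r)]
    reject_rate = 1 - len(accepted) / len(records) if records else 0.0
    if reject_rate > max_reject_rate:
        # A spike in rejects usually signals an upstream problem, not a few bad records.
        raise ValueError(f"Batch reject rate {reject_rate:.1%} exceeds threshold")
    return accepted


# Example: one good ping, one with an impossible latitude
batch = [{"lat": "41.88", "lon": "-87.63"}, {"lat": "987", "lon": "-87.63"}]
print(inspect_batch(batch, max_reject_rate=0.5))
```

The point of the aggregate check is that a single rejected record is routine, while a sudden jump in the reject rate is a signal worth stopping the line for.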
Alerts. The next step is to inform someone. Who needs to know about an issue (or a potential issue)? If the problem is at the load level, e.g., missing appointment times, the action required and the urgency will be very different from those when an ELD data provider becomes unavailable. In addition to looking at tactical problems, you also need to watch for data drift that happens over a longer horizon and isn’t immediately obvious. Use data science to help with pattern recognition, scale up inspection and empower your teams to work on the most critical issues.
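Here’s a hedged sketch of what tiered alerting plus a simple drift check could look like. The severity rules and the drift metric (say, the daily share of loads missing appointment times) are illustrative assumptions, not any particular product’s logic.

```python
from statistics import mean


def route_alert(issue: dict) -> str:
    """Decide who gets notified, based on the scope of the problem."""
    if issue["scope"] == "provider":      # e.g., an ELD feed goes silent
        return "page on-call data engineering"
    if issue["scope"] == "load":          # e.g., one load missing appointment times
        return "queue ticket for operations"
    return "log for weekly review"


def detect_drift(daily_missing_rates: list[float], window: int = 7,
                 tolerance: float = 0.02) -> bool:
    """Flag slow drift: the recent average creeping above the long-run baseline."""
    if len(daily_missing_rates) < 2 * window:
        return False
    baseline = mean(daily_missing_rates[:-window])
    recent = mean(daily_missing_rates[-window:])
    return recent - baseline > tolerance


print(route_alert({"scope": "provider"}))          # page on-call data engineering
print(detect_drift([0.01] * 14 + [0.06] * 7))      # True: rate has drifted upward
```

The key design choice is that not every issue deserves the same response: a provider-level outage pages an engineer immediately, while slow drift accumulates into a trend that pattern recognition surfaces before it becomes obvious to humans.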
Remediation. The next step is remediation, or, what do I do now that I know it’s bad? At the highest level, organizations first need to determine whether they can resolve a given data quality issue themselves, or whether they need to go back to the external providers who were the source of the data. You also need to automate any fixes you can so they don’t require human intervention. The most effective organizations conduct data maturity modeling and implement best practices around master data management and data federation. Master data management helps ensure uniformity, accuracy and consistency of data, while data federation optimizes it for analysis.
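As a small illustration of that split, here’s a sketch that fixes what it can in place (unit normalization, deduplication) and flags what has to go back to the provider. The record fields and rules are hypothetical, chosen only to show the shape of automated remediation.

```python
KG_PER_LB = 0.453592


def remediate(records: list[dict]) -> tuple[list[dict], list[dict]]:
    cleaned, back_to_source = [], []
    seen_ids = set()
    for r in records:
        rid = r.get("shipment_id")
        if rid is None:
            back_to_source.append(r)      # can't invent a missing identifier locally
            continue
        if rid in seen_ids:
            continue                      # drop exact duplicates automatically
        seen_ids.add(rid)
        if r.get("weight_unit") == "lb":  # normalize to a single unit of measure
            r["weight"] = round(r["weight"] * KG_PER_LB, 2)
            r["weight_unit"] = "kg"
        cleaned.append(r)
    return cleaned, back_to_source


records = [
    {"shipment_id": "S1", "weight": 2200, "weight_unit": "lb"},
    {"shipment_id": "S1", "weight": 2200, "weight_unit": "lb"},  # duplicate
    {"weight": 500, "weight_unit": "kg"},                        # missing ID
]
print(remediate(records))
```

Unit conversions and duplicates are the easy, automatable cases; the record with the missing identifier is the kind of issue that has to be routed back to the original data provider.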
Of course, there is much more that goes into effectively implementing each of these safeguards. But given the indispensability of quality data to digital transformation efforts, every organization would benefit from a thorough assessment of its data quality initiatives, protocols and investments.