Predictive intelligence is a big deal these days. FourKites recently released our Dynamic ETA for LTL solution, which will continue to improve in accuracy and provide valuable operational insights to our customers as our network continues to expand. In talking to one of our customers, it occurred to me that despite the advances FourKites has demonstrated in machine learning and data science, the concept is still a black box to many of our stakeholders. We thought we would take this opportunity to start unpacking some of that and communicating how we are able to do the things we do.
Let’s start with the basic question: What exactly is predictive intelligence? And how does that differ from traditional business intelligence?
We’ll start by providing a quick summary of the data science landscape and how predictive analytics works. We will then go into the metacognitive aspect of predicting predictability and explain the value of being able to do so.
AI is intelligence demonstrated by machines, in contrast to the natural intelligence of humans and animals, who improve through experience. In the world of FourKites’ real-time supply chain visibility, this means a machine learning, for example, that mechanical breakdowns or busy rest areas will result in longer transit times. This may seem obvious, but when you are looking across 100 countries, the number of these learnings quickly outgrows what a human can keep track of: idiosyncrasies of hours of service (HOS) regulations, commercial traffic hours, warehouse congestion, learnings about individual drivers, routes or carriers, and the list goes on.
In the landscape of data science applications, predictive analytics is a special class of algorithms that tell you what will happen, whereas other techniques may focus on explaining why something happened in the past.
Predictive analytics at FourKites is primarily about solving customer issues in anticipation of a problem. For instance, a basic use case might be that unusual traffic causes a change in the ETA (estimated time of arrival), which could cause a shipment to miss its anticipated delivery window. You might be thinking, “That’s not data science!” And you would be right if the mechanism for doing this work were simply rule-based:
IF Delay > 1 hour AND Change in Appt Time > Window Allowed
THEN mark the shipment as delayed
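To make the rule concrete, here is a minimal Python sketch of it. The function and parameter names (is_delayed, delay, appt_change, window_allowed) are our own illustration, not identifiers from any FourKites system.

```python
from datetime import timedelta


def is_delayed(delay: timedelta, appt_change: timedelta, window_allowed: timedelta) -> bool:
    """Apply the hand-written rule: both conditions must hold to flag the shipment."""
    # Hypothetical parameter names; the one-hour threshold comes from the rule in the text.
    return delay > timedelta(hours=1) and appt_change > window_allowed


# A 90-minute delay that pushes the appointment 45 minutes past a 30-minute window.
late = is_delayed(timedelta(minutes=90), timedelta(minutes=45), timedelta(minutes=30))
# A 30-minute delay never trips the rule, whatever the appointment change is.
on_time = is_delayed(timedelta(minutes=30), timedelta(minutes=45), timedelta(minutes=30))
print(late, on_time)
```

Everything the rule knows is written down by a person in advance, which is exactly its limitation.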
However, the world is not as simple as that, and that’s where data science comes into the picture. Data science is about making use of correlative factors to predict an outcome, rather than simply documenting human knowledge. For example, a model may take into account other factors, such as whether a load is a relay load, in making a statistical assertion that the truck (or train, ship or plane) will be delayed.
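As a rough illustration of a "statistical assertion," the sketch below estimates a delay probability from past loads grouped by two factors. It is a deliberately simple stand-in: a production model would be a trained classifier over many more features, and the records, the factor names (is_relay, heavy_traffic) and the function name are all made up for this example.

```python
from collections import defaultdict

# Hypothetical history: (is_relay, heavy_traffic, was_delayed) for eight past loads.
history = [
    (True, True, True),
    (True, True, True),
    (True, False, False),
    (True, False, True),
    (False, True, True),
    (False, True, False),
    (False, False, False),
    (False, False, False),
]


def delay_rates(records):
    """Return the observed share of delayed loads for each factor combination."""
    counts = defaultdict(lambda: [0, 0])  # key -> [delayed loads, total loads]
    for is_relay, heavy_traffic, was_delayed in records:
        bucket = counts[(is_relay, heavy_traffic)]
        bucket[0] += int(was_delayed)
        bucket[1] += 1
    return {key: delayed / total for key, (delayed, total) in counts.items()}


rates = delay_rates(history)
for (is_relay, heavy_traffic), rate in sorted(rates.items()):
    print(f"relay={is_relay}, heavy_traffic={heavy_traffic}: delay rate {rate:.2f}")
```

Unlike the rule, nothing here encodes a threshold chosen by a person; the numbers come from what the data shows, and they change as more loads are observed.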
So how does this all line up to predicting predictability?
As we all know, cooking a good meal requires a good cook and the right ingredients for the meal. The cook here is the Data Science team, and the right ingredients are the data. There are two ways to look at it: having the right data but not understanding it, versus having the wrong data but understanding it well.
As you can tell, a good cook will almost certainly know whether or not the meal will turn out well given the right/wrong ingredients. Similarly, data scientists can “predict” the predictive power of a given model by looking at the availability and quality of data. This is what’s referred to as predicting predictability.
Top 5 ways to predict predictability
Or to put it another way, how do you know whether your predicted ETAs or other data science-driven predictions are going to be reliable?
1. Data Science Team
Data science is research. Data scientists constantly experiment with and tweak their algorithms to yield better outcomes. In fact, there are competitions to gauge the effectiveness of various ways of accomplishing a goal. Having a large, dedicated and competent Data Science team is the most effective way to achieve success in this area. Small, outsourced topical solutions will be less effective. Commitment to data science must be total.
2. Data Quality and Frequency
This is the next most important factor. Garbage in = garbage out. Take, for example, an algorithm that uses weather to predict traffic delays. If that model gets poor or no weather data, its likelihood of success is pretty low. FourKites spends a tremendous amount of time working with the data to get the inputs right so that the outputs are reliable. The frequency of information can also be a factor. If your carrier or ELD provider is providing pings every four hours instead of every few minutes, the model will run less frequently and you will have less data to work with. As you think about your outcomes, evaluate whether your inputs are good. Is your TMS, carrier, broker or ELD provider sharing the data you need? Do you know what the gaps are? Is your platform providing actionable visibility into those gaps?
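One simple way to answer "do you know what the gaps are?" is to measure the time between consecutive location pings. The sketch below is an illustration under our own assumptions: the 15-minute threshold, the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def ping_gap_report(timestamps, max_gap_minutes=15.0):
    """Summarize the spacing between consecutive pings, in minutes."""
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier).total_seconds() / 60.0
        for earlier, later in zip(ordered, ordered[1:])
    ]
    if not gaps:
        # Fewer than two pings: there is nothing to measure.
        return {"median_gap_minutes": None, "long_gaps": 0}
    return {
        "median_gap_minutes": median(gaps),
        "long_gaps": sum(gap > max_gap_minutes for gap in gaps),
    }


# Hypothetical pings: every 5 minutes, then a 4-hour silence, then pings resume.
start = datetime(2024, 1, 1, 8, 0)
pings = [start + timedelta(minutes=m) for m in (0, 5, 10, 250, 255)]
report = ping_gap_report(pings)
print(report)
```

A feed that looks healthy on the median can still hide long silences, which is why the sketch reports both numbers.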
3. Synthetic Data, Imputed Data
In some cases, getting a set of data is just not possible. Let’s say a piece of equipment doesn’t have a temperature sensor. Well, then, you can’t get the temperature. However, there may be ways to impute it or substitute something close. Perhaps there is a vibration sensor that can be used to judge the temperature? We’ve applied this principle in our network when we encounter trucks without ELDs, so you can “see” a truck even though it’s not actually emitting any location data.
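To give a feel for imputation, here is a minimal sketch that estimates a truck's position between two known check-ins by straight-line interpolation in time. This is a textbook technique chosen for illustration, not a description of how FourKites imputes locations; the tuple layout and function name are assumptions.

```python
def interpolate_position(ping_a, ping_b, when):
    """Estimate (lat, lon) at time `when` between two pings.

    Each ping is a hypothetical (unix_seconds, lat, lon) tuple. The estimate
    assumes constant speed along a straight line, which ignores roads and stops.
    """
    t0, lat0, lon0 = ping_a
    t1, lat1, lon1 = ping_b
    if t0 == t1 or not (t0 <= when <= t1):
        raise ValueError("`when` must fall between two pings with distinct times")
    fraction = (when - t0) / (t1 - t0)
    return (lat0 + fraction * (lat1 - lat0), lon0 + fraction * (lon1 - lon0))


# Halfway in time between two check-ins one hour apart.
estimate = interpolate_position((0, 35.0, -97.0), (3600, 36.0, -96.0), 1800)
print(estimate)
```

An imputed value like this is a guess with its own error, so it is worth keeping it flagged as synthetic rather than mixing it silently with observed data.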
4. Data Federation
The next ingredient is data federation. Data federation is about stitching together data across enterprise boundaries to make sense of the information.
The size and shape of available data define how well you can train a data science model and how much precision you can get. For instance, if you are looking to make a prediction on a route from Oklahoma to Chicago, but you have data from only a handful of routes, the prediction will not be very informed. However, if you leverage the data coming out of a platform that processes millions of loads, you have a much better chance of getting the right data to enable that prediction.
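The effect of data volume on precision can be sketched with a little statistics: the margin of error around an average transit time shrinks roughly with the square root of the number of loads. The transit times below are invented, the larger sample simply repeats the small one to keep the example short, and the 1.96 multiplier is a normal approximation that is crude for very small samples.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_margin(transit_hours, z=1.96):
    """Return (average, approximate 95% margin of error) for a list of transit times."""
    n = len(transit_hours)
    if n < 2:
        raise ValueError("need at least two observations to estimate spread")
    average = mean(transit_hours)
    margin = z * stdev(transit_hours) / sqrt(n)
    return average, margin


# Hypothetical transit times in hours for one lane.
few_loads = [12.0, 14.0, 13.0, 15.0]
many_loads = few_loads * 100  # artificial: 400 observations with the same spread

avg_few, margin_few = mean_with_margin(few_loads)
avg_many, margin_many = mean_with_margin(many_loads)
print(f"4 loads:   {avg_few:.1f} h +/- {margin_few:.2f} h")
print(f"400 loads: {avg_many:.1f} h +/- {margin_many:.2f} h")
```

Both samples give the same average, but the estimate from 400 loads is pinned down far more tightly than the one from four.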
My favorite example is that of Google Autocomplete. The reason you get magical results when typing in Google is that the platform is harnessing the power of billions of searches to bring you the most relevant suggestions. Another way we think about this is painting a complete picture from different sources. By stitching together the story across carrier data, shipper data and data in the world, we can provide a more comprehensive solution. You can see that in our live Network Congestion Map, where we employ a host of data sources to bring you things like port congestion, city and state trends, border crossing delays, and more.
5. Data Platform
Finally, it’s critical to make platform investments so that the entire lifecycle of data is manageable at scale and automated to reduce friction. If your platform resembles a Rube Goldberg machine, everything will seem slower and less on the mark. FourKites has made a significant investment in creating a scalable platform that can process tens of thousands of data providers and trillions of data points to drive outcomes from all the inputs.
In summary, these 5 ingredients are so important because when you get down to it, data science is still a science, and you can influence the nature of the outcomes by making sure you have the right ingredients. Creating good outcomes comes from a partnership and a deep commitment to working together. Stay tuned as we continue to explore the critical role of AI and data science at FourKites, and in the logistics industry at large.