We get to go to a lot of conferences. And we’re always amazed at how many vendors and commentators stand up at events and trade shows and say things like, “The objective of analytics is to discover new insight about the business”.
Let us be very clear. If the only thing that your analytic project delivers is insight, it has almost certainly failed. Your objective must not be merely to discover something that you didn’t know, or to quantify something that you thought you did — rather it must be to use that insight to change the way you do business. If your model never leaves the lab, there can never be any return on your investment in data and analytics.
The goal of machine learning is often — though not always — to train a model on historical, labelled data (i.e., data for which the outcome is known) in order to predict the value of some quantity on the basis of a new data item for which the target value or classification is unknown. We might, for example, want to predict the lifetime value of customer XYZ, or to predict whether a transaction is fraudulent or not.
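Here is a deliberately tiny Python sketch of the train-then-predict pattern. It uses a 1-nearest-neighbour classifier, which we chose only for illustration; the text does not prescribe any particular technique. The `train` and `predict` names and the feature values are hypothetical. "Training" memorises labelled history, and prediction assigns a new, unlabelled item the outcome of its closest historical example.

```python
import math

# One labelled example: (feature values, known outcome).
Example = tuple[tuple[float, ...], str]


def train(history: list[Example]) -> list[Example]:
    """'Train' a 1-nearest-neighbour model: it simply memorises the labelled history."""
    return list(history)


def predict(model: list[Example], item: tuple[float, ...]) -> str:
    """Predict the unknown outcome of `item` as the label of its closest
    historical example, using Euclidean distance."""
    _features, label = min(model, key=lambda example: math.dist(example[0], item))
    return label


# Hypothetical transaction features, e.g. scaled amount and scaled hour of day.
model = train([
    ((0.1, 0.2), "legitimate"),
    ((0.9, 0.8), "fraudulent"),
    ((0.2, 0.1), "legitimate"),
])
print(predict(model, (0.85, 0.9)))  # fraudulent
```

Real models generalise far better than this, but the contract is the same: learn from data whose outcomes are known, then predict outcomes for data whose outcomes are not.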
Before we can use an analytic model to change the way we do business, it has to pass two tests. Firstly, it must be sufficiently accurate. Secondly, we must be able to deploy it so that it can make recommendations and predictions on the basis of data that are available to us — and sufficiently quickly that we are able to do something about them.
Some obvious questions arise from all of this. How do we know if our model is “good enough” to base business decisions on? And since we could create many different models of the same reality — of arbitrary complexity — how do we know when to stop our modelling efforts? When do we have the most bang we are ever going to get, so that we should stop throwing more bucks at our model?
So far, so abstract. Let’s try to make this discussion more concrete by looking at some accuracy metrics for a real-world model that one of us actually developed for a customer.
A working example of machine learning
The business objective in this particular case was to avoid delays and cancellations of rail services by predicting train failures up to 36 hours before they occurred. To do this, we trained a machine learning model to identify the characteristic signatures that had preceded historical failure events, using the millions of data points generated by the thousands of sensors that instrument the trains.
We built our model using a training data set of historical observations — sensor data from trains that we labelled with outcomes extracted from engineers’ reports and operations logs. For the historical data, we know whether the train failed — or whether it did not.
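The text does not say exactly how the sensor data were labelled. The Python sketch below shows one plausible scheme that is consistent with the 36-hour prediction window. A sensor reading is labelled 1 if the same train failed within the following 36 hours, and 0 otherwise. The `label_observation` function, its `failure_times` input, and the window that excludes the reading's own timestamp are all hypothetical.

```python
from datetime import datetime, timedelta

# Prediction horizon from the text: failures up to 36 hours ahead.
HORIZON = timedelta(hours=36)


def label_observation(observed_at: datetime, failure_times: list[datetime]) -> int:
    """Return 1 if the train failed within HORIZON after this sensor reading, else 0.

    `failure_times` is a hypothetical list of failure timestamps for the same
    train, as might be extracted from engineers' reports and operations logs.
    The window is (observed_at, observed_at + HORIZON].
    """
    window_end = observed_at + HORIZON
    return int(any(observed_at < t <= window_end for t in failure_times))


reading_time = datetime(2020, 1, 1, 0, 0)
print(label_observation(reading_time, [datetime(2020, 1, 2, 11, 0)]))  # 1 (35 h later)
print(label_observation(reading_time, [datetime(2020, 1, 2, 13, 0)]))  # 0 (37 h later)
```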
In fact, we didn’t use all of our labelled historical data to train our model. Instead, we reserved some of that data and ring-fenced it in a so-called “holdout” data set. That gives us a set of data unseen by the model, which we can use to test the accuracy of our predictions and to check that our model has not “over-fitted” the training data.
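Ring-fencing a holdout set can be sketched in a few lines of Python. The 20 percent holdout fraction and the fixed random seed are illustrative assumptions, because the text does not say how much data was held back. The `holdout_split` name is hypothetical.

```python
import random


def holdout_split(records, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and ring-fence `holdout_fraction` of them.

    Returns (training_set, holdout_set). The default fraction and seed are
    illustrative assumptions; the seed makes the split reproducible.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


training_set, holdout_set = holdout_split(range(100))
print(len(training_set), len(holdout_set))  # 80 20
```

A random split is the simplest choice. For time-ordered sensor data, holding out a later time period is often safer, because readings taken minutes apart are highly correlated and can leak information from training into testing.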
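The table shown above is a “confusion matrix” resulting from the application of the model built from the training data set to the holdout data set. It compares what the model predicted would happen with what actually happened. Of the 14,014 cases in the holdout data set, the model correctly predicted 443 failures (true positives) and 13,435 non-failures (true negatives). It also predicted 54 failures that did not occur (false positives) and missed 82 failures that did (false negatives).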
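A binary confusion matrix is simply a count of (actual, predicted) pairs. The Python sketch below assumes labels of 1 for "failure" and 0 for "no failure". The `confusion_matrix` function here is a hypothetical helper, not a library API.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a binary classifier.

    Labels are assumed to be 1 for "failure" and 0 for "no failure".
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # failure predicted, failure occurred
        "FN": counts[(1, 0)],  # failure occurred but was not predicted
        "FP": counts[(0, 1)],  # failure predicted, none occurred
        "TN": counts[(0, 0)],  # no failure predicted, none occurred
    }


print(confusion_matrix([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))
```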
You can see that our model correctly predicts 84 percent of failures. That is, in 443 of the 525 (443 + 82) cases where a failure actually occurred within the following 36 hours, the model had predicted it. This measure is usually called recall, or sensitivity. That’s a pretty good rate for this sort of model, and certainly accurate enough for the model to be useful for our customer.
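The 84 percent figure can be reproduced from the four cell counts quoted in the text. The Python sketch below also computes precision, which the text does not quote. Precision is the share of predicted failures that actually occurred, and it follows from the same numbers.

```python
# Cell counts quoted in the text for the holdout data set.
TP, FN, FP, TN = 443, 82, 54, 13_435

recall = TP / (TP + FN)     # share of actual failures the model caught
precision = TP / (TP + FP)  # share of failure predictions that were correct

print(f"recall    = {recall:.1%}")     # 84.4%
print(f"precision = {precision:.1%}")  # 89.1%
```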
Just as important as this detection rate, however, are the numbers of so-called type-one errors (false positives) and type-two errors (false negatives). In our case, the model incorrectly predicts 54 failures where none occur. These errors represent 54 situations in which we might have withdrawn a train from service for maintenance it did not need. Equally, there are 82 type-two errors, in which a failure occurred that the model did not predict. That means that for every 14,014 (13,435 + 54 + 82 + 443) trips made by our trains, we should anticipate that they will fail unexpectedly on 82 occasions, or 0.6 percent of the time.
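The error rates in the paragraph above follow directly from the same four counts, as the Python sketch below shows. It also computes overall accuracy across both classes, a figure the text does not quote.

```python
# Cell counts quoted in the text for the holdout data set.
TP, FN, FP, TN = 443, 82, 54, 13_435
total = TP + FN + FP + TN  # holdout cases (trips)

false_alarm_rate = FP / total     # unnecessary withdrawals per trip
missed_failure_rate = FN / total  # unexpected failures per trip
accuracy = (TP + TN) / total      # correct predictions across both classes

print(f"total            = {total:,}")
print(f"false alarms     = {false_alarm_rate:.1%} of trips")
print(f"missed failures  = {missed_failure_rate:.1%} of trips")
print(f"overall accuracy = {accuracy:.1%}")
```

This gives roughly 0.4 percent false alarms, 0.6 percent missed failures and 99.0 percent overall accuracy. Non-failures vastly outnumber failures here, so the overall accuracy figure says little on its own. A model that never predicted a failure would still score 13,489 / 14,014, or roughly 96.3 percent.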