Predicting churn in the banking industry with machine learning
Churn is a critical issue for banks. Because of their high fixed-cost structure, even a tenth of a percent reduction in churn can have a massive impact on a bank’s bottom line. What if there were a way to predict that a customer is likely to churn, and then take action to prevent them from closing their account?
From working with a large bank (5,000 branches) in one of the world’s fastest-growing markets, and gathering over 24 million of their customer data points, we learnt that machine learning and Big Data can do precisely this. In fact, machine learning is well suited to predicting churn because the outcome is binary (customers either churn or don’t churn), which makes it a natural classification problem. In addition, banking data is unusual in that it encompasses both static and temporal data for each customer. This makes it relatively easy to predict present (and future) state based on historical data.
All data was stored by the bank in a single database, but with separate tables corresponding to different types of data such as demographics, customer savings, customer complaint history, etc. This information is useful when a certain cohort of customers is leaving in higher numbers than other cohorts (e.g. customers with a complaint on record may be more likely to churn).
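To make the cohort comparison concrete, here is a minimal sketch in Python. The records, field names (`has_complaint`, `churned`) and the helper `churn_rate_by_cohort` are hypothetical and only illustrate the idea; they are not the bank's schema or data.

```python
from collections import defaultdict

# Hypothetical per-customer records. In practice these would come from
# joining the bank's separate tables on a customer identifier.
customers = [
    {"customer_id": 1, "has_complaint": True, "churned": True},
    {"customer_id": 2, "has_complaint": True, "churned": False},
    {"customer_id": 3, "has_complaint": False, "churned": False},
    {"customer_id": 4, "has_complaint": False, "churned": False},
    {"customer_id": 5, "has_complaint": False, "churned": True},
    {"customer_id": 6, "has_complaint": False, "churned": False},
]


def churn_rate_by_cohort(records, cohort_key):
    """Return {cohort value: share of customers in that cohort who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        cohort = row[cohort_key]
        totals[cohort] += 1
        churned[cohort] += int(row["churned"])
    return {cohort: churned[cohort] / totals[cohort] for cohort in totals}


rates = churn_rate_by_cohort(customers, "has_complaint")
for cohort, rate in rates.items():
    print(f"has_complaint={cohort}: churn rate {rate:.0%}")
```

A gap between the two rates on real data would suggest that the cohort variable is worth including as a model feature.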
In addition to this static data, the bank also stored dynamic data such as transaction history, which includes variables such as the date, time and amount of the last transaction.
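As an illustration of how transaction history can be turned into model inputs, the sketch below derives two per-customer features from a list of dated transactions. The function name `transaction_features`, the feature names, and the use of a 30-day window to stand in for "the most recent month" are assumptions made for this example, not the bank's actual pipeline.

```python
from datetime import date, timedelta


def transaction_features(transactions, as_of):
    """Summarise one customer's transactions as of a snapshot date.

    `transactions` is a list of (date, amount) tuples.
    """
    # Ignore anything dated after the snapshot to avoid leaking future data.
    past = [(d, amt) for d, amt in transactions if d <= as_of]
    if not past:
        return {"days_since_last_txn": None, "avg_amount_last_30d": 0.0}

    last_date = max(d for d, _ in past)
    window_start = as_of - timedelta(days=30)
    recent = [amt for d, amt in past if d > window_start]
    avg_recent = sum(recent) / len(recent) if recent else 0.0

    return {
        "days_since_last_txn": (as_of - last_date).days,
        "avg_amount_last_30d": avg_recent,
    }


# Made-up history for a single customer.
history = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 3, 10), 80.0),
    (date(2024, 3, 20), 40.0),
]
features = transaction_features(history, as_of=date(2024, 3, 31))
print(features)
```

Computing features relative to a fixed snapshot date keeps the temporal data consistent across customers and prevents information from after the snapshot from reaching the model.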
The vastness of the dataset and time constraints meant that we had to use an out-of-the-box machine learning solution suited to big data, in the form of a ‘random forest’. Random forests let us see which variables (in the past) contribute the most to predicting present state (that is, to the accuracy of our prediction). This allows us to ensure that the variables in our model are relevant, and lets us remove less relevant features from our machine learning training dataset.
Below is an example of the top five most important features of our model.
We see that days since the last transaction is by far the biggest predictor of whether or not someone will churn. Average transaction amount in the most recent month also played a significant role in the prediction, since someone with high transaction amounts is probably not going to stop using the service.
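The kind of feature ranking discussed above can be sketched with scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. This is an illustration on synthetic data, not the tooling or data used in the project: the feature names, the generated values and the rule that produces the churn label are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
feature_names = ["days_since_last_txn", "avg_amount_last_month", "noise"]

# Synthetic features (assumed for illustration only).
days_since_last_txn = rng.integers(0, 180, size=n)
avg_amount_last_month = rng.gamma(shape=2.0, scale=50.0, size=n)
noise = rng.normal(size=n)
X = np.column_stack([days_since_last_txn, avg_amount_last_month, noise])

# Synthetic label: long inactivity drives churn, with 10% of labels flipped.
y = (days_since_last_txn > 90).astype(int)
flip = rng.random(n) < 0.10
y = np.where(flip, 1 - y, y)

model = RandomForestClassifier(
    n_estimators=100, min_samples_leaf=20, random_state=0
)
model.fit(X, y)

# Impurity-based importances are normalised to sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Features that rank near the bottom of such a list are candidates for removal from the training set. Impurity-based importances can favour features with many distinct values, so it is worth treating the ranking as a guide and not as proof of causation.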
We gave the name ‘soft churners’ to customers who had made zero transactions in the past 90 days. Our model showed that these ‘soft churners’ had the highest probability of becoming actual churners after the first three-month observation period. Based on transaction history, our model was in fact able to predict churn with 80% accuracy*. It is important to note that we now have a data-backed conclusion that reinforces what branch managers and marketing directors know intuitively from anecdotal evidence: a customer who doesn’t use a service is much more likely to abandon it.
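The ‘soft churner’ definition and the accuracy measure can be expressed in a few lines. The sketch below is a simplified rule-based stand-in, not the random forest model itself: the helper names, the reading of "zero transactions in the past 90 days" as at least 90 days since the last transaction, and the four example customers are all assumptions.

```python
def is_soft_churner(days_since_last_txn, threshold_days=90):
    """Flag a customer with no transactions in the last `threshold_days` days."""
    return days_since_last_txn >= threshold_days


def accuracy(predicted, actual):
    """Share of customers for whom the prediction matches the outcome."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up example: days of inactivity and whether each customer later churned.
days_inactive = [5, 120, 95, 200]
actually_churned = [False, True, False, True]

# Use the soft-churn flag as a naive churn prediction.
predicted = [is_soft_churner(d) for d in days_inactive]
score = accuracy(predicted, actually_churned)
print(f"accuracy: {score:.2f}")
```

A rule like this makes a useful baseline: a trained model should at least match the accuracy of simply flagging every soft churner as a future churner.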
The fact that we can demonstrate this correlation with a dataset in the tens of millions gives bank executives a factual basis for action in the second step: implementing actual programmes to retain their customers, especially the most valuable ones. We are now working with the bank to launch a customer retention programme targeted at higher-value customers: through push notifications and SMS, the bank will build new engagement channels with its at-risk customers, with the aim of cutting churn by 5% in the coming quarters.