This is the third blog post in a series covering churn and lifetime customer value. There are many ways to predict churn at the individual customer level. In this article, we will take a closer look at three of them: statistical models (Pareto/NBD), survival analysis and machine learning.
Let's try out these three models on the open dataset from Kaggle’s WSDM - KKBox's Churn Prediction Challenge. The full code is available in the Jupyter notebook. In this dataset, we have users of the KKBOX music streaming service along with their attributes, transaction histories and churn label (whether a customer will churn out in the next 30 days). Due to the nature of the business, customers can put subscriptions on pause or change subscription intervals, which makes this dataset both contractual and non-contractual simultaneously. This means that a customer who puts a subscription on pause doesn't necessarily churn out, unlike the situation in telecom.
The dataset contains three tables of interest to us:
First, let's attempt to estimate churn probability purely based on recency (how many days have passed since we last saw a customer). To do this, we pick a particular date (e.g. 12/31/2016) and take all the customers who purchased on that date as our cohort. Then, for each following day, we plot the proportion of churned customers among the cohort members who haven’t made another purchase yet. Such a plot shows the error of an inactivity-based churn definition as a function of how many days we haven't seen the customer. For example, what would our average false positive rate be if we defined churn as 45 days of inactivity?
For this dataset, it looks like 30 days is a perfect value to mark such a customer as churned. In this particular case, it is due to the nature of the dataset (we have active customers who all made purchases in March 2017, instead of customers who made purchases at any time). At any rate, this type of graph could be a useful tool to explore your customers.
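As a rough illustration of how such a curve could be computed, here is a minimal sketch in plain Python. The input format (days until each cohort member's next purchase, or `None` if they never came back) and the toy values are assumptions made for illustration, not the KKBox data.

```python
def churned_share_by_inactivity(next_purchase_gap, max_days):
    """For each inactivity threshold d, return the share of truly churned
    customers among cohort members who have not purchased again by day d.

    next_purchase_gap: one entry per cohort member, the number of days until
    the next purchase, or None if the customer never purchased again.
    """
    shares = {}
    for d in range(1, max_days + 1):
        # Customers still inactive d days after the cohort date.
        still_inactive = [g for g in next_purchase_gap if g is None or g > d]
        if not still_inactive:
            break
        churned = sum(g is None for g in still_inactive)
        shares[d] = churned / len(still_inactive)
    return shares


# Hypothetical cohort of eight customers (toy data).
gaps = [10, 20, 31, 40, None, None, 29, None]
shares = churn_curve = churned_share_by_inactivity(gaps, max_days=45)

for d in (5, 30, 45):
    # 1 - share is the false positive rate of "inactive for d days = churned".
    print(d, shares[d], 1 - shares[d])
```

One minus the plotted share is the false positive rate of the rule "inactive for d days means churned", which is the quantity the 45-day question asks about.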
Now, let's get into actual churn modeling with our first type of model: Pareto/NBD.
We will use lifetimes, a great library that implements such models.
To fit such a statistical model, we only need three features per customer: frequency, recency and T (the customer's age), as defined by the lifetimes package.
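To make these inputs concrete, here is a minimal sketch that derives them from raw transactions, following the lifetimes package's conventions: frequency is the number of repeat purchase days, recency is the time between a customer's first and last purchase, and T is the time between the first purchase and the end of the observation period. The function name, the use of days as the time unit and the toy transactions are assumptions for illustration; the library ships its own helper, `summary_data_from_transaction_data`, for this step.

```python
from datetime import date


def rfm_summary(transactions, observation_end):
    """Build frequency / recency / T per customer, measured in days.

    transactions: iterable of (customer_id, purchase_date) pairs.
    """
    days_by_customer = {}
    for customer_id, day in transactions:
        if day <= observation_end:
            days_by_customer.setdefault(customer_id, set()).add(day)

    summary = {}
    for customer_id, days in days_by_customer.items():
        first, last = min(days), max(days)
        summary[customer_id] = {
            "frequency": len(days) - 1,              # repeat purchase days
            "recency": (last - first).days,          # age at last purchase
            "T": (observation_end - first).days,     # age at end of observation
        }
    return summary


# Hypothetical transactions (toy data).
transactions = [
    ("a", date(2017, 1, 1)),
    ("a", date(2017, 1, 31)),
    ("a", date(2017, 3, 2)),
    ("b", date(2017, 2, 1)),
]
summary = rfm_summary(transactions, observation_end=date(2017, 3, 31))
print(summary)
```

Note that under this definition a high recency means the last purchase happened late in the customer's life, which is the opposite of the everyday "days since last purchase" meaning.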
Firstly, let's take a look at the capacity of statistical models to provide exploratory insights.
Once we have fitted the model, we can estimate the expected number of future purchases over the next unit of time (in our case, one month). We can see that the most valuable customers are the oldest ones (high recency, using the lifetimes package’s definition) who make a lot of purchases (high frequency). This is a somewhat obvious result, but what’s valuable is that the relationship is now quantified. For example, we can see that in the early stages, one purchase per month does not guarantee that customers will continue to buy. In the later stages (customers for 1-2 years), they are still valuable customers even if they buy only once every two months.
Now let's calculate four metrics to evaluate the model's prediction performance:
This time, for prediction purposes, we will use the Pareto/NBD model. It is more accurate but takes more time to train.
The results are quite good, especially considering the fact that we only used three features. The ROC AUC is 0.77, which is definitely not random. But as we can see, the Log Loss is pretty high, which means that this model has difficulties with churn probability estimation. Such estimations are crucial for LTV prediction.
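The gap between a decent ROC AUC and a poor Log Loss can be seen in a small sketch. ROC AUC only measures how well the model ranks churners above non-churners, while Log Loss also penalizes probabilities that are far from the observed outcomes. The plain-Python implementations and the toy labels and scores below are assumptions for illustration, not the article's evaluation code.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, p_pred):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: both models rank customers identically,
# but the second one systematically assigns probabilities that are too low.
y = [1, 1, 1, 0, 0, 0]
p_spread = [0.90, 0.80, 0.70, 0.30, 0.20, 0.10]
p_too_low = [0.30, 0.25, 0.20, 0.10, 0.05, 0.02]

print(roc_auc(y, p_spread), log_loss(y, p_spread))
print(roc_auc(y, p_too_low), log_loss(y, p_too_low))
```

Both toy models get the same ROC AUC because the ordering of customers is identical, yet the second has a much higher Log Loss because its probabilities are compressed toward zero.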
From the calibration curve, we can see that the model underestimates churn probabilities. For example, customers with an actual churn probability of 0.6 receive a predicted probability of about 0.2 on average.
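A calibration curve of this kind can be sketched as follows: predictions are grouped into equal-width bins, and for each bin the mean predicted probability is compared with the observed churn rate. The function, the bin count and the toy data are assumptions for illustration.

```python
def calibration_curve(y_true, p_pred, n_bins=5):
    """Return (mean predicted probability, observed churn rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))

    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, observed))
    return curve


# Toy data: the second group is predicted at roughly 0.23,
# but 60% of its customers actually churned.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0]
p_pred = [0.05, 0.05, 0.05, 0.05, 0.21, 0.22, 0.23, 0.24, 0.25]

curve = calibration_curve(y_true, p_pred)
for mean_pred, observed in curve:
    print(round(mean_pred, 2), round(observed, 2))
```

For a well-calibrated model the two numbers in each pair are close; points where the observed rate is far above the mean prediction correspond to the underestimation described above.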
Our second approach is to use survival analysis, which is good for contractual businesses.
For this, there is another Python library by the same author, called lifelines.
Again, let's see how such an analysis can help us during the exploration phase. We can plot Kaplan-Meier survival curves to see how survival (1 - churn) probability depends on customer tenure.
Such curves give us an ability to compare different groups of customers and thus evaluate whether some feature affects churn probability or not.
For example, we can see that auto_renew dramatically reduces the chance of churn.
Also, we can see that there are three groups of payment methods that significantly affect churn probability. For the business, this suggests that there is considerable value in convincing customers to use one payment method over another.
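To show what a Kaplan-Meier curve actually computes, here is a minimal plain-Python sketch of the estimator applied to two groups. In practice the lifelines `KaplanMeierFitter` class does this (and plots the curves); the function below, the group names and the toy tenures are assumptions for illustration only.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one churn was observed.

    durations: tenure of each customer (e.g. in months).
    observed:  1 if the customer churned at that tenure,
               0 if still active (censored).
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data for two hypothetical groups of six customers each.
manual = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 0])
auto_renew = kaplan_meier([3, 5, 6, 6, 6, 6], [1, 0, 0, 0, 0, 0])

print("manual:", manual)
print("auto_renew:", auto_renew)
```

Censored customers (those still active) stay in the at-risk count until their current tenure, which is what lets the estimator use customers whose churn has not been observed yet.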
Now let's train our survival model using the payment method and tenure features.
The ROC AUC (0.76) is similar to that of the Pareto/NBD model, but the Log Loss is much better. Still, 0.31 is too high. Generally, it's a good model, but again, it struggles when it comes to estimating actual churn probabilities.
Now let's beat our problem with a multi-purpose hammer: machine learning.
Let's use the Random Forest method with the same features we used in the survival model.
Even with the same features, it shows much better performance: the ROC AUC is 0.80, while the Log Loss is 0.20. Finally, we have a model with better churn probability estimation than random guessing.
Additionally, we can easily see the relative importance of features, which can help to confidently drive business decisions.
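The following is a minimal sketch of this step using scikit-learn's `RandomForestClassifier`. The data is synthetic: the feature encoding, the churn-generating formula and the hyperparameters are all assumptions for illustration, so the printed metrics will not match the figures reported for the KKBox dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features (assumption, not KKBox data).
rng = np.random.default_rng(0)
n = 2000
tenure = rng.integers(1, 36, size=n)          # months as a customer
payment_method = rng.integers(0, 5, size=n)   # integer-coded payment method

# Hypothetical churn mechanism: short tenure and payment method 0 churn more.
logit = 1.5 - 0.15 * tenure + 1.0 * (payment_method == 0)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([tenure, payment_method])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(
    n_estimators=200, min_samples_leaf=20, random_state=0
)
model.fit(X_train, y_train)

# Predicted churn probability for each held-out customer.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print("ROC AUC:", auc)
print("Log Loss:", log_loss(y_test, proba))

# Relative importance of each feature, summing to 1.
importances = dict(zip(["tenure", "payment_method"], model.feature_importances_))
print(importances)
```

Integer-coding the payment method is a shortcut that tree-based models tolerate reasonably well; with many categories, one-hot encoding may be a safer choice.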
But what if we use more features, such as the number of orders, recency and is_auto_renew?
The result is even better. The ROC AUC becomes 0.85, while the Log Loss becomes 0.17. At this point, we already have a model that may be good enough to implement, and it should help us to predict customer churn rates much better than some rule-based models.
In this article, we compared different approaches to churn modeling, describing their pros and cons. Statistical models and survival analysis are valuable for data exploration and better understanding your customers. Meanwhile, machine learning models are often better in terms of flexibility, accuracy metrics and churn probability estimation.
With that said, churn prediction is always about how well you understand your customers, so it’s useful to analyze them from different angles. Sometimes, the simple model is good enough. Other times, you may need some advanced tricks to reach the desired accuracy.
I’d like to stress that the material covered in this article is just an introduction to churn prediction, and individual needs will vary by use case. Through years of experience, we at Plytrix have developed considerable expertise both in creating effective churn models and enabling businesses to execute churn reduction strategies using those models.
No matter what your case is, we are happy to discuss your needs and assist in those efforts. You can book a free discovery call through this link.