

Kliment Merzlyakov

August 31, 2021

This is the third blog post in a series covering churn and lifetime customer value. There are many ways to predict churn rate on the individual customer level. In this article, we will take a closer look at three of them:

- Statistical models such as Pareto/NBD, BG/NBD and Gamma-Gamma
- Survival analysis
- Machine learning

Let's try out these three models on the open dataset from Kaggle’s WSDM - KKBox's Churn Prediction Challenge. The full code is available in the Jupyter notebook. In this dataset, we have users of the KKBOX music streaming service along with their attributes, transaction histories and churn label (whether a customer will churn out in the next 30 days). Due to the nature of the business, customers can put subscriptions on pause or change subscription intervals, which makes this dataset both contractual and non-contractual simultaneously. This means that a customer who puts a subscription on pause doesn't necessarily churn out, unlike the situation in telecom.

The dataset contains three tables of interest to us:

- Members—information about customers, such as gender, age, etc.
- Train—churn label for each customer.
- Transactions—historical transactions with dates, subscription plans, etc.

Firstly, let's attempt to estimate churn probability purely based on recency (how many days since we last saw a customer). To do this, we will choose a particular date (e.g. 12/31/2016) and all the customers who purchased on this date (cohort). Then for each following day, we will plot the proportion of churned customers out of this cohort of customers who haven’t made a purchase yet. Such a plot will show us an error of churn definition based on how many days we didn't see the customer. For example, what would be our average false positive rate if we define churn as 45 days of inactivity?
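The cohort computation above can be sketched with pandas. The toy data and column names below are assumptions for illustration, not the actual KKBOX schema:

```python
import pandas as pd

# Toy data standing in for the KKBOX tables; column names are assumptions.
transactions = pd.DataFrame({
    "customer_id":      [1, 2, 3, 4, 1, 2, 4],
    "transaction_date": pd.to_datetime(
        ["2016-12-31"] * 4 + ["2017-01-10", "2017-02-19", "2017-01-20"]
    ),
})
labels = pd.DataFrame({"customer_id": [1, 2, 3, 4], "is_churn": [0, 0, 1, 0]})

cohort_date = pd.Timestamp("2016-12-31")
cohort = transactions.loc[
    transactions["transaction_date"] == cohort_date, "customer_id"
].unique()

# First purchase after the cohort date per cohort customer (NaT = never came back).
later = transactions[
    transactions["customer_id"].isin(cohort)
    & (transactions["transaction_date"] > cohort_date)
]
next_purchase = (
    later.groupby("customer_id")["transaction_date"].min().reindex(cohort)
)

df = pd.DataFrame({"next_purchase": next_purchase}).join(
    labels.set_index("customer_id")
)

# For each inactivity threshold d, the false positive rate of "inactive for d
# days means churned": the share of still-away customers whose label says
# they did NOT actually churn.
fp_rate = {}
for d in range(1, 91):
    cutoff = cohort_date + pd.Timedelta(days=d)
    still_away = df[df["next_purchase"].isna() | (df["next_purchase"] > cutoff)]
    fp_rate[d] = 1.0 - still_away["is_churn"].mean()
```

Plotting `fp_rate` against `d` gives the error curve described above: as the inactivity threshold grows, fewer "inactive" customers turn out to be false positives.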

For this dataset, it looks like 30 days is a perfect value to mark such a customer as churned. In this particular case, it is due to the nature of the dataset (we have active customers who all made purchases in March 2017, instead of customers who made purchases at any time). At any rate, this type of graph could be a useful tool to explore your customers.

Now, let's get into actual churn modeling with our first type of model: Pareto/NBD.

We will use lifetimes, a great Python library that implements such models.

To fit such a statistical model, we only need three features:

- Number of orders.
- Tenure (the difference between the current date and first transaction date).
- Recency (the difference between the last and the first transaction dates). This is confusing because recency is commonly defined as the current date minus the last transaction date, but let's stick with the library’s terminology.
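These three features can be derived directly from a transaction log with pandas (lifetimes also ships a `summary_data_from_transaction_data` helper for this). The toy data and column names below are assumptions:

```python
import pandas as pd

# Toy transaction log; the column names are assumptions, not the KKBOX schema.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "transaction_date": pd.to_datetime([
        "2016-01-05", "2016-02-10", "2016-03-15",
        "2016-01-20", "2016-02-25",
        "2016-03-01",
    ]),
})
observation_end = pd.Timestamp("2016-03-31")

summary = transactions.groupby("customer_id")["transaction_date"].agg(
    first="min", last="max", orders="count"
)
# lifetimes-style definitions:
summary["frequency"] = summary["orders"] - 1                       # repeat purchases only
summary["recency"] = (summary["last"] - summary["first"]).dt.days  # last minus first
summary["T"] = (observation_end - summary["first"]).dt.days        # tenure
```

Note that a one-time buyer gets `frequency = 0` and `recency = 0` under these definitions.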

Firstly, let's take a look at the capacity of statistical models to provide exploratory insights.

Once we have fitted the model, we can estimate the number of future purchases for the next 1 unit of time (in our case, one month). We can see that the most valuable customers are the oldest ones (high recency, using the lifetimes package's definition) who make a lot of purchases (high frequency). This is a somewhat obvious result, but what's valuable is that the relationship is now quantified. For example, we can see that in the early stages, one purchase per month does not guarantee that customers will continue to buy. In the later stages (customers for 1-2 years), they are still valuable customers even if they buy once every two months.

Now let's calculate four metrics to evaluate the model's prediction performance:

- ROC AUC—This is the most commonly used performance metric for binary classification. The value 0.5 means prediction is absolutely random. 1.0 means an ideal prediction.
- Log Loss—This metric allows us to evaluate not only the correctness of a binary prediction (churn or not) but also the proximity of the probability estimation. In other words, if the actual churn probability for some customers is 0.8 and our model tells us that it's 0.3, from a ROC AUC perspective, it can still be a perfect model (it doesn't care how far we are from the actual probability as long as we have a good threshold value to separate classes). But from a Log Loss perspective, it is a bad model. The lower the value, the better: the ideal value is 0, and there is no upper bound. A good way to understand whether a value is good or not is to compare it with random guesses.
- ROC and Precision-Recall Curves—These are similar to ROC AUC but provide a visual sense of the model quality.
- Calibration Curve—This curve shows us how far our probability estimations are from actual churn probabilities.
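All four metrics are available in scikit-learn. A minimal sketch on toy labels and predicted probabilities (the data here is synthetic, only the API calls matter):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import (
    log_loss, precision_recall_curve, roc_auc_score, roc_curve,
)

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)  # actual churn labels
# Toy predicted probabilities: informative but noisy and imperfect.
y_prob = np.clip(0.3 * y_true + 0.7 * rng.random(1000), 0.01, 0.99)

auc = roc_auc_score(y_true, y_prob)
ll = log_loss(y_true, y_prob)

# A naive baseline to compare Log Loss against: always predict the base rate.
baseline = log_loss(y_true, np.full(1000, y_true.mean()))

# Curves for visual inspection (pass to matplotlib to plot).
fpr, tpr, _ = roc_curve(y_true, y_prob)
precision, recall, _ = precision_recall_curve(y_true, y_prob)
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
```

A model worth keeping should have `ll` noticeably below `baseline` and a calibration curve (`prob_pred` vs. `prob_true`) close to the diagonal.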

This time, for prediction purposes, we will use the Pareto/NBD model. It is more accurate but takes more time to train.

The results are quite good, especially considering the fact that we only used three features. The ROC AUC is 0.77, which is definitely not random. But as we can see, the Log Loss is pretty high, which means that this model has difficulties with churn probability estimation. Such estimations are crucial for LTV prediction.

From the calibration curve, we can see that the model assigns low probabilities. For example, customers with an actual churn probability of 0.6 have a 0.2 prediction probability on average.

Pros:

- Statistically or scientifically justified model.
- Provides good out-of-the-box fit for your data. Accurate on average for all customers.
- Provides tools to explain your customers’ behavior. Once the parameters of the model are estimated, we can use them to evaluate the customer lifecycle metrics.
- Works well for non-contractual businesses.

Cons:

- Hard to add new features to the model. The common features are frequency, recency and monetary value, which are usually the most important factors. However, if you need to add something on top of that, you would need to add them as covariates to the model.
- While good for average estimations, these models do not show top-tier accuracy on the individual level.

Our second approach is to use survival analysis, which is good for contractual businesses.

There is another Python library by the same author called lifelines.

Again, let's see how such an analysis can help us during the exploration phase. We can plot Kaplan-Meier survival curves to see how survival (*1 - churn*) probability depends on customer tenure.
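Under the hood, the Kaplan-Meier estimator is a simple product over churn times. In practice lifelines' `KaplanMeierFitter` does this (plus confidence intervals and plotting), but a minimal numpy sketch shows the computation:

```python
import numpy as np

def kaplan_meier(durations, churned):
    """Product-limit estimate of S(t): at each distinct churn time t,
    multiply by (1 - churns_at_t / customers_still_at_risk)."""
    durations = np.asarray(durations, dtype=float)
    churned = np.asarray(churned, dtype=bool)  # False = still active (censored)
    event_times = np.unique(durations[churned])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)             # not yet churned or censored
        events = np.sum((durations == t) & churned)  # churns exactly at t
        s *= 1.0 - events / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Tenure in months; churned=False means the customer is still subscribed.
times, surv = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Plotting `surv` against `times` as a step function gives the familiar survival curve; censored customers (still active at the end of observation) reduce the at-risk count without counting as churns.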

Such curves give us an ability to compare different groups of customers and thus evaluate whether some feature affects churn probability or not.

For example, we can see that auto_renew dramatically reduces the chances of churn.

Also, we can see that there are three groups of payment methods that significantly affect churn probability. For the business, this suggests that there is considerable value in convincing customers to use one payment method over another.

Now let's train our survival model using the payment method and tenure features.

The ROC AUC is relatively similar to that of the Pareto/NBD model (0.76), but the Log Loss is much better. Still, 0.31 is too high. Generally, it's a good model, but again, it is prone to similar challenges when evaluating actual churn probabilities.
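One way to turn a fitted survival curve into the churn probabilities these metrics evaluate is the conditional relation P(churn within next h | survived to t) = 1 - S(t + h) / S(t). A sketch with made-up survival values:

```python
import numpy as np

def churn_within_horizon(surv_times, surv_probs, tenure, horizon):
    """P(churn in (tenure, tenure + horizon] | alive at tenure)
    = 1 - S(tenure + horizon) / S(tenure),
    with S given as a right-continuous step function."""
    def S(t):
        # Survival probability just after time t; 1.0 before the first drop.
        idx = np.searchsorted(surv_times, t, side="right") - 1
        return 1.0 if idx < 0 else surv_probs[idx]
    return 1.0 - S(tenure + horizon) / S(tenure)

# A made-up survival curve: S drops to these values at months 1, 2, 3, 6.
times = np.array([1.0, 2.0, 3.0, 6.0])
probs = np.array([0.9, 0.75, 0.6, 0.4])

p = churn_within_horizon(times, probs, tenure=2, horizon=1)  # 1 - 0.6/0.75 = 0.2
```

The same conversion works whether S comes from a Kaplan-Meier estimate or from a regression model's predicted survival function.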

Pros:

- Rather than predicting churn probability, survival analysis predicts when the customer will drop out.
- Contains handy tools such as the Kaplan-Meier survival curve, which provides a general overview of customers’ churn.
- Easy to add covariates (additional features, such as gender or any other).
- Provides a feature selection technique through the model’s coefficient confidence intervals.
- Works well for contractual businesses.

Cons:

- Does not apply well to non-contractual businesses.

Now let's beat our problem with a multi-purpose hammer: machine learning.

Let's use the Random Forest method with the same features we used in the survival model.
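A sketch of this step with scikit-learn. The features here are synthetic stand-ins (the real notebook uses payment method and tenure), so the resulting numbers won't match the article's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the KKBOX features.
X, y = make_classification(
    n_samples=5000, n_features=5, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

churn_prob = clf.predict_proba(X_test)[:, 1]  # probability of the churn class
auc = roc_auc_score(y_test, churn_prob)
ll = log_loss(y_test, churn_prob)
importances = clf.feature_importances_        # relative feature importance
```

`feature_importances_` (which sums to 1 across features) is what drives the importance ranking discussed below.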

Even with the same features, it shows much better performance: The ROC AUC is 0.80, while the Log Loss is 0.20. Finally, we have a model with better churn probability estimation than random guessing.

Additionally, we can easily see the relative importance of features, which can help to confidently drive business decisions.

But what if we use more features, such as the number of orders, recency and is_auto_renew?

The result is even better. The ROC AUC becomes 0.85, while the Log Loss becomes 0.17. At this point, we already have a model that may be good enough to implement, and it should help us to predict customer churn rates much better than some rule-based models.

Pros:

- Easy to implement any business-specific feature.
- Applicable to both contractual and non-contractual business models.
- Well-developed quality metrics.
- Works well for individual-level predictions.

Cons:

- For a non-contractual setup, it requires a churn definition, which may be artificial and subjective.

In this article, we compared different approaches to churn modeling, describing their pros and cons. Statistical models and survival analysis are valuable for data exploration and better understanding your customers. Meanwhile, machine learning models are often better in terms of flexibility, accuracy metrics and churn probability estimation.

With that said, churn prediction is always about how well you understand your customers, so it’s useful to analyze them from different angles. Sometimes, the simple model is good enough. Other times, you may need some advanced tricks to reach the desired accuracy.

I’d like to stress that the material covered in this article is just an introduction to churn prediction, and individual needs will vary by use case. Through years of experience, we at Plytrix have developed considerable expertise both in creating effective churn models and enabling businesses to execute churn reduction strategies using those models.

No matter what your case is, we are happy to discuss your needs and assist in those efforts. You can book a free discovery call through this link.

If you enjoyed this post, subscribe to our publication or sign up for our newsletter.
