How to predict Subscription Churn: key elements of building a churn model

Despite the GenAI hype, classical machine learning is still alive! Personally, I used to use ChatGPT (e.g. for idea generation), but I recently stopped. I therefore believe OpenAI also needs (or probably already uses) a churn model to predict which customers will stop using its services. Such a model can not only predict the probability of churn for a particular user but also, with the help of model explanation tools, reveal why people become dissatisfied with the tool, so that it can be improved by fixing bugs or developing new features.

In this article I will guide you through the key elements of building a churn model, mainly from a business perspective, with a few technical and machine learning tips along the way. You will discover what the main challenges are when defining what churn means for you, why business people are crucial in the process of creating features for a machine learning model, and how to translate their thoughts on the reasons for churn into numbers.

According to studies:

Acquiring a new customer is anywhere from 5 to 25 times more expensive than retaining an existing one (Harvard Business Review).
A 5% increase in customer retention can boost profits by up to 75% (Bain & Company).
The probability of selling to an existing customer is 60-70%, whilst the probability of selling to a new business prospect is 5-20% (Forbes, quoting the book Marketing Metrics).
Repeat customers spend 67% more than new customers (BIA Advisory Services).
As many as 44% of companies still don’t calculate their customer retention rate (CustomerGauge).

Definition of churn

One of the main challenges when building a machine learning model that predicts churn is the definition of churn itself. It may seem easy at first glance, but it can sometimes take weeks to agree on a definition - especially in big corporations or banks. The reasons for this are:

- business units and individual people can understand churn differently
- the definition should be a mathematical formula based on selected datasets, and sometimes there are several “sources of truth”
- there are multiple business exceptions that, depending on the organisation, should not be treated as churn:
  - customer just a few days late in paying the bill
  - customer had a trial product
  - customer passed away
  - subscription was fraudulent and it was ceased intentionally by the organisation
  - etc.
- there are technical exceptions:
  - data quality issues (e.g. some subscriptions have wrong start or end dates)
  - some subscriptions are technical or are used by employees
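
To show how such exceptions can end up in a formula, here is a minimal Python sketch of a churn label. The field names, the 14-day grace period and the choice of exclusions are my assumptions for illustration only; your organisation will have its own list. It also assumes that labels are computed after the grace period has passed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed tolerance for customers who pay a few days late.
GRACE_PERIOD = timedelta(days=14)


@dataclass
class Subscription:
    end_date: date                     # last day of the subscription
    renewed_on: Optional[date] = None  # date of the renewal, if there was one
    is_trial: bool = False
    is_fraud: bool = False
    is_employee: bool = False
    customer_deceased: bool = False


def churn_label(sub: Subscription) -> Optional[int]:
    """Return 1 for churn, 0 for renewal, None if excluded from modelling."""
    # Business and technical exceptions are removed from the dataset.
    if sub.is_trial or sub.is_fraud or sub.is_employee or sub.customer_deceased:
        return None
    # A renewal within the grace period is not treated as churn.
    if sub.renewed_on is not None and sub.renewed_on <= sub.end_date + GRACE_PERIOD:
        return 0
    return 1
```

Returning `None` for excluded subscriptions keeps the exceptions explicit, so they can be counted and reviewed with the business instead of silently disappearing.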

Another important dimension of the definition is time.

Consider the two variants below:

Probability that the customer will churn on the last day of subscription

A. calculated on any day of the subscription

B. calculated 30 days before last day of subscription

These two variants have an immense impact on the way variables will be built and what the training set should look like. Version A is more informative as we can track the subscription score day to day and observe how it changes due to various events the customer generates. However, it’s more complex to implement and requires more data points.

Version B generates scores for subscriptions that are about to expire, making the training set more homogeneous and producing scores just at the time when customer service can try to persuade the customer to prolong them. Other variants are possible as well, and which one to choose always needs to be a joint decision with the customer.
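
The difference between the two variants shows up in how many scoring dates (snapshots) each subscription contributes to the training set. The sketch below is only an illustration: the function names and the `step_days` parameter are hypothetical, not part of any particular library.

```python
from datetime import date, timedelta


def snapshot_dates_variant_b(end_date: date, lead_days: int = 30) -> list[date]:
    """Variant B: a single scoring date, a fixed number of days before the end."""
    return [end_date - timedelta(days=lead_days)]


def snapshot_dates_variant_a(
    start_date: date, end_date: date, step_days: int = 1
) -> list[date]:
    """Variant A: a scoring date every step_days days of the subscription."""
    dates = []
    current = start_date
    while current <= end_date:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates
```

A one-year subscription yields one training row in variant B and up to 365 rows in variant A, which is why variant A needs more data points and more care when building the training set.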

As you can imagine, the list above can cause many challenges when defining churn in an organisation. This is why some organisations choose to have multiple definitions of churn, track them and use them in separate machine learning models. However, using a single definition is the easiest road when it comes to model development, business understanding of the results and maintenance of the solution (including running and evaluating marketing campaigns).

All of these factors show that there is no golden rule when it comes to implementing a churn machine learning model in an organisation - each company is different and requires a custom-made solution to meet their business needs.

A feature brainstorming workshop is the key to success

Without good quality features, even the most robust machine learning model will not perform well. Here, a feature brainstorming workshop comes in handy. To make the most of it, multiple business units need to take part: sales, marketing, customer service, IT and data specialists. When stakes are high (reducing churn will bring lots of money), it’s useful to organise a bigger meeting and brainstorm over the subject for a few hours. The result of such a meeting should be:

1. List of data sources that can be used in churn modelling
   a. crucial (e.g. subscription data)
   b. nice-to-have (e.g. inbound calls to customer service)
2. List of data sources that cannot be used in the project at the moment, but hold information that seems important for predicting churn. This unavailability can be caused by:
   a. historical data not having been collected
   b. a key to match records from the source to our customer/subscription not being available (although such an identifier could be developed)
   c. data not being collected at all (e.g. transcriptions, or even recordings, of inbound calls to customer service)
3. List of customer behaviours/features (in business language) that influence the subscription renewal decision (both positively and negatively), e.g.:
   - “when a customer is dissatisfied with our service, they will not prolong their subscription”
   - “some of our customers are students that only need the subscription for a few months to learn how to use it”
   - “last autumn there was a huge campaign with discounts from our competitor”
   - “due to inflation, we needed to impose a 50% price increase for subscription renewal starting this January”
   - etc.
4. Multiple definitions of churn, as discussed in the previous section.

On a side note, such a workshop is also a great opportunity to meet the team and the people who are interested in the results of the project. It will benefit future cooperation and, ultimately, the quality of the whole solution.

Translate behaviour to statistics: feature engineering

The next step is reflecting the outcome of the brainstorming in data using statistical aggregations. In my opinion, this is one of the most interesting parts of a Data Scientist’s job - describing reality with numbers and trying to get as close to it as possible. Let’s try to create features that cover the ideas from the previous section:

1. “when a customer is dissatisfied with our service, they will not prolong their subscription”
   1.1 Count of incoming customer calls within the last year
   1.2 Count of customer emails complaining about our services within the last 6 months
2. “some of our customers are students that only need the subscription for a few months to learn how to use it”
   2.1 Whether the registration e-mail address belongs to a university domain (e.g. @uw.edu.pl)
   2.2 Whether the customer is a declared student (e.g. marked during registration of a new customer)
3. “last autumn there was a huge campaign with discounts from our competitor”
   3.1 The competitor’s daily price for a similar product
   3.2 Simple indicator: whether the competitor is running a discount campaign
4. “due to inflation, we needed to impose a 50% price increase for subscription renewal starting this January”
   4.1 Price the customer paid for the product last year
   4.2 Price the customer will need to pay to renew the subscription

Sometimes we need to be very creative to try to reflect the information in data, as often we don't have such historical data available. In this case, you need to treat such ideas as a trigger to start collecting more data sources that you find useful for your business.
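
As an illustration, features 2.1, 4.1 and 4.2 from the list above could be sketched in Python as follows. The list of academic suffixes and the function names are my assumptions; in a real project they would come from your own reference data.

```python
# Hypothetical list of academic e-mail suffixes; extend it for your market.
ACADEMIC_SUFFIXES = (".edu", ".edu.pl", ".ac.uk")


def has_university_email(email: str) -> bool:
    """Feature 2.1: does the registration e-mail use an academic domain?"""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain.endswith(ACADEMIC_SUFFIXES)


def relative_price_change(last_price: float, renewal_price: float) -> float:
    """Combine features 4.1 and 4.2 into one relative price change."""
    return (renewal_price - last_price) / last_price
```

For example, `has_university_email("jan@uw.edu.pl")` is true, and a renewal at 150 after paying 100 last year gives a relative price change of 0.5, i.e. the 50% increase mentioned in the workshop quote.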

The most important things when creating variables:

Be wary of data leakage

No information “from the future” should be provided to the model. For example, if we know that a subscription churned on 1st April 2023 (this was the last day of the subscription) and we want our model to predict the event a month earlier, we may only use the information that was available on 1st March 2023. Below are two examples of variables: the first one leaks information from the future and the second one does not:

❌ Total count of products purchased in the calendar year (meaning the whole of 2023, which includes purchases made after 1st March 2023)

✅ Count of products purchased by the customer during the last 12 months (calculated on 1st March 2023)
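
The two variables can be contrasted in a few lines of Python. This is a sketch under the assumption that “12 months” is approximated by 365 days; the function names and the sample purchase dates are made up for illustration.

```python
from datetime import date, timedelta


def purchases_in_calendar_year(purchase_dates: list[date], year: int) -> int:
    """Leaky if the year is not over yet on the scoring date."""
    return sum(d.year == year for d in purchase_dates)


def purchases_last_12_months(purchase_dates: list[date], cutoff: date) -> int:
    """Count purchases in the 365 days ending at cutoff; later ones are ignored."""
    window_start = cutoff - timedelta(days=365)
    return sum(window_start < d <= cutoff for d in purchase_dates)


# Made-up purchase history of a single customer.
purchases = [date(2022, 5, 10), date(2023, 1, 15), date(2023, 3, 20), date(2023, 6, 1)]
cutoff = date(2023, 3, 1)  # the day on which the score is calculated

leaky = purchases_in_calendar_year(purchases, 2023)  # counts two purchases made after the cutoff
safe = purchases_last_12_months(purchases, cutoff)   # uses only data known on the cutoff day
```

Passing the cutoff date explicitly to every feature function makes it much harder to accidentally look into the future.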

Create variables that are stable in time

When the distribution of a variable changes over time (by design, not because of behavioural changes), there will always be data drift and the model will start performing worse. In the examples below, the first variable will always grow for customers that registered many years ago; the second variable will remain rather stable as long as our offer stays the same.

❌ Total count of products purchased by the customer

✅ Count of products purchased by the customer during the last 12 months

Watch out for categorical variables with fast-changing levels

If you want to use categorical variables that you believe will be good predictors of churn, investigate how fast their levels change. It is safer to use a higher level of the product categorisation than the product id itself. Imagine that last year’s subscription plan, sold at a particular price, had a different id than this year’s plan. The id the model was trained on will then not appear in the scoring dataset. However, if the company provides products from a few categories, e.g. Streaming Services Subscription, Insurance Subscription, Advertising Subscription, etc., it’s safe to use such categories in feature engineering:

❌ Average value of active product_id=87673

✅ Average value of active Advertising Subscriptions
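
A minimal sketch of the category-level variant is shown below. The mapping, all ids other than 87673 and the “Other” fallback bucket are hypothetical; in practice the mapping would come from your product catalogue.

```python
# Hypothetical mapping from product id to product category.
PRODUCT_CATEGORY = {
    87673: "Advertising Subscription",
    87674: "Advertising Subscription",
    55210: "Streaming Services Subscription",
}


def average_value_by_category(products: list[tuple[int, float]]) -> dict[str, float]:
    """Average value of active products per category.

    products holds (product_id, value) pairs. Ids missing from the mapping
    fall into an "Other" bucket instead of creating new, unseen levels.
    """
    values: dict[str, list[float]] = {}
    for product_id, value in products:
        category = PRODUCT_CATEGORY.get(product_id, "Other")
        values.setdefault(category, []).append(value)
    return {category: sum(v) / len(v) for category, v in values.items()}
```

When a new product id appears next year, only the mapping needs an update; the feature definition and the trained model stay unchanged.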

When it comes to technology, I personally recommend preparing such variables in the form of multiple marts (and not just making transformations as a pipeline to generate scores). For that, I used the dbt framework, where the code is just SQL, some yaml configuration and a bit of Python. With that solution, results of our work - if properly documented - can be used by other team members for their machine learning models or some insightful dashboards for management.

The rest is classical machine learning

Now you can run your first machine learning models. A few recommendations from my side:

Test your model on an out-of-time sample, i.e. data from a period that follows the one used for training
Go with LightGBM - it should do the work when implemented correctly
Use Optuna to tune the hyperparameters of your model
Don’t forget about MLOps to get the model quickly to production
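
The out-of-time idea from the first recommendation can be sketched without any machine learning library. The row format and the `snapshot_date` key below are assumptions for illustration; the resulting sets would then be passed to whatever model you train.

```python
from datetime import date


def out_of_time_split(
    rows: list[dict], cutoff: date, date_key: str = "snapshot_date"
) -> tuple[list[dict], list[dict]]:
    """Train on snapshots before cutoff, test on snapshots from cutoff onwards."""
    train = [row for row in rows if row[date_key] < cutoff]
    test = [row for row in rows if row[date_key] >= cutoff]
    return train, test


# Made-up training rows: one snapshot per subscription with its churn label.
rows = [
    {"snapshot_date": date(2022, 11, 1), "churned": 0},
    {"snapshot_date": date(2022, 12, 1), "churned": 1},
    {"snapshot_date": date(2023, 2, 1), "churned": 0},
    {"snapshot_date": date(2023, 3, 1), "churned": 1},
]
train_rows, test_rows = out_of_time_split(rows, cutoff=date(2023, 1, 1))
```

Unlike a random split, this mimics how the model will be used in production: it is always trained on the past and scored on the future.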

Running anti-churn campaigns

The machine learning model itself won’t stop customers from cancelling their subscriptions - what you need is a plan for how to use it to maximise the business outcome. It’s best to start such planning right at the beginning of a churn modelling project.

Last but not least, when executing a marketing campaign, the best way is to run it using A/B tests. This way you will be able to:

Validate machine learning model performance
Correlate the probability scores with the impact of the campaign itself (e.g. what percentage of customers with scores of 0.7-0.8 did not churn compared to those with scores of 0.8-0.9?)
Test tiered discount depths depending on the churn probability score
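
One way to look at such A/B test results is to compare retention per score bucket between the treated and the control group. The sketch below assumes each customer record holds a churn score, a group name and the observed outcome; these field names are hypothetical.

```python
from collections import defaultdict


def retention_by_score_bucket(
    customers: list[dict], n_buckets: int = 10
) -> dict[tuple[int, str], float]:
    """Retention rate for every (score bucket, group) pair.

    Each customer dict needs "score" (churn probability between 0 and 1),
    "group" ("treatment" or "control") and "churned" (bool).
    Bucket 7 of 10 covers scores from 0.7 up to, but excluding, 0.8.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [retained, total]
    for customer in customers:
        bucket = min(int(customer["score"] * n_buckets), n_buckets - 1)
        key = (bucket, customer["group"])
        if not customer["churned"]:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: retained / total for key, (retained, total) in counts.items()}
```

If the retention of the treatment group is clearly higher than that of the control group within a given bucket, the campaign works for that score range, which is also a natural starting point for deciding on tiered discounts.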

Also, if possible, ask your customer service representatives to comment on the scores - they know the customers and often have a feeling as to which ones are more likely to churn and which ones will definitely prolong their subscription.

Such feedback, along with thorough analysis of the campaign’s results, will help you to enhance your model with additional features or eliminate bugs that cause wrong predictions.

Summary

Churn modelling is still a valuable tool when it comes to reducing churn in your customer base. When done wisely, it can make your retention campaigns more sophisticated and effective.

If you have any questions or require a deeper understanding, sign up for a free consultation with our experts, and don’t forget to subscribe to our newsletter for more updates.

Last updated: 19 July 2024

Written by

Adrian Dembek

Senior Data Scientist
