Churn Prediction

Understanding and predicting customer churn is crucial for retaining valuable users and optimizing marketing strategies.

Churn prediction allows businesses to identify users at risk of leaving, enabling proactive efforts to retain them.

In this post, we’ll explore how to develop features, build and train predictive models, and evaluate their performance using BigQuery ML and the Google Analytics Sample Dataset.

Contents

  1. Feature Development
  2. Building and Training Predictive Models
    • Logistic Regression Model
    • Random Forest Model
    • XGBoost Model
  3. Confusion Matrix Results
  4. Performance Comparison
  5. Class Imbalance
  6. Comparison of Weighted and Non-Weighted Churn Prediction Models

1. Feature Development

For our churn prediction model, we’ll need to define what constitutes a “churned” customer and develop features that might be predictive of churn. Let’s consider a customer as churned if they haven’t visited the site in the last 30 days of our dataset.

I’ve created several features that might be predictive of churn (a sketch of the feature query follows this list):

  1. churned: Our target variable (1 if churned, 0 if not)
  2. num_visits: Number of visits by the user
  3. avg_time_on_site: Average time spent on the site per visit
  4. avg_pageviews: Average number of pages viewed per visit
  5. total_transactions: Total number of transactions made by the user
  6. device_category: The user’s primary device type
  7. country: The user’s country
  8. traffic_medium: The primary traffic source medium
  9. total_pageviews: Total number of pages viewed across all visits
  10. total_time_on_site: Total time spent on the site across all visits
  11. days_as_customer: Number of days between the user’s first and last visit
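
The sketch below is a minimal, illustrative version of the feature query, not the exact pipeline: it assumes the public tables `bigquery-public-data.google_analytics_sample.ga_sessions_*` (which end on 2017-08-01, so the churn cutoff is taken as 2017-07-02) and it approximates the “primary” device, country, and traffic medium with each user’s most frequent value.

SQL
-- Illustrative feature query (assumed table names and churn cutoff)
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section8.user_churn_features` AS
SELECT
  fullVisitorId,
  -- churned = no visit in the last 30 days of the dataset (cutoff assumed to be 2017-07-02)
  IF(MAX(PARSE_DATE('%Y%m%d', date)) < DATE '2017-07-02', 1, 0) AS churned,
  COUNT(*) AS num_visits,
  AVG(totals.timeOnSite) AS avg_time_on_site,
  AVG(totals.pageviews) AS avg_pageviews,
  SUM(IFNULL(totals.transactions, 0)) AS total_transactions,
  -- most frequent value per user as a proxy for the "primary" device/country/medium
  APPROX_TOP_COUNT(device.deviceCategory, 1)[OFFSET(0)].value AS device_category,
  APPROX_TOP_COUNT(geoNetwork.country, 1)[OFFSET(0)].value AS country,
  APPROX_TOP_COUNT(trafficSource.medium, 1)[OFFSET(0)].value AS traffic_medium,
  SUM(IFNULL(totals.pageviews, 0)) AS total_pageviews,
  SUM(IFNULL(totals.timeOnSite, 0)) AS total_time_on_site,
  DATE_DIFF(MAX(PARSE_DATE('%Y%m%d', date)), MIN(PARSE_DATE('%Y%m%d', date)), DAY) AS days_as_customer
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY
  fullVisitorId;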

2. Building and Training Predictive Models

Now that we have our features, let’s build three different types of models to predict churn: Logistic Regression, Random Forest, and XGBoost. These models are well-suited for binary classification problems like churn prediction.

Logistic Regression Model

SQL
-- Logistic Regression Model for Churn
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_logistic`
OPTIONS(model_type='logistic_reg', input_label_cols=['churned']) AS
SELECT
  * EXCEPT(fullVisitorId)
FROM
  `predictive-behavior-analytics.Section8.user_churn_features`;

Random Forest Model

SQL
-- Random Forest Model for Churn
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_random_forest`
OPTIONS(model_type='random_forest_classifier', input_label_cols=['churned']) AS
SELECT
  * EXCEPT(fullVisitorId)
FROM
  `predictive-behavior-analytics.Section8.user_churn_features`;

XGBoost Model

SQL
-- XGBoost Model for Churn
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_xgboost`
OPTIONS(model_type='boosted_tree_classifier', input_label_cols=['churned']) AS
SELECT
  * EXCEPT(fullVisitorId)
FROM
  `predictive-behavior-analytics.Section8.user_churn_features`;

3. Confusion Matrix Results

Let’s generate confusion matrices for each model to understand their performance:
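
In BigQuery ML, the matrices can be produced with the ML.CONFUSION_MATRIX function. A minimal sketch for the logistic regression model follows; the calls for the other two models are identical apart from the model name, and by default the automatically reserved evaluation split and a 0.5 threshold are used.

SQL
-- Confusion matrix for the Logistic Regression churn model
SELECT
  *
FROM
  ML.CONFUSION_MATRIX(MODEL `predictive-behavior-analytics.Section8.churn_logistic`);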

Confusion matrices provide a detailed breakdown of the classification results by showing the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Here’s an analysis of the confusion matrices for Logistic Regression, Random Forest, and XGBoost models.

Logistic Regression

The confusion matrix for the Logistic Regression model reveals significant limitations. With 58,728 true negatives and only 1 false positive, the model is highly conservative, rarely predicting churn. However, it struggles to identify actual churners, as evidenced by 1,792 false negatives and just 2 true positives. While effective at predicting non-churners, the model is ineffective for churn prediction, missing most at-risk users and therefore offering little value for retention efforts.

                           Predicted: 0 (not churned)    Predicted: 1 (churned)
Actual: 0 (not churned)    (TN) 58,728                   (FP) 1
Actual: 1 (churned)        (FN) 1,792                    (TP) 2

Random Forest

The confusion matrix for the Random Forest model shows some improvement over Logistic Regression but still highlights challenges in predicting churn.

The model correctly identifies 58,729 non-churners (true negatives), demonstrating its reliability in predicting customers who will stay. However, it still struggles with churn prediction, correctly identifying only 12 actual churners (true positives) while missing 1,782 churners (false negatives).

While the Random Forest model is slightly better at detecting churners compared to Logistic Regression, it still misses the majority of those at risk of leaving, suggesting that further tuning or a different model might be needed to improve churn prediction.

                           Predicted: 0 (not churned)    Predicted: 1 (churned)
Actual: 0 (not churned)    (TN) 58,729                   (FP) 0
Actual: 1 (churned)        (FN) 1,782                    (TP) 12

XGBoost

The XGBoost model shows a balanced performance in churn prediction, although it still faces challenges in accurately identifying churners.

The model correctly predicts 58,727 non-churners (true negatives) and misclassifies only two non-churners as churners (false positives), demonstrating a high accuracy in identifying those who will stay.

However, it struggles with churn prediction, correctly identifying only 5 actual churners (true positives) while missing 1,789 churners (false negatives).

                           Predicted: 0 (not churned)    Predicted: 1 (churned)
Actual: 0 (not churned)    (TN) 58,727                   (FP) 2
Actual: 1 (churned)        (FN) 1,789                    (TP) 5

Conclusion

Based on the confusion matrix analysis, Random Forest emerges as the strongest of the three models for predicting churn, offering the best balance between identifying actual churners and minimizing false predictions.

XGBoost is a viable alternative but may require adjustments, while Logistic Regression is the least effective in this context, particularly because of its high false negative rate.

Overall, while Random Forest performs better than the other two models, none effectively predicts churn. This suggests that further model tuning or exploring alternative approaches is necessary to better capture at-risk customers and enhance churn prediction efforts.

4. Performance Comparison

Metric       Logistic Regression    Random Forest    XGBoost
Precision    0.667                  1.000            0.714
Recall       0.001                  0.007            0.003
Accuracy     0.970                  0.971            0.970
F1 Score     0.002                  0.013            0.006
Log Loss     0.125                  0.195            0.124
AUC          0.696                  0.597            0.686
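
These figures come from BigQuery ML’s ML.EVALUATE function, which returns precision, recall, accuracy, F1 score, log loss, and ROC AUC for classification models; the same call works for each model.

SQL
-- Evaluation metrics for the XGBoost churn model (repeat per model)
SELECT
  *
FROM
  ML.EVALUATE(MODEL `predictive-behavior-analytics.Section8.churn_xgboost`);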

In evaluating the performance of the three models for churn prediction, several key metrics provide insights into their effectiveness.

Precision measures how often positive predictions are correct, while recall assesses how well the model identifies actual positives. Accuracy gives an overall correctness measure, but can be misleading in imbalanced datasets. F1 Score balances precision and recall. Log Loss penalizes incorrect predictions, and AUC shows the model’s ability to distinguish between classes.

    \[\text{Precision} = \frac{TP}{TP + FP}\]

    \[\text{Recall} = \frac{TP}{TP + FN}\]

    \[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]

    \[\text{F1 Score} = \frac{2\,TP}{2\,TP + FP + FN}\]

1. Precision

  • Logistic Regression (0.667)
  • Random Forest (1.000)
  • XGBoost (0.714)

Random Forest leads with perfect precision, meaning that all customers predicted as churners were indeed churners. XGBoost follows with a precision of 71.4%, and Logistic Regression is slightly lower at 66.7%. High precision is crucial when false positives (incorrectly predicting churn) are costly, but it’s important to balance this with recall.

2. Recall

  • Logistic Regression (0.001)
  • Random Forest (0.007)
  • XGBoost (0.003)

All models struggle with recall, which measures how well the models identify actual churners. Random Forest performs the best at 0.67%, but this is still extremely low. XGBoost comes next with 0.28%, and Logistic Regression has the lowest recall at 0.11%. Low recall indicates that these models are missing a large number of actual churners, limiting their effectiveness in retention strategies.

3. Accuracy

  • Logistic Regression (0.970)
  • Random Forest (0.971)
  • XGBoost (0.970)

Accuracy is high across all models, around 97%, indicating that they correctly classify most customers as non-churners. However, given the imbalanced nature of churn prediction (where non-churners are far more common), high accuracy alone is not a sufficient measure of model effectiveness.

4. F1 Score

  • Logistic Regression (0.002)
  • Random Forest (0.013)
  • XGBoost (0.006)

The F1 score, which balances precision and recall, is very low for all models. Random Forest, while having perfect precision, achieves a higher F1 score (0.013) due to slightly better recall. XGBoost’s F1 score (0.006) reflects its better balance between precision and recall compared to Logistic Regression, which has the lowest F1 score at 0.002. Low F1 scores indicate that the models are not effectively balancing precision and recall.

5. Log Loss

  • Logistic Regression (0.125)
  • Random Forest (0.195)
  • XGBoost (0.124)

Log loss is particularly useful for understanding how well the model’s predicted probabilities align with actual outcomes.
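
For reference, the binary log loss over N scored users with true labels y_i and predicted churn probabilities p_i is:

    \[\text{Log Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]\]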

XGBoost achieves the lowest log loss, indicating high confidence in its predictions. Logistic Regression follows closely, while Random Forest has the highest log loss, suggesting that despite its high precision, it is less confident in its predictions.

6. AUC

  • Logistic Regression (0.696)
  • Random Forest (0.597)
  • XGBoost (0.686)

Logistic Regression has the highest AUC at 0.696, indicating it is slightly better at distinguishing between churners and non-churners overall. XGBoost is close behind at 0.686, while Random Forest trails with an AUC of 0.597, suggesting it is the least effective in separating churners from non-churners.

Conclusion

Overall, Random Forest excels in precision but struggles with recall, leading to a poor F1 score.

XGBoost offers the best log loss, indicating confidence, while Logistic Regression provides the highest AUC, showing a slightly better ability to distinguish between classes.

However, the low recall across all models suggests that none are effectively capturing churners, which is critical for this type of predictive task.

5. Class Imbalance

Understanding Class Imbalance

In predictive modeling, particularly in binary classification tasks, class imbalance is a common issue.

Class imbalance occurs when the number of instances in one class (e.g., non-churners) significantly exceeds the number of instances in the other class (e.g., churners).

This imbalance can lead to models that perform well overall (e.g., high accuracy) but fail to correctly predict the minority class (e.g., churners).

In the context of churn prediction, this issue is particularly pronounced: as the confusion matrices above show, non-churners outnumber churners by more than 30 to 1.
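
A quick way to quantify the imbalance is to count the labels in the feature table (using the table name assumed earlier):

SQL
-- Label distribution in the churn feature table
SELECT
  churned,
  COUNT(*) AS num_users
FROM
  `predictive-behavior-analytics.Section8.user_churn_features`
GROUP BY
  churned;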

Impact of Class Imbalance on Model Performance

Class imbalance can skew several performance metrics:

  • Accuracy: While a model might achieve high accuracy by predicting the majority class (non-churners), this metric can be misleading because it doesn’t account for the model’s ability to predict the minority class (churners).
  • Precision and Recall: Precision might be low if the model incorrectly classifies non-churners as churners. Recall can be particularly low if the model fails to identify actual churners.
  • F1 Score: This metric, which balances precision and recall, is often the most affected by class imbalance, as both precision and recall tend to suffer when the minority class is underrepresented.

6. Comparison of Weighted and Non-Weighted Churn Prediction Models

To correct for this class imbalance, I adjusted class weights to penalize false negatives more heavily.

The introduction of class weights in the Logistic Regression and XGBoost models has led to notable changes in performance metrics, reflecting an improved focus on addressing the class imbalance in the dataset. (Unfortunately, BigQuery ML doesn’t support class weights for Random Forest models).
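
As an illustrative sketch (not necessarily the exact weights behind the results below), class weighting can be switched on in BigQuery ML with the AUTO_CLASS_WEIGHTS option, which weights each class in inverse proportion to its frequency; the weighted model name here is hypothetical.

SQL
-- Weighted XGBoost churn model (illustrative; model name is hypothetical)
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_xgboost_weighted`
OPTIONS(
  model_type='boosted_tree_classifier',
  input_label_cols=['churned'],
  auto_class_weights=TRUE) AS
SELECT
  * EXCEPT(fullVisitorId)
FROM
  `predictive-behavior-analytics.Section8.user_churn_features`;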

Here’s a discussion of the improvements compared to the non-weighted versions:

Logistic Regression

Metric       Non-Weighted    Weighted
Precision    0.667           0.097
Recall       0.001           0.258
Accuracy     0.970           0.907
F1 Score     0.002           0.141
Log Loss     0.125           0.469
AUC          0.696           0.696

  1. Precision: Decreased from 0.667 to 0.097. The model is less precise but identifies more potential churners.
  2. Recall: Significant improvement from 0.001 to 0.258. The weighted model is much better at identifying churned customers.
  3. Accuracy: Slightly decreased but still high at 0.907.
  4. F1 Score: Improved from 0.002 to 0.141, indicating a better balance between precision and recall.
  5. Log Loss: Increased, suggesting less confident predictions.
  6. AUC: Remained the same, indicating similar overall discriminative ability.

XGBoost

Metric       Non-Weighted    Weighted
Precision    0.714           0.074
Recall       0.003           0.389
Accuracy     0.970           0.838
F1 Score     0.006           0.125
Log Loss     0.124           0.491
AUC          0.686           0.685

  1. Precision: Decreased from 0.714 to 0.074. The model is less precise but identifies more potential churners.
  2. Recall: Dramatic improvement from 0.003 to 0.389. The weighted model is far better at identifying churned customers.
  3. Accuracy: Decreased but still reasonably high at 0.838.
  4. F1 Score: Improved from 0.006 to 0.125, showing a better balance between precision and recall.
  5. Log Loss: Increased, indicating less confident predictions.
  6. AUC: Slightly decreased but essentially the same, suggesting similar overall discriminative ability.

Conclusion

The weighted models show a clear improvement in addressing the class imbalance problem, particularly in their ability to identify churned customers (higher recall). This comes at the cost of lower precision and overall accuracy, which is a common and often acceptable trade-off in churn prediction scenarios.

The choice between the weighted and non-weighted models depends on the specific business context:

If the cost of missing a potential churner (false negative) is high, the weighted models are preferable due to their higher recall.

If the cost of falsely identifying a customer as a potential churner (false positive) is high, the non-weighted models might be more suitable.

In many churn prediction scenarios, identifying more potential churners (even with some false positives) is preferable, as it allows for targeted retention efforts. Therefore, the weighted models, especially the XGBoost model with its high recall, might be more valuable for practical application in churn prevention strategies.
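
To put the weighted model to work, ML.PREDICT can score every user and surface those with a high predicted churn probability. The sketch below reuses the hypothetical `churn_xgboost_weighted` model from above, and the 0.5 threshold is an arbitrary starting point to tune against retention capacity.

SQL
-- Score users with the weighted model and keep likely churners
SELECT
  fullVisitorId,
  p.prob AS churn_probability
FROM
  ML.PREDICT(
    MODEL `predictive-behavior-analytics.Section8.churn_xgboost_weighted`,
    (SELECT * FROM `predictive-behavior-analytics.Section8.user_churn_features`)),
  -- predicted_churned_probs holds one (label, prob) pair per class of the 'churned' label
  UNNEST(predicted_churned_probs) AS p
WHERE
  p.label = 1
  AND p.prob >= 0.5
ORDER BY
  churn_probability DESC;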

You can find the complete code in my GitHub repository.
