Understanding and predicting customer churn is crucial for retaining valuable users and optimizing marketing strategies.
Churn prediction allows businesses to identify users at risk of leaving, enabling proactive efforts to retain them.
In this post, we’ll explore how to develop features, build and train predictive models, and evaluate their performance using BigQuery ML and the Google Analytics Sample Dataset.
Contents
- Feature Development
- Building and Training Predictive Models
- Logistic Regression Model
- Random Forest Model
- XGBoost Model
- Confusion Matrix Results
- Performance Comparison
- Class Imbalance
- Comparison of Weighted and Non-Weighted Churn Prediction Models
1. Feature Development
For our churn prediction model, we’ll need to define what constitutes a “churned” customer and develop features that might be predictive of churn. Let’s consider a customer as churned if they haven’t visited the site in the last 30 days of our dataset.
I’ve created several features that might be predictive of churn:
- churned: Our target variable (1 if churned, 0 if not)
- num_visits: Number of visits by the user
- avg_time_on_site: Average time spent on the site per visit
- avg_pageviews: Average number of pages viewed per visit
- total_transactions: Total number of transactions made by the user
- device_category: The user’s primary device type
- country: The user’s country
- traffic_medium: The primary traffic source medium
- total_pageviews: Total number of pages viewed across all visits
- total_time_on_site: Total time spent on the site across all visits
- days_as_customer: Number of days between the user’s first and last visit
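The feature table can be assembled with a query along these lines. This is a sketch: the destination table name `user_churn_features`, the use of the most frequent value as the "primary" device/country/medium, and the 2017-07-01 cutoff (30 days before the GA sample dataset's last day, 2017-08-01) are my assumptions.

```sql
-- Sketch: user-level churn features from the Google Analytics sample dataset.
-- Cutoff date and destination table name are assumptions.
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section8.user_churn_features` AS
SELECT
  fullVisitorId,
  -- Churned = no visit during the dataset's final 30 days
  IF(MAX(PARSE_DATE('%Y%m%d', date)) < DATE '2017-07-01', 1, 0) AS churned,
  COUNT(*) AS num_visits,
  AVG(totals.timeOnSite) AS avg_time_on_site,
  AVG(totals.pageviews) AS avg_pageviews,
  SUM(IFNULL(totals.transactions, 0)) AS total_transactions,
  -- Most frequent value stands in for the user's "primary" category
  APPROX_TOP_COUNT(device.deviceCategory, 1)[OFFSET(0)].value AS device_category,
  APPROX_TOP_COUNT(geoNetwork.country, 1)[OFFSET(0)].value AS country,
  APPROX_TOP_COUNT(trafficSource.medium, 1)[OFFSET(0)].value AS traffic_medium,
  SUM(IFNULL(totals.pageviews, 0)) AS total_pageviews,
  SUM(IFNULL(totals.timeOnSite, 0)) AS total_time_on_site,
  DATE_DIFF(MAX(PARSE_DATE('%Y%m%d', date)),
            MIN(PARSE_DATE('%Y%m%d', date)), DAY) AS days_as_customer
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY
  fullVisitorId;
```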
2. Building and Training Predictive Models
Now that we have our features, let’s build three different types of models to predict churn: Logistic Regression, Random Forest, and XGBoost. These models are well-suited for binary classification problems like churn prediction.
Logistic Regression Model
-- Logistic Regression Model for Churn
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_logistic`
OPTIONS(model_type='logistic_reg', input_label_cols=['churned']) AS
SELECT
* EXCEPT(fullVisitorId)
FROM
`predictive-behavior-analytics.Section8.user_churn_features`;
Random Forest Model
-- Random Forest Model for Churn
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_random_forest`
OPTIONS(model_type='random_forest_classifier', input_label_cols=['churned']) AS
SELECT
* EXCEPT(fullVisitorId)
FROM
`predictive-behavior-analytics.Section8.user_churn_features`;
XGBoost Model
-- XGBoost Model for Churn
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_xgboost`
OPTIONS(model_type='boosted_tree_classifier', input_label_cols=['churned']) AS
SELECT
* EXCEPT(fullVisitorId)
FROM
`predictive-behavior-analytics.Section8.user_churn_features`;
3. Confusion Matrix Results
Let’s generate confusion matrices for each model to understand their performance:
Confusion matrices provide a detailed breakdown of the classification results by showing the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Here’s an analysis of the confusion matrices for Logistic Regression, Random Forest, and XGBoost models.
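In BigQuery ML, a confusion matrix can be retrieved with `ML.CONFUSION_MATRIX`. The query below shows the logistic regression model; the other two are analogous. Evaluating against the feature table itself is an assumption, since the post doesn't describe a separate held-out split (without the second argument, BigQuery ML uses its automatic evaluation split).

```sql
-- Confusion matrix for the logistic regression churn model
SELECT *
FROM ML.CONFUSION_MATRIX(
  MODEL `predictive-behavior-analytics.Section8.churn_logistic`,
  (SELECT * EXCEPT(fullVisitorId)
   FROM `predictive-behavior-analytics.Section8.user_churn_features`));
```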
Logistic Regression
The confusion matrix for the Logistic Regression model reveals significant limitations. With 58,728 true negatives and only 1 false positive, the model is highly conservative, rarely predicting churn. However, it struggles with identifying actual churners, evidenced by 1,792 false negatives and just 2 true positives. While effective at predicting non-churners, the model is ineffective for churn prediction, missing most at-risk users and thus not suitable for retaining them.
| | Predicted: 0 (not churned) | Predicted: 1 (churned) |
| --- | --- | --- |
| Actual: 0 (not churned) | (TN) 58,728 | (FP) 1 |
| Actual: 1 (churned) | (FN) 1,792 | (TP) 2 |
Random Forest
The confusion matrix for the Random Forest model shows some improvement over Logistic Regression but still highlights challenges in predicting churn.
The model correctly identifies 58,729 non-churners (true negatives), demonstrating its reliability in predicting customers who will stay. However, it still struggles with churn prediction, correctly identifying only 12 actual churners (true positives) while missing 1,782 churners (false negatives).
While the Random Forest model is slightly better at detecting churners compared to Logistic Regression, it still misses the majority of those at risk of leaving, suggesting that further tuning or a different model might be needed to improve churn prediction.
| | Predicted: 0 (not churned) | Predicted: 1 (churned) |
| --- | --- | --- |
| Actual: 0 (not churned) | (TN) 58,729 | (FP) 0 |
| Actual: 1 (churned) | (FN) 1,782 | (TP) 12 |
XGBoost
The XGBoost model shows a balanced performance in churn prediction, although it still faces challenges in accurately identifying churners.
The model correctly predicts 58,727 non-churners (true negatives) and misclassifies only two non-churners as churners (false positives), demonstrating a high accuracy in identifying those who will stay.
However, it struggles with churn prediction, correctly identifying only 5 actual churners (true positives) while missing 1,789 churners (false negatives).
| | Predicted: 0 (not churned) | Predicted: 1 (churned) |
| --- | --- | --- |
| Actual: 0 (not churned) | (TN) 58,727 | (FP) 2 |
| Actual: 1 (churned) | (FN) 1,789 | (TP) 5 |
Conclusion
Based on the confusion matrix analysis, Random Forest emerges as the top-performing model for predicting churn due to its superior balance between identifying actual churners and minimizing false predictions.
XGBoost is a viable alternative but may require adjustments, while Logistic Regression is the least effective in this context, particularly due to its high false negative rate.
Overall, while Random Forest performs better than the other two models, none effectively predicts churn. This suggests that further model tuning or exploring alternative approaches is necessary to better capture at-risk customers and enhance churn prediction efforts.
4. Performance Comparison
| Metric | Logistic Regression | Random Forest | XGBoost |
| --- | --- | --- | --- |
| Precision | 0.667 | 1.000 | 0.714 |
| Recall | 0.001 | 0.007 | 0.003 |
| Accuracy | 0.970 | 0.971 | 0.970 |
| F1 Score | 0.002 | 0.013 | 0.006 |
| Log Loss | 0.125 | 0.195 | 0.124 |
| AUC | 0.696 | 0.597 | 0.686 |
In evaluating the performance of the three models for churn prediction, several key metrics provide insights into their effectiveness.
Precision measures how often positive predictions are correct, while recall assesses how well the model identifies actual positives. Accuracy gives an overall correctness measure, but can be misleading in imbalanced datasets. F1 Score balances precision and recall. Log Loss penalizes incorrect predictions, and AUC shows the model’s ability to distinguish between classes.
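All of these metrics can be pulled in one call per model with `ML.EVALUATE` (logistic regression shown; the other models are analogous):

```sql
-- Returns precision, recall, accuracy, f1_score, log_loss, and roc_auc
SELECT *
FROM ML.EVALUATE(MODEL `predictive-behavior-analytics.Section8.churn_logistic`);
```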
1. Precision
- Logistic Regression (0.667)
- Random Forest (1.000)
- XGBoost (0.714)
Random Forest leads with perfect precision, meaning that all customers predicted as churners were indeed churners. XGBoost follows with a precision of 71.4%, and Logistic Regression is slightly lower at 66.7%. High precision is crucial when false positives (incorrectly predicting churn) are costly, but it’s important to balance this with recall.
2. Recall
- Logistic Regression (0.001)
- Random Forest (0.007)
- XGBoost (0.003)
All models struggle with recall, which measures how well the models identify actual churners. Random Forest performs the best at 0.67%, but this is still extremely low. XGBoost comes next with 0.28%, and Logistic Regression has the lowest recall at 0.11%. Low recall indicates that these models are missing a large number of actual churners, limiting their effectiveness in retention strategies.
3. Accuracy
- Logistic Regression (0.970)
- Random Forest (0.971)
- XGBoost (0.970)
Accuracy is high across all models, around 97%, indicating that they correctly classify most customers as non-churners. However, given the imbalanced nature of churn prediction (where non-churners are far more common), high accuracy alone is not a sufficient measure of model effectiveness.
4. F1 Score
- Logistic Regression (0.002)
- Random Forest (0.013)
- XGBoost (0.006)
The F1 score, which balances precision and recall, is very low for all models. Random Forest, while having perfect precision, achieves a higher F1 score (0.013) due to slightly better recall. XGBoost’s F1 score (0.006) reflects its better balance between precision and recall compared to Logistic Regression, which has the lowest F1 score at 0.002. Low F1 scores indicate that the models are not effectively balancing precision and recall.
5. Log Loss
- Logistic Regression (0.125)
- Random Forest (0.195)
- XGBoost (0.124)
Log loss is particularly useful for understanding how well the model’s predicted probabilities align with actual outcomes.
XGBoost achieves the lowest log loss, indicating high confidence in its predictions. Logistic Regression follows closely, while Random Forest has the highest log loss, suggesting that despite its high precision, it is less confident in its predictions.
6. AUC
- Logistic Regression (0.696)
- Random Forest (0.597)
- XGBoost (0.686)
Logistic Regression has the highest AUC at 0.696, indicating it is slightly better at distinguishing between churners and non-churners overall. XGBoost is close behind at 0.686, while Random Forest trails with an AUC of 0.597, suggesting it is the least effective in separating churners from non-churners.
Conclusion
Overall, Random Forest excels in precision but struggles with recall, leading to a poor F1 score.
XGBoost offers the best log loss, indicating confidence, while Logistic Regression provides the highest AUC, showing a slightly better ability to distinguish between classes.
However, the low recall across all models suggests that none are effectively capturing churners, which is critical for this type of predictive task.
5. Class Imbalance
Understanding Class Imbalance
In predictive modeling, particularly in binary classification tasks, class imbalance is a common issue.
Class imbalance occurs when the number of instances in one class (e.g., non-churners) significantly exceeds the number of instances in the other class (e.g., churners).
This imbalance can lead to models that perform well overall (e.g., high accuracy) but fail to correctly predict the minority class (e.g., predicting churners).
In the context of churn, this issue is particularly pronounced: in our dataset, only about 3% of users are churners.
Impact of Class Imbalance on Model Performance
Class imbalance can skew several performance metrics:
- Accuracy: While a model might achieve high accuracy by predicting the majority class (non-churners), this metric can be misleading because it doesn’t account for the model’s ability to predict the minority class (churners).
- Precision and Recall: Precision might be low if the model incorrectly classifies non-churners as churners. Recall can be particularly low if the model fails to identify actual churners.
- F1 Score: This metric, which balances precision and recall, is often impacted the most by class imbalance, as both precision and recall tend to suffer when the minority class is underrepresented.
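To quantify the imbalance, a quick check of the label distribution helps (this sketch assumes the `user_churn_features` table used throughout the post):

```sql
-- Label distribution: churned vs. retained users
SELECT
  churned,
  COUNT(*) AS users,
  ROUND(COUNT(*) / SUM(COUNT(*)) OVER (), 3) AS share
FROM `predictive-behavior-analytics.Section8.user_churn_features`
GROUP BY churned;
```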
6. Comparison of Weighted and Non-Weighted Churn Prediction Models
To correct for class imbalance, I adjusted class weights to penalize false negatives more heavily.
The introduction of class weights in the Logistic Regression and XGBoost models has led to notable changes in performance metrics, reflecting an improved focus on addressing the class imbalance in the dataset. (Unfortunately, BigQuery ML doesn’t support class weights for Random Forest models).
Here’s a discussion of the improvements compared to the non-weighted versions:
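Class weighting is enabled at model creation time. One way to do this in BigQuery ML is `auto_class_weights`, which weights each class inversely to its frequency; whether that or explicit `class_weights` was used here is my assumption, and the model name below is hypothetical.

```sql
-- Weighted logistic regression: auto_class_weights balances classes
-- inversely to their frequency (model name is an assumption)
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section8.churn_logistic_weighted`
OPTIONS(
  model_type='logistic_reg',
  auto_class_weights=TRUE,
  input_label_cols=['churned']) AS
SELECT
  * EXCEPT(fullVisitorId)
FROM
  `predictive-behavior-analytics.Section8.user_churn_features`;
```

The same option works with `model_type='boosted_tree_classifier'` for the weighted XGBoost model.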
Logistic Regression
| Metric | Non-Weighted | Weighted |
| --- | --- | --- |
| Precision | 0.667 | 0.097 |
| Recall | 0.001 | 0.258 |
| Accuracy | 0.970 | 0.907 |
| F1 Score | 0.002 | 0.141 |
| Log Loss | 0.125 | 0.469 |
| AUC | 0.696 | 0.696 |
- Precision: Decreased from 0.667 to 0.097. The model is less precise but identifies more potential churners.
- Recall: Significant improvement from 0.001 to 0.258. The weighted model is much better at identifying churned customers.
- Accuracy: Slightly decreased but still high at 0.907.
- F1 Score: Improved from 0.002 to 0.141, indicating a better balance between precision and recall.
- Log Loss: Increased, suggesting less confident predictions.
- AUC: Remained the same, indicating similar overall discriminative ability.
XGBoost
| Metric | Non-Weighted | Weighted |
| --- | --- | --- |
| Precision | 0.714 | 0.074 |
| Recall | 0.003 | 0.389 |
| Accuracy | 0.970 | 0.838 |
| F1 Score | 0.006 | 0.125 |
| Log Loss | 0.124 | 0.491 |
| AUC | 0.686 | 0.685 |
- Precision: Decreased from 0.714 to 0.074. The model is less precise but identifies more potential churners.
- Recall: Dramatic improvement from 0.003 to 0.389. The weighted model is far better at identifying churned customers.
- Accuracy: Decreased but still reasonably high at 0.838.
- F1 Score: Improved from 0.006 to 0.125, showing a better balance between precision and recall.
- Log Loss: Increased, indicating less confident predictions.
- AUC: Slightly decreased but essentially the same, suggesting similar overall discriminative ability.
Conclusion
The weighted models show a clear improvement in addressing the class imbalance problem, particularly in their ability to identify churned customers (higher recall). This comes at the cost of lower precision and overall accuracy, which is a common and often acceptable trade-off in churn prediction scenarios.
The choice between the weighted and non-weighted models depends on the specific business context:
- If the cost of missing a potential churner (false negative) is high, the weighted models are preferable due to their higher recall.
- If the cost of falsely identifying a customer as a potential churner (false positive) is high, the non-weighted models might be more suitable.
In many churn prediction scenarios, identifying more potential churners (even with some false positives) is preferable, as it allows for targeted retention efforts. Therefore, the weighted models, especially the XGBoost model with its high recall, might be more valuable for practical application in churn prevention strategies.
You can find the complete code in my GitHub repository.