Optimizing Marketing Campaigns

Data-driven marketing has become essential for businesses to maximize their return on investment (ROI) and effectively reach their target audience.

By leveraging the power of BigQuery ML and the rich data available in Google Analytics, marketers can gain valuable insights and build predictive models to optimize their campaigns.

This post will guide you through the process of using the Google Analytics Sample Dataset in BigQuery to optimize marketing campaigns.

Contents

  1. Preparing the Data
  2. Understanding the Models
    • Logistic Regression
    • Random Forest
    • XGBoost
    • Deep Neural Networks (DNN)
  3. Model Performance Comparison
  4. Using the Random Forest Model for Campaign Optimization
  5. Conclusion

Preparing the Data

First, let’s prepare our data by creating a view that aggregates key metrics for each traffic source:

SQL
CREATE OR REPLACE VIEW `your-project.your-dataset.campaign_performance` AS
SELECT
  -- The export stores date as a 'YYYYMMDD' string; PARSE_DATE already returns a DATE.
  PARSE_DATE('%Y%m%d', date) AS date,
  trafficSource.source AS source,
  trafficSource.medium AS medium,
  trafficSource.campaign AS campaign,
  COUNT(DISTINCT fullVisitorId) AS users,
  SUM(totals.transactions) AS transactions,
  -- transactionRevenue is stored in micro-units (value multiplied by 10^6).
  SUM(totals.transactionRevenue) / 1000000 AS revenue,
  SUM(totals.pageviews) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  -- Restrict the wildcard tables to sessions from Q1 2017.
  _TABLE_SUFFIX BETWEEN '20170101' AND '20170331'
GROUP BY
  date, source, medium, campaign;

The view aggregates daily performance metrics (users, transactions, revenue, and pageviews) for each unique combination of source, medium, and campaign.
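To make the grouping logic concrete, here is a minimal Python sketch of the same aggregation on a few made-up session rows. The rows, the field names, and the aggregate helper are assumptions for illustration only; they are not part of the dataset or of BigQuery.

```python
from collections import defaultdict

# Hypothetical session rows shaped like the fields the view reads.
# transaction_revenue is in micro-units (value multiplied by 10^6).
sessions = [
    {"date": "20170105", "source": "google", "medium": "cpc", "campaign": "spring_sale",
     "visitor": "A", "transactions": 1, "transaction_revenue": 25_000_000, "pageviews": 12},
    {"date": "20170105", "source": "google", "medium": "cpc", "campaign": "spring_sale",
     "visitor": "A", "transactions": None, "transaction_revenue": None, "pageviews": 3},
    {"date": "20170105", "source": "google", "medium": "cpc", "campaign": "spring_sale",
     "visitor": "B", "transactions": None, "transaction_revenue": None, "pageviews": 5},
]


def aggregate(rows):
    """Group rows by (date, source, medium, campaign) and sum the metrics."""
    groups = defaultdict(lambda: {"visitors": set(), "transactions": 0,
                                  "revenue_micros": 0, "pageviews": 0})
    for row in rows:
        key = (row["date"], row["source"], row["medium"], row["campaign"])
        group = groups[key]
        group["visitors"].add(row["visitor"])           # COUNT(DISTINCT ...)
        group["transactions"] += row["transactions"] or 0
        group["revenue_micros"] += row["transaction_revenue"] or 0
        group["pageviews"] += row["pageviews"] or 0
    return {
        key: {"users": len(g["visitors"]),
              "transactions": g["transactions"],
              "revenue": g["revenue_micros"] / 1_000_000,  # micro-units to currency
              "pageviews": g["pageviews"]}
        for key, g in groups.items()
    }


summary = aggregate(sessions)
```

One difference worth knowing: this sketch treats missing values as 0, whereas SQL's SUM returns NULL when every value in a group is NULL.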

You can find the complete code in my GitHub repository.

Understanding the Models

In our analysis of marketing campaign performance, we employed four different types of machine learning models. Each has characteristics that make it suitable for predicting campaign success.

1. Logistic Regression

Logistic Regression is a statistical method used for predicting binary outcomes. In our context, it predicts whether a campaign will be successful (1) or not (0).

  • How it works: It estimates the probability that an instance belongs to a particular category.
  • Strengths: Simple, interpretable, and provides insight into the impact of each feature on the outcome.
  • Marketing use case: Identifying key factors that contribute to campaign success and estimating success probability.
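As a rough illustration of how logistic regression turns features into a probability, the sketch below applies the sigmoid function to a weighted sum. The weights and bias are made-up values, not coefficients from the model trained for this post.

```python
import math


def sigmoid(z):
    """Map any real number to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one row of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for two normalized features (users, pageviews).
weights = [0.8, 0.5]
bias = -1.0

probability = predict_proba([1.0, 1.0], weights, bias)  # z = 0.3
```

Because each weight multiplies exactly one feature, its sign and size show how that feature pushes the prediction, which is where the model's interpretability comes from.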

2. Random Forest

Random Forest is an ensemble learning method that operates by constructing multiple decision trees and outputting the mode of the classes (classification) or mean prediction (regression) of the individual trees.

  • How it works: It trains many decision trees on randomly selected data samples, gets a prediction from each tree, and takes the majority vote as the final answer.
  • Strengths: Handles non-linear relationships well, less prone to overfitting, and can handle a large number of features.
  • Marketing use case: Predicting campaign success while considering complex interactions between various marketing factors.
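The voting step can be sketched in a few lines of Python. This shows only how tree predictions are combined, not how the trees are trained; the three hand-written rules standing in for trees and the feature names are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each is a simple rule on one feature.
def tree_a(row):
    return 1 if row["pageviews_z"] > 0.5 else 0


def tree_b(row):
    return 1 if row["users_z"] > 1.5 else 0


def tree_c(row):
    return 1 if row["medium"] == "cpc" else 0


forest = [tree_a, tree_b, tree_c]
row = {"pageviews_z": 1.0, "users_z": 1.0, "medium": "cpc"}

votes = [tree(row) for tree in forest]
prediction = majority_vote(votes)
```

Using an odd number of trees avoids ties in a two-class problem. The share of trees voting for a class can also serve as a rough probability estimate.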

3. XGBoost (Extreme Gradient Boosting)

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.

  • How it works: It builds trees sequentially, where each new tree corrects the errors of the previous ones.
  • Strengths: Often achieves state-of-the-art results on machine learning challenges and handles missing data well.
  • Marketing use case: High-performance prediction of campaign success, especially when dealing with diverse and possibly messy marketing data.
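To show what "each new tree corrects the errors of the previous ones" means, here is a minimal sketch of plain gradient boosting with squared error, using one-split trees (stumps) on a single feature. It is not XGBoost itself, which adds regularization and second-order gradient information, and a classifier would use log loss rather than squared error. The data, function names, and learning rate are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Find the one-threshold split on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the outcome flips from 0 to 1 once x exceeds 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

one_round = boost(xs, ys, rounds=1)
many_rounds = boost(xs, ys, rounds=20)
```

Comparing mse(one_round, xs, ys) with mse(many_rounds, xs, ys) shows the training error shrinking as more stumps are added.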

4. Deep Neural Network (DNN)

Deep Neural Networks are complex machine learning models inspired by the human brain’s neural networks.

  • How it works: It consists of multiple layers of interconnected nodes, each layer learning to detect different features of the input data.
  • Strengths: Can learn very complex patterns and relationships in data, particularly effective with large amounts of data.
  • Marketing use case: Capturing intricate patterns in customer behavior and campaign performance that simpler models might miss.
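The layered structure can be illustrated with a forward pass through a tiny network: one hidden layer with ReLU activations, then a sigmoid output. The weights below are made up for illustration; a real network learns them from data and is far larger.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity applied between layers."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output.
W1 = [[1.0, -1.0], [0.5, 0.5]]
B1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
B2 = [-1.0]


def forward(features):
    """Return the predicted probability of the positive class."""
    hidden = relu(dense(features, W1, B1))
    logit = dense(hidden, W2, B2)[0]
    return sigmoid(logit)


p_balanced = forward([1.0, 1.0])
p_skewed = forward([2.0, 0.0])
```

The ReLU between the layers is what lets the network represent relationships that a single weighted sum, as in logistic regression, cannot.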

Each of these models approaches the problem of predicting campaign success in a different way:

  • Logistic Regression looks for linear relationships between features and the outcome.
  • Random Forest creates multiple decision pathways and aggregates them.
  • XGBoost builds a series of trees, each focusing on correcting the mistakes of the previous ones.
  • Deep Neural Networks stack layers of interconnected nodes, loosely inspired by the brain, to identify complex patterns.

Model Comparison and Analysis for Campaign Optimization

To optimize marketing campaigns, we implemented and compared the four models described above: Logistic Regression, Random Forest, XGBoost, and Deep Neural Network (DNN).

Model Performance Comparison

Here’s a summary of the performance metrics for each model:

Model                 Precision   Recall   F1 Score   Accuracy   AUC
Random Forest         0.917       0.733    0.815      0.968      0.943
XGBoost               0.863       0.733    0.793      0.963      0.951
Deep Neural Network   1.000       0.655    0.791      0.970      0.797
Logistic Regression   1.000       0.517    0.681      0.954      0.922
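As a sanity check, the F1 scores in the table can be re-derived from the precision and recall values, since F1 is their harmonic mean. The sketch below uses only the numbers reported above; small differences are expected because the table shows rounded inputs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the table above.
reported = {
    "Random Forest": (0.917, 0.733, 0.815),
    "XGBoost": (0.863, 0.733, 0.793),
    "Deep Neural Network": (1.000, 0.655, 0.791),
    "Logistic Regression": (1.000, 0.517, 0.681),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

# Largest difference between the reported and recomputed F1 scores.
max_gap = max(abs(recomputed[name] - f1) for name, (_, _, f1) in reported.items())
```

Here max_gap stays below 0.001, so the reported F1 scores agree with the precision and recall columns to within rounding.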

Analysis of Model Performance

  1. Random Forest: This model demonstrates the best overall performance with the highest F1 score (0.815). It offers a good balance between precision and recall, suggesting it’s effective at identifying successful campaigns without too many false positives or negatives.
  2. XGBoost: Close behind Random Forest, XGBoost shows strong performance across all metrics. It has the highest AUC (0.951), indicating excellent ability to distinguish between successful and unsuccessful campaigns.
  3. Deep Neural Network (DNN): The DNN achieves perfect precision (1.000) but at the cost of lower recall (0.655). This suggests it’s very conservative in its predictions, only flagging campaigns as successful when it’s very confident. It also has the lowest AUC (0.797) of the four models.
  4. Logistic Regression: Like the DNN, Logistic Regression shows perfect precision but has the lowest recall (0.517). It’s the most conservative model, potentially missing many successful campaigns in its predictions.

Insights and Implications

  1. Balanced Performance: The Random Forest and XGBoost models offer the most balanced performance. They’re likely to be the most reliable for general campaign optimization tasks.
  2. Conservative Predictions: The DNN and Logistic Regression models are extremely precise but may be too conservative for many marketing scenarios. They could be useful when the cost of a false positive (predicting a campaign will be successful when it’s not) is very high.
  3. Recall vs. Precision Trade-off: There’s a clear trade-off between recall and precision among these models. The choice between them may depend on whether it’s more important to catch all potentially successful campaigns (high recall) or to be very confident in the campaigns predicted to be successful (high precision).
  4. High Accuracy Across Models: All models show high accuracy (>0.95), which is promising. However, this should be interpreted cautiously if the dataset is imbalanced.
  5. AUC Performance: XGBoost (0.951) and Random Forest (0.943) show the highest AUC scores, indicating they’re best at distinguishing between successful and unsuccessful campaigns across various thresholds.
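The caution about accuracy on imbalanced data is easy to demonstrate. In the sketch below, a model that never predicts success still reaches 95% accuracy while finding none of the successful campaigns. The 5% success rate is a made-up figure, not the class balance of the dataset used in this post.

```python
# Hypothetical labels: 5 successful campaigns out of 100.
labels = [1] * 5 + [0] * 95

# A useless baseline that always predicts "not successful".
always_negative = [0] * len(labels)


def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(y_true)
    return true_positives / actual_positives if actual_positives else 0.0


baseline_accuracy = accuracy(labels, always_negative)
baseline_recall = recall(labels, always_negative)
```

This is why F1 and AUC are more informative than accuracy when successful campaigns are rare.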

Using the Random Forest Model for Campaign Optimization

Since the Random Forest model achieved the highest F1 score and the best balance between precision and recall, I use it here to show an example of marketing campaign optimization.

The following evaluation provides an interpretation of the model’s prediction, explaining what the results mean and offering insights and recommendations based on those results.

Input Scenario

  • Day of Week: 5 (Friday)
  • Month: 4 (April)
  • Source: Google
  • Medium: CPC (Cost Per Click)
  • Campaign: spring_sale
  • Normalized Users: 1.0 (1 standard deviation above mean)
  • Normalized Pageviews: 1.0 (1 standard deviation above mean)
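For readers unfamiliar with normalized inputs, a value of 1.0 means the raw number sits one standard deviation above the mean. The sketch below computes such z-scores for made-up daily user counts. Using the population standard deviation is an assumption on my part; the exact scaling used when training the model may differ.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Hypothetical daily user counts: mean 100, standard deviation 10.
daily_users = [90, 110, 90, 110]

normalized = z_scores(daily_users)
```

With these numbers, a day with 110 users maps to a normalized value of 1.0, which corresponds to the "Normalized Users: 1.0" input above.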

Prediction Results

  1. Predicted Outcome (predicted_made_purchase): 1. This indicates that the model predicts a purchase will be made under these conditions.
  2. Prediction Probability:
    • Probability of purchase (1): 0.5931 (59.31%)
    • Probability of no purchase (0): 0.4069 (40.69%)

Interpretation

  1. Positive Prediction: The model predicts that a purchase is more likely than not for this campaign scenario.
  2. Confidence Level: The model’s confidence in this prediction is moderate. With a 59.31% probability for a purchase, it’s leaning towards a positive outcome, but there’s still a substantial 40.69% chance of no purchase.
  3. Decision Threshold: The default decision threshold appears to be 0.5. Since the probability of purchase (0.5931) is above 0.5, the model predicts a positive outcome (1).
  4. Input Feature Impact:
    • The campaign being run on a Friday in April, via Google CPC, seems to have a positive influence.
    • Both normalized users and pageviews being one standard deviation above the mean suggests higher than average traffic, which appears to contribute to the positive prediction.

Insights and Recommendations

  1. Campaign Timing: Running the “spring_sale” campaign on a Friday in April appears to be a good choice.
  2. Traffic Source: Google CPC seems to be an effective channel for this campaign.
  3. Traffic Volume: Higher than average user numbers and pageviews (1 std dev above mean) contribute to the likelihood of purchase.
  4. Moderate Confidence: While the model predicts a purchase, the confidence is not overwhelmingly high. This suggests room for optimization.
  5. Risk Assessment: With a 40.69% chance of no purchase, there’s still a significant risk to consider.
  6. Further Testing: It might be worth testing variations of this campaign setup to see if you can increase the probability of purchase. For example, you could try different days of the week or adjust the campaign parameters.
  7. Threshold Adjustment: Depending on your business goals, you might consider adjusting the decision threshold. If you want to be more certain of purchases, you might set a higher threshold (e.g., 0.7). If you want to catch more potential purchasers at the risk of more false positives, you might lower it.
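The effect of moving the threshold can be checked directly against the probability from this example. The classify helper below is a hypothetical illustration, and treating a probability exactly at the threshold as positive is my assumption.

```python
def classify(probability, threshold=0.5):
    """Return 1 (purchase) if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0


# Probability of purchase from the prediction results above.
purchase_probability = 0.5931

# Decision at a lenient, default, and strict threshold.
decisions = {t: classify(purchase_probability, t) for t in (0.3, 0.5, 0.7)}
```

At 0.3 and 0.5 the scenario is flagged as a purchase; at 0.7 it is not, so the stricter threshold would drop this campaign setup from the list of predicted successes.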

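Remember, while this prediction is favorable, it’s based on a single scenario. In practice, you’d want to run predictions on various scenarios to optimize your campaign strategy fully. Note also that the view created earlier covers January through March 2017, so if the model was trained on that window alone, an April scenario lies outside the training data and its prediction deserves extra caution.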

Conclusion

The process of optimizing marketing campaigns through machine learning models offers significant potential for improving ROI and targeting effectiveness.

By leveraging BigQuery ML and Google Analytics data, I’ve demonstrated how to prepare campaign data, build and compare different predictive models, and apply these insights to real-world marketing scenarios.

The analysis revealed that while all models performed well, the Random Forest and XGBoost models provided the most balanced and reliable predictions for campaign success.

The practical application of these models, as illustrated with the Random Forest prediction example, shows how data-driven insights can inform specific campaign decisions.

From timing and channel selection to traffic volume considerations, these predictions offer actionable guidance for marketers.

However, it’s important to remember that these models are tools to augment, not replace, marketing expertise. Continuous testing, refinement, and integration of model insights with human knowledge will be key to maximizing the value of this approach in marketing campaign optimization.
