Recommendation and Personalization

Personalization and recommendation systems play a crucial role in enhancing user experience and driving business growth.

By leveraging the power of BigQuery ML and the rich data available in Google Analytics, we can create personalization and recommendation models.

This post will guide you through the process of building such a system using the Google Analytics Sample Dataset in BigQuery.

Contents

  1. Preparing the Data
  2. Building a Matrix Factorization Model
  3. Generating Recommendations
  4. Evaluating the Model
  5. Implementing Personalization
  6. Conclusion

Preparing the Data

To build an effective recommendation system, we need to prepare our data. Here’s a query to extract relevant features:

SQL
-- Step 1: Preparing the Data
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section9.user_item_interactions` AS
SELECT
  fullVisitorId,
  CONCAT(product.productSKU, '_', product.v2ProductName) AS item_id,
  SUM(product.productQuantity) AS interaction_count,
  SUM(product.productPrice * product.productQuantity) / 1000000 AS total_revenue
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST(hits) AS hits,
  UNNEST(hits.product) AS product
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  AND hits.eCommerceAction.action_type = '6'  -- Completed purchase
GROUP BY
  fullVisitorId, item_id
HAVING
  interaction_count > 0;

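This query creates a table of user-item interactions from completed purchases in July 2017. Each row holds one visitor-item pair with the total quantity purchased and the revenue in currency units (Analytics stores prices multiplied by 1,000,000, hence the division).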
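To make the aggregation concrete, here is a minimal Python sketch of the same grouping logic applied to a few made-up purchase rows. The sample rows and the `aggregate_interactions` helper are illustrative assumptions and are not part of the dataset or the original code. The only detail taken from the query is that prices are stored multiplied by 1,000,000.

```python
from collections import defaultdict

# Hypothetical purchase hits: (fullVisitorId, productSKU, product name, quantity, price in micros).
SAMPLE_HITS = [
    ("u1", "SKU1", "Mug", 2, 10_000_000),
    ("u1", "SKU1", "Mug", 1, 10_000_000),
    ("u1", "SKU2", "Pen", 1, 2_500_000),
    ("u2", "SKU1", "Mug", 3, 10_000_000),
]


def aggregate_interactions(hits):
    """Group hits by (user, item) and sum quantity and revenue, like the SQL GROUP BY."""
    totals = defaultdict(lambda: {"interaction_count": 0, "total_revenue": 0.0})
    for user_id, sku, name, quantity, price_micros in hits:
        item_id = f"{sku}_{name}"  # mirrors CONCAT(productSKU, '_', v2ProductName)
        row = totals[(user_id, item_id)]
        row["interaction_count"] += quantity
        row["total_revenue"] += price_micros * quantity / 1_000_000
    # Mirror HAVING interaction_count > 0.
    return {key: row for key, row in totals.items() if row["interaction_count"] > 0}


interactions = aggregate_interactions(SAMPLE_HITS)
for (user_id, item_id), row in sorted(interactions.items()):
    print(user_id, item_id, row["interaction_count"], row["total_revenue"])
```

The resulting dictionary has one entry per visitor-item pair, which is the shape the matrix factorization model expects as input.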

Building a Matrix Factorization Model

Now that we have our data prepared, let’s use BigQuery ML to create a matrix factorization model for item recommendations:

SQL
-- Step 2: Building the Matrix Factorization Model
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section9.item_recommendation_model`
OPTIONS(
  model_type='MATRIX_FACTORIZATION',
  user_col='fullVisitorId',
  item_col='item_id',
  rating_col='interaction_count',
  feedback_type='implicit'
) AS
SELECT
  fullVisitorId,
  item_id,
  interaction_count
FROM
  `predictive-behavior-analytics.Section9.user_item_interactions`;

This model will learn latent factors for users and items based on their interactions.
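As a rough illustration of what "latent factors" means, the sketch below scores a user-item pair as the dot product of two small vectors. The factor values, the identifiers, and the two-dimensional size are invented for the example. BigQuery ML learns its own factors during training, and this is not a reproduction of its internal implementation.

```python
# Hypothetical learned factors (2 latent dimensions each), invented for illustration.
USER_FACTORS = {
    "u1": [0.9, 0.1],
    "u2": [0.2, 0.8],
}
ITEM_FACTORS = {
    "SKU1_Mug": [1.0, 0.0],
    "SKU2_Pen": [0.0, 1.0],
}


def score(user_id, item_id):
    """Predicted affinity: dot product of the user's and the item's factor vectors."""
    user_vec = USER_FACTORS[user_id]
    item_vec = ITEM_FACTORS[item_id]
    return sum(u * i for u, i in zip(user_vec, item_vec))


for user_id in USER_FACTORS:
    best_item = max(ITEM_FACTORS, key=lambda item_id: score(user_id, item_id))
    print(user_id, "->", best_item)
```

A user whose vector points in the same direction as an item's vector gets a high score for that item, even if the user has never interacted with it.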

Generating Recommendations

With our model trained, we can now generate personalized recommendations for users:

SQL
-- Step 3: Generating Recommendations
WITH user_item_pairs AS (
  -- Pair every distinct user with every distinct item
  SELECT
    users.fullVisitorId,
    items.item_id
  FROM
    (SELECT DISTINCT fullVisitorId FROM `predictive-behavior-analytics.Section9.user_item_interactions`) AS users
  CROSS JOIN
    (SELECT DISTINCT item_id FROM `predictive-behavior-analytics.Section9.user_item_interactions`) AS items
),
predictions AS (
  SELECT *
  FROM ML.PREDICT(MODEL `predictive-behavior-analytics.Section9.item_recommendation_model`,
    (SELECT * FROM user_item_pairs))
)
SELECT
  fullVisitorId AS user_id,
  ARRAY_AGG(STRUCT(item_id, predicted_interaction_count_confidence)
            ORDER BY predicted_interaction_count_confidence DESC
            LIMIT 5) AS top_5_recommendations
FROM predictions
GROUP BY fullVisitorId
LIMIT 10;

This query returns the top 5 recommended items for each user, with the output limited to 10 users.
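The final aggregation step can be mirrored in plain Python: group the scored pairs by user and keep the highest-scoring items. The scores and the `top_k_per_user` helper below are hypothetical and only illustrate the ARRAY_AGG ... ORDER BY ... LIMIT pattern used in the query.

```python
import heapq
from collections import defaultdict

# Hypothetical (user, item, predicted confidence) rows standing in for the model output.
PREDICTIONS = [
    ("u1", "A", 0.91), ("u1", "B", 0.40), ("u1", "C", 0.75),
    ("u2", "A", 0.10), ("u2", "B", 0.88), ("u2", "C", 0.52),
]


def top_k_per_user(predictions, k):
    """Return {user: [(item, score), ...]} with each list sorted by score, best first."""
    by_user = defaultdict(list)
    for user_id, item_id, confidence in predictions:
        by_user[user_id].append((item_id, confidence))
    return {
        user_id: heapq.nlargest(k, scored, key=lambda pair: pair[1])
        for user_id, scored in by_user.items()
    }


recommendations = top_k_per_user(PREDICTIONS, k=2)
print(recommendations)
```

Note that, like the query, this sketch does not exclude items a user has already purchased. Whether to filter those out is a product decision.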

You can find the complete code in my GitHub repository.

Evaluating the Model

To assess the performance of the Matrix Factorization recommendation model, we utilized several key metrics. These metrics provide insights into different aspects of the model’s effectiveness in predicting user-item interactions and generating relevant recommendations.

Evaluation Metrics

  1. Mean Average Precision (MAP): 0.2503
  2. Mean Squared Error (MSE): 0.0417
  3. Normalized Discounted Cumulative Gain (NDCG): 0.9750
  4. Average Rank: 0.3338

Mean Average Precision (MAP)

  • Our model achieved a MAP of 0.2503. MAP averages, across users, the precision measured at each rank where a relevant item appears, so it reflects both how many relevant items are recommended and how early they show up. It is not simply the share of recommendations that are relevant.
  • While there’s room for improvement, this score is reasonable for a recommendation system based on implicit feedback.

Mean Squared Error (MSE)

  • The low MSE of 0.0417 suggests that our model’s predictions closely align with actual user interactions.
  • This indicates high accuracy in predicting user-item interaction strengths.

Normalized Discounted Cumulative Gain (NDCG)

  • With an impressive NDCG of 0.9750, our model excels at ranking relevant items higher in the recommendation lists.
  • This high score implies that users are likely to find the most relevant items at the top of their recommendations.

Average Rank

  • The average rank of 0.3338 is best read as a normalized (percentile) rank, where 0 means relevant items sit at the very top of the list and 0.5 is roughly what a random ordering would give. On average, relevant items land about a third of the way down the ranked list.
  • This suggests that users won’t need to scroll far to find items of interest, enhancing the user experience.
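To show how the ranking metrics are read, here is a small Python sketch that computes them for a single user. It uses common textbook definitions with binary relevance. The function names and the example ranking are assumptions for illustration, and the exact formulas BigQuery ML applies may differ in detail, so this is a guide to interpreting the numbers rather than a reproduction of them.

```python
import math


def average_precision(ranked_items, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precisions = []
    for k, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def ndcg(ranked_items, relevant):
    """Binary-relevance NDCG: DCG of this ranking divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(k + 1)
        for k, item in enumerate(ranked_items, start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), len(ranked_items))
    ideal_dcg = sum(1.0 / math.log2(k + 1) for k in range(1, ideal_hits + 1))
    return dcg / ideal_dcg if ideal_dcg else 0.0


def mean_percentile_rank(ranked_items, relevant):
    """Average position of relevant items, scaled so 0.0 is the top and 1.0 the bottom."""
    last_index = len(ranked_items) - 1
    if last_index <= 0:
        return 0.0
    ranks = [
        index / last_index
        for index, item in enumerate(ranked_items)
        if item in relevant
    ]
    return sum(ranks) / len(ranks) if ranks else 0.0


# Hypothetical ranking for one user; items "A" and "C" are the ones actually purchased.
ranking = ["A", "B", "C", "D", "E"]
purchased = {"A", "C"}
print(average_precision(ranking, purchased))
print(ndcg(ranking, purchased))
print(mean_percentile_rank(ranking, purchased))
```

MAP is the mean of `average_precision` over all users. Higher is better for MAP and NDCG, while lower is better for the percentile rank.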

Key Insights

  1. Ranking Effectiveness: The high NDCG and low Average Rank demonstrate that our model is particularly adept at prioritizing relevant items. This is crucial for user satisfaction in recommendation systems.
  2. Prediction Accuracy: The low MSE indicates that our model accurately predicts the strength of user-item interactions, which is valuable for understanding user preferences.
  3. Relevance vs. Ranking: While the MAP suggests moderate performance in terms of overall recommendation relevance, the high NDCG indicates that the most relevant items are consistently ranked at the top.
  4. User Experience: The combination of strong ranking performance and low average rank suggests that users are likely to find relevant items quickly, potentially leading to higher engagement.

Evaluation Summary

Our Matrix Factorization model demonstrates robust performance, particularly in ranking relevant items and accurately predicting user-item interactions. While there’s room to improve overall precision, the model’s strength in prioritizing relevant recommendations suggests it will effectively enhance user experience and engagement in our recommendation system.

Implementing Personalization

Beyond item recommendations, we can use our model for broader personalization efforts:

  1. Personalized Email Campaigns: Use top recommendations for each user to create targeted email content.
  2. Website Personalization: Dynamically adjust product displays based on user preferences.
  3. Tailored Promotions: Offer discounts on items likely to interest specific users.

Conclusion

By leveraging BigQuery ML and Google Analytics data, we’ve created a powerful personalization and recommendation system. This approach allows for scalable, data-driven decision-making that can significantly enhance user experience and drive business growth.

Remember, the key to successful personalization is continuous iteration and testing. Regularly update your model with new data and experiment with different features to improve its performance over time.
