Customer Segmentation｜Coding Crossroads

Understanding your customer base at a deeper level is essential for developing targeted marketing strategies, improving customer experiences, and driving business success.

Customer segmentation is a powerful method for achieving this understanding. By grouping customers based on shared characteristics and behaviors, businesses can create personalized experiences that better meet the needs of each segment, leading to more effective engagement and increased customer loyalty.

In this post, we’ll explore how to use BigQuery ML to create features for effective customer segmentation, apply K-Means and PCA + K-Means clustering to identify distinct customer segments, and evaluate the quality of these segments.

Data Preparation
Applying Clustering Algorithms
- K-Means Clustering
- PCA + K-Means Clustering
Understanding Cluster Characteristics
Evaluating Customer Segments
Conclusion

Data Preparation

Feature creation is a critical step in any customer segmentation analysis, as the quality and relevance of the features directly influence the effectiveness of the segmentation. The goal is to derive meaningful characteristics from raw data that can help distinguish between different customer segments. In the provided code, the features are derived from the Google Analytics sample dataset, focusing on key metrics that capture various aspects of customer behavior, engagement, and demographics.

SQL

-- Create Customer Features Table
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section6.customer_features` AS
SELECT
  CONCAT(fullVisitorId, CAST(visitId AS STRING)) AS customer_id,
  SUM(IFNULL(totals.transactionRevenue, 0)) / 1000000 AS total_revenue, 
  COUNT(totals.transactionRevenue) AS transaction_count,
  AVG(IFNULL(totals.transactionRevenue, 0)) / 1000000 AS avg_transaction_value, 
  MAX(totals.timeOnSite) AS max_session_duration,
  MIN(totals.timeOnSite) AS min_session_duration,
  AVG(totals.timeOnSite) AS avg_session_duration,
  device.deviceCategory AS device_type,
  geoNetwork.country AS country
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY
  customer_id, device_type, country;

Breakdown of Features Created

Customer Identifier:
- customer_id: This feature is built by concatenating fullVisitorId and visitId. Because visitId changes with every visit, the key identifies a single session rather than a returning visitor, so each row of the table effectively describes one visit. To aggregate behaviors and transactions across all of a visitor's sessions, group by fullVisitorId alone.
Revenue-Related Features:
- total_revenue: The total revenue generated by each customer, calculated by summing transactionRevenue, with NULL treated as 0. Google Analytics stores this field multiplied by 10^6, which is why the query divides by 1,000,000. This is a critical feature as it directly reflects the monetary value a customer brings to the business, making it essential for identifying high-value customers.
- transaction_count: The number of rows with a non-NULL transactionRevenue, that is, the number of sessions that included a purchase. It serves as a proxy for purchasing frequency and can help distinguish between frequent buyers and those who make fewer, potentially higher-value purchases.
- avg_transaction_value: The mean of transactionRevenue across all sessions, with sessions that had no purchase counted as 0. Strictly speaking this is average revenue per session rather than total revenue divided by transaction count, so non-purchase sessions pull it down. It still helps separate customers who make high-value purchases from those who purchase with lower values.
Engagement-Related Features:
- max_session_duration: The longest time a customer has spent in a session. This can indicate the maximum level of engagement a customer has shown, which might correlate with interest in the content or products offered.
- min_session_duration: The shortest session duration for a customer. This feature could help identify customers with minimal engagement or those who might visit the site frequently for very specific purposes.
- avg_session_duration: The average session duration, giving a general sense of how much time a customer typically spends per visit. This feature can help identify highly engaged customers who consistently spend more time on the site.
Demographic Features:
- device_type: The type of device (e.g., desktop, mobile, tablet) used by the customer. This feature is crucial for understanding customer preferences in terms of technology and can help tailor marketing strategies to different device users.
- country: The country from which the customer is accessing the site. Geographic location is an important demographic feature that can influence customer behavior and preferences, making it a valuable input for segmentation.
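
If segments are meant to describe returning visitors rather than individual sessions, the features can be aggregated per fullVisitorId instead. The query below is only a sketch of that variant: the table name customer_features_by_visitor is hypothetical, it uses the totals.transactions field from the Google Analytics export schema to count transactions, and the clusters it produces would differ from the results reported later in this post.

```sql
-- Sketch: one row per visitor instead of one row per session.
-- customer_features_by_visitor is a hypothetical table name.
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section6.customer_features_by_visitor` AS
SELECT
  fullVisitorId AS customer_id,
  COUNT(*) AS session_count,
  -- transactionRevenue is stored multiplied by 10^6
  SUM(IFNULL(totals.transactionRevenue, 0)) / 1000000 AS total_revenue,
  SUM(IFNULL(totals.transactions, 0)) AS transaction_count,
  -- Revenue per transaction; NULL for visitors who never purchased
  SAFE_DIVIDE(
    SUM(IFNULL(totals.transactionRevenue, 0)) / 1000000,
    SUM(IFNULL(totals.transactions, 0))
  ) AS avg_transaction_value,
  MAX(totals.timeOnSite) AS max_session_duration,
  MIN(totals.timeOnSite) AS min_session_duration,
  AVG(totals.timeOnSite) AS avg_session_duration
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY
  customer_id;
```

With this grouping the minimum, maximum, and average session durations can actually differ for a customer, which they cannot when every row is a single visit.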

Importance of These Features

Each of these features plays a significant role in the customer segmentation process:

Revenue-Related Features: These are vital for understanding the financial contribution of each customer and identifying high-value segments. They help distinguish customers who generate high revenue through frequent purchases from those who do so through a few high-value purchases.
Engagement-Related Features: These features provide insights into how customers interact with the site. High engagement often correlates with higher loyalty and the potential for future purchases, making it a key factor in segmentation.
Demographic Features: Understanding the demographic makeup of your customer base allows for more targeted marketing strategies. Device type and geographic location can significantly influence how customers interact with the site and what products or services they prefer.
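
Before clustering, it can help to check how these features are actually distributed, since revenue in e-commerce data is usually concentrated in a small share of rows. The following is a minimal sketch against the customer_features table created above; the output column names are illustrative.

```sql
-- Sketch: quick distribution check on the feature table.
SELECT
  COUNT(*) AS row_count,
  COUNTIF(total_revenue > 0) AS rows_with_revenue,
  -- Each array holds min, quartiles, and max (5 values)
  APPROX_QUANTILES(total_revenue, 4) AS total_revenue_quartiles,
  APPROX_QUANTILES(avg_session_duration, 4) AS session_duration_quartiles,
  COUNT(DISTINCT device_type) AS device_types,
  COUNT(DISTINCT country) AS countries
FROM
  `predictive-behavior-analytics.Section6.customer_features`;
```

If rows_with_revenue is a tiny fraction of row_count, most clusters will be driven by session duration alone, which is worth keeping in mind when reading the centroids.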

Applying Clustering Algorithms

In this section, we explore two clustering approaches to segment customers based on their behaviors and characteristics: K-Means Clustering on original features and PCA + K-Means Clustering. These methods help identify distinct customer segments that can be targeted with tailored marketing strategies.

1. K-Means Clustering

K-Means Clustering is a popular algorithm for partitioning data into distinct clusters based on the similarity of data points.

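In this analysis, we apply K-Means directly to four numeric customer features (total_revenue, transaction_count, avg_transaction_value, and avg_session_duration) to identify natural groupings among customers. The device_type and country columns are not passed to the model.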

SQL

-- K-Means Clustering on Original Features
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section6.kmeans_customer_segmentation_revised`
OPTIONS(model_type='kmeans', num_clusters=5) AS
SELECT
  total_revenue,
  transaction_count,
  avg_transaction_value,
  avg_session_duration
FROM
  `predictive-behavior-analytics.Section6.customer_features`;

-- Retrieving Centroid Values
SELECT
  centroid_id,
  MAX(CASE WHEN feature = 'total_revenue' THEN numerical_value END) AS avg_total_revenue,
  MAX(CASE WHEN feature = 'transaction_count' THEN numerical_value END) AS avg_transaction_count,
  MAX(CASE WHEN feature = 'avg_transaction_value' THEN numerical_value END) AS avg_transaction_value,
  MAX(CASE WHEN feature = 'avg_session_duration' THEN numerical_value END) AS avg_session_duration
FROM
  ML.CENTROIDS(MODEL `predictive-behavior-analytics.Section6.kmeans_customer_segmentation_revised`)
GROUP BY
  centroid_id
ORDER BY
  avg_total_revenue DESC;

-- Assign Clusters to Customers for K-Means on Original Features
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section6.customer_clusters` AS
SELECT
  customer_id,
  CENTROID_ID AS cluster_id
FROM
  ML.PREDICT(MODEL `predictive-behavior-analytics.Section6.kmeans_customer_segmentation_revised`,
    (
      SELECT
        customer_id,
        total_revenue,
        transaction_count,
        avg_transaction_value,
        avg_session_duration
      FROM
        `predictive-behavior-analytics.Section6.customer_features`
    )
  );
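
K-Means results depend on how the centroids are initialized and on whether the features are put on a common scale. As a rough sketch, the same model can be trained with those choices spelled out in the OPTIONS clause. The model name kmeans_customer_segmentation_explicit is hypothetical, and the option values are illustrative rather than the settings behind the results in this post.

```sql
-- Sketch: K-Means with initialization and scaling made explicit.
-- kmeans_customer_segmentation_explicit is a hypothetical model name.
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section6.kmeans_customer_segmentation_explicit`
OPTIONS(
  model_type = 'kmeans',
  num_clusters = 5,
  kmeans_init_method = 'KMEANS++',  -- spread the initial centroids apart
  standardize_features = TRUE,      -- put all features on a common scale
  distance_type = 'EUCLIDEAN',
  max_iterations = 20
) AS
SELECT
  total_revenue,
  transaction_count,
  avg_transaction_value,
  avg_session_duration
FROM
  `predictive-behavior-analytics.Section6.customer_features`;
```

Standardization matters here because total_revenue is measured in dollars and avg_session_duration in seconds; without a common scale, whichever feature has the larger numeric range would dominate the distance.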

2. PCA + K-Means Clustering

PCA + K-Means Clustering is an advanced approach that first applies Principal Component Analysis (PCA) to reduce the dimensionality of the feature space before clustering. This method can enhance the clustering process by focusing on the most critical aspects of customer behavior, which are captured in the principal components.

SQL

-- Apply PCA to Customer Features
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section6.pca_customer_features`
OPTIONS(model_type='pca', num_principal_components=3) AS
SELECT
  total_revenue,
  transaction_count,
  avg_transaction_value,
  avg_session_duration
FROM
  `predictive-behavior-analytics.Section6.customer_features`;

-- Project Customers onto the Principal Components
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section6.pca_transformed_features` AS
SELECT
  customer_id,
  principal_component_1,
  principal_component_2,
  principal_component_3
FROM
  ML.PREDICT(MODEL `predictive-behavior-analytics.Section6.pca_customer_features`,
    (
      SELECT
        customer_id,
        total_revenue,
        transaction_count,
        avg_transaction_value,
        avg_session_duration
      FROM
        `predictive-behavior-analytics.Section6.customer_features`
    )
  );

-- K-Means Clustering on PCA-Transformed Features
CREATE OR REPLACE MODEL `predictive-behavior-analytics.Section6.kmeans_pca_customer_segmentation`
OPTIONS(model_type='kmeans', num_clusters=5) AS
SELECT
  principal_component_1,
  principal_component_2,
  principal_component_3
FROM
  `predictive-behavior-analytics.Section6.pca_transformed_features`;

-- Retrieving Centroid Values for PCA + K-Means
SELECT
  centroid_id,
  MAX(CASE WHEN feature = 'principal_component_1' THEN numerical_value END) AS avg_principal_component_1,
  MAX(CASE WHEN feature = 'principal_component_2' THEN numerical_value END) AS avg_principal_component_2,
  MAX(CASE WHEN feature = 'principal_component_3' THEN numerical_value END) AS avg_principal_component_3
FROM
  ML.CENTROIDS(MODEL `predictive-behavior-analytics.Section6.kmeans_pca_customer_segmentation`)
GROUP BY
  centroid_id
ORDER BY
  avg_principal_component_1 DESC;

-- Assign Clusters to Customers for PCA + K-Means
CREATE OR REPLACE TABLE `predictive-behavior-analytics.Section6.customer_clusters_pca` AS
SELECT
  customer_id,
  CENTROID_ID AS cluster_id
FROM
  ML.PREDICT(MODEL `predictive-behavior-analytics.Section6.kmeans_pca_customer_segmentation`,
    (
      SELECT
        customer_id,
        principal_component_1,
        principal_component_2,
        principal_component_3
      FROM
        `predictive-behavior-analytics.Section6.pca_transformed_features`
    )
  );
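
Whether three components are enough depends on how much of the variance they retain. A minimal way to check this, assuming the pca_customer_features model above has been trained, is to query ML.PRINCIPAL_COMPONENT_INFO:

```sql
-- Sketch: variance retained by each principal component.
SELECT
  principal_component_id,
  eigenvalue,
  explained_variance_ratio,
  cumulative_explained_variance_ratio
FROM
  ML.PRINCIPAL_COMPONENT_INFO(MODEL `predictive-behavior-analytics.Section6.pca_customer_features`)
ORDER BY
  principal_component_id;
```

The cumulative_explained_variance_ratio of the last row shows how much of the original variance survives the reduction from four features to three components.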

You can find the complete code in my GitHub repository.

Assigning Clusters to Customers

Once the clustering models have been trained, the next crucial step is to assign each customer to a specific cluster. This process allows us to categorize customers based on their behaviors and characteristics, enabling more targeted marketing strategies and personalized customer experiences.

Importance of Cluster Assignment

Assigning clusters to customers is a vital step in customer segmentation. It not only categorizes customers into meaningful groups but also serves as the foundation for subsequent analyses and actions. By understanding the characteristics of each cluster, businesses can:

Tailor Marketing Strategies: Design targeted campaigns that resonate with the specific needs and preferences of each customer segment.
Enhance Customer Engagement: Offer personalized experiences that improve customer satisfaction and loyalty.
Optimize Resource Allocation: Focus efforts and resources on high-value customer segments to maximize return on investment (ROI).
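
To see what each segment looks like in practice, the cluster assignments can be joined back to the feature table. This is a sketch that assumes customer_id is unique in both tables, which should hold when each row corresponds to one session.

```sql
-- Sketch: size and average behavior of each K-Means cluster.
SELECT
  c.cluster_id,
  COUNT(*) AS customers,
  ROUND(AVG(f.total_revenue), 2) AS avg_total_revenue,
  ROUND(AVG(f.transaction_count), 2) AS avg_transaction_count,
  ROUND(AVG(f.avg_session_duration), 1) AS avg_session_duration
FROM
  `predictive-behavior-analytics.Section6.customer_clusters` AS c
JOIN
  `predictive-behavior-analytics.Section6.customer_features` AS f
USING (customer_id)
GROUP BY
  c.cluster_id
ORDER BY
  avg_total_revenue DESC;
```

The customers column is worth checking first: a high-revenue cluster with only a handful of members calls for a different strategy than one covering a large share of the base.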

Understanding Cluster Characteristics

K-Means Clustering on Original Features

The K-Means clustering applied directly to the original features produced the following centroids:

Results

Cluster	Type	Avg Total Revenue	Avg Session Duration
Cluster 1	High-value customers	$10,109	1,909 seconds
Cluster 2	Mid-range customers	$993	1,517 seconds
Cluster 3	Low-end customers	$79	1,037 seconds

Cluster 1 (High-value customers): Avg. Total Revenue $10,109

This cluster represents high-value customers who have made large transactions. The long session duration suggests they spend a significant amount of time on the site per visit.

Cluster 2 (Mid-range customers): Avg. Total Revenue $993

This cluster represents mid-range customers who make moderate transactions. The transaction value is significantly lower than Cluster 1, but the session duration is still relatively high.

Cluster 3 (Low-end customers): Avg. Total Revenue $79

These customers represent the lower end of the spectrum, making small transactions. The shorter session duration also reflects less engagement compared to higher-value clusters.

PCA + K-Means Clustering

When PCA was applied before K-Means clustering, the centroids were represented in terms of principal components:

Results

Centroid 2:
Avg Principal Component 1: 256.71
Avg Principal Component 2: 86.92
Avg Principal Component 3: -6.01

This centroid lies far from the origin along the first two principal components, which are the directions of greatest variance in the data. Its members are therefore highly atypical, and the cluster likely represents the most distinct and impactful customer segment.

Centroid 1:
Avg Principal Component 1: 27.14
Avg Principal Component 2: 0.46
Avg Principal Component 3: -0.07

This centroid sits much closer to the origin but is still clearly separated along the first principal component, representing customers who differ moderately from the average.

Centroids 3, 4, 5:
Values close to zero

These centroids lie near the origin of the principal-component space, which suggests that the corresponding clusters are less distinct, or contain customers whose behaviors do not vary significantly from the overall population.
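
Centroid values such as 256.71 only become meaningful once we know which original features drive each component. A minimal sketch for that, assuming the pca_customer_features model above, lists the weight of every feature in every component:

```sql
-- Sketch: how strongly each original feature contributes to each component.
SELECT
  principal_component_id,
  feature,
  numerical_value AS weight
FROM
  ML.PRINCIPAL_COMPONENTS(MODEL `predictive-behavior-analytics.Section6.pca_customer_features`)
ORDER BY
  principal_component_id,
  feature;
```

If, for example, the revenue features carry the largest weights in the first component, a centroid far out along that component can be read as a high-spending segment.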

Comparison of K-Means vs. PCA + K-Means

Cluster Interpretability:

K-Means on Original Features: The centroids directly represent customer behaviors, making it easier to interpret the characteristics of each cluster. For example, high total revenue and transaction value directly indicate high-value customers.
PCA + K-Means: The clusters are defined in terms of principal components, which can be more challenging to interpret. However, PCA reduces dimensionality and might capture underlying patterns more effectively.

Cluster Distinctiveness:

K-Means on Original Features: The clusters are highly distinct, with significant differences in revenue and transaction values across centroids.
PCA + K-Means: The clusters are distinct in terms of principal components, but the interpretation of what these components represent is less straightforward.

Variance Capture:

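K-Means on Original Features: The clustering uses every original feature as-is, so correlated features such as total_revenue and avg_transaction_value are effectively counted more than once, and noise in any single feature feeds directly into the distance calculation. This can lead to less generalizable clusters.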
PCA + K-Means: PCA helps capture the most significant variance in the data before clustering, potentially leading to more generalizable and robust clusters.
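
One practical way to compare the two approaches is to check how far their assignments agree. The sketch below cross-tabulates the two assignment tables created earlier; note that cluster numbers are arbitrary in each model, so agreement shows up as each K-Means cluster mapping mostly to a single PCA + K-Means cluster, not as matching IDs.

```sql
-- Sketch: overlap between the two sets of cluster assignments.
SELECT
  k.cluster_id AS kmeans_cluster,
  p.cluster_id AS pca_kmeans_cluster,
  COUNT(*) AS customers
FROM
  `predictive-behavior-analytics.Section6.customer_clusters` AS k
JOIN
  `predictive-behavior-analytics.Section6.customer_clusters_pca` AS p
USING (customer_id)
GROUP BY
  kmeans_cluster,
  pca_kmeans_cluster
ORDER BY
  kmeans_cluster,
  pca_kmeans_cluster;
```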

Conclusion

Both K-Means and PCA + K-Means have their strengths.

K-Means on Original Features is more interpretable, with centroids that directly reflect customer behaviors.

In contrast, PCA + K-Means may capture more nuanced patterns in the data, but at the cost of interpretability.

Depending on the business need, whether that is understanding customer behavior directly or obtaining compact, well-separated clusters, one method may be preferred over the other.

Evaluating Customer Segments

Silhouette Score Overview

The silhouette score is a metric used to evaluate the quality of clusters created by clustering algorithms like K-Means. It measures how similar an object is to its own cluster compared to other clusters. The score ranges from -1 to 1, where:

A score close to 1 indicates that the data points are well-matched to their own cluster and poorly matched to neighboring clusters.
A score close to 0 indicates that the data points are on or very close to the decision boundary between clusters.
A negative score suggests that the data points may have been assigned to the wrong cluster.
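
For reference, the standard definition behind these ranges can be written as follows, where d is the distance between two points, C_i is the cluster containing point i, and the overall score is the mean of s(i) over all points. The symbol names are chosen here for illustration.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\ j \neq i} d(i, j)
\qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j)
\qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}}
```

Here a(i) is the average distance from point i to the other members of its own cluster and b(i) is the average distance to the members of the nearest other cluster, so s(i) approaches 1 when a(i) is much smaller than b(i). By convention s(i) is set to 0 when a cluster contains a single point.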

Results Comparison

K-Means Clustering on Original Features:

Silhouette Score: 0.7139

The silhouette score of approximately 0.714 for K-Means on the original features suggests that the clusters are reasonably well-formed, with most data points being appropriately assigned to their respective clusters.

However, the score is not very close to 1, indicating that there is still some overlap between clusters or that the clusters may not be optimally separated.

PCA + K-Means Clustering:

Silhouette Score: 0.9082

The silhouette score of approximately 0.908 for PCA + K-Means clustering is significantly higher than that of K-Means on the original features.

This high score indicates that the clusters are well-separated and that the data points are much better matched to their respective clusters.

The use of PCA before clustering appears to have enhanced the cluster separability, leading to more distinct and well-defined clusters.
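
An exact silhouette score needs the distance between every pair of points, which is expensive on a table of this size. A cheaper approximation, often called the simplified silhouette, uses only the distance from each point to its own centroid and to the second-nearest centroid. The sketch below computes it from the NEAREST_CENTROIDS_DISTANCE column returned by ML.PREDICT. It is an approximation and is not expected to reproduce the scores reported above.

```sql
-- Sketch: simplified (centroid-based) silhouette for the K-Means model.
WITH distances AS (
  SELECT
    customer_id,
    -- Two smallest centroid distances: own centroid first, runner-up second
    ARRAY(
      SELECT d.DISTANCE
      FROM UNNEST(NEAREST_CENTROIDS_DISTANCE) AS d
      ORDER BY d.DISTANCE
      LIMIT 2
    ) AS two_nearest
  FROM
    ML.PREDICT(MODEL `predictive-behavior-analytics.Section6.kmeans_customer_segmentation_revised`,
      (
        SELECT
          customer_id,
          total_revenue,
          transaction_count,
          avg_transaction_value,
          avg_session_duration
        FROM
          `predictive-behavior-analytics.Section6.customer_features`
      )
    )
)
SELECT
  -- (b - a) / max(a, b), where a <= b because the array is sorted
  AVG(
    SAFE_DIVIDE(
      two_nearest[OFFSET(1)] - two_nearest[OFFSET(0)],
      two_nearest[OFFSET(1)]
    )
  ) AS simplified_silhouette
FROM
  distances;
```

Running the same query against kmeans_pca_customer_segmentation and pca_transformed_features gives the corresponding value for the PCA + K-Means model.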

Interpretation

Cluster Quality:

The higher silhouette score for PCA + K-Means indicates superior cluster quality compared to K-Means on the original features.

This suggests that applying PCA before clustering not only reduced the dimensionality but also captured the most significant variance in the data, leading to more coherent clusters.

Dimensionality Reduction:

By reducing the dimensionality with PCA, the data becomes more compact and concentrated along the directions of greatest variance, which helps the K-Means algorithm perform better.

The PCA transformation likely helped to eliminate noise and irrelevant features, which could have contributed to the improved clustering performance.

Interpretability:

While PCA + K-Means offers better clustering performance, the interpretation of clusters based on principal components can be more challenging compared to clusters formed directly on the original features.

However, if the primary goal is to achieve the best possible clustering, PCA + K-Means is the preferable method based on these results.
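
As a cross-check on the silhouette comparison, BigQuery ML reports its own clustering metrics through ML.EVALUATE. The sketch below puts the two models side by side; the label strings are illustrative. A lower Davies-Bouldin index indicates better-separated clusters, while mean_squared_distance is measured in each model's own feature space and so should not be compared directly across the two rows.

```sql
-- Sketch: built-in evaluation metrics for both K-Means models.
SELECT
  'kmeans_original_features' AS model_label,
  davies_bouldin_index,
  mean_squared_distance
FROM
  ML.EVALUATE(MODEL `predictive-behavior-analytics.Section6.kmeans_customer_segmentation_revised`)
UNION ALL
SELECT
  'kmeans_pca_features' AS model_label,
  davies_bouldin_index,
  mean_squared_distance
FROM
  ML.EVALUATE(MODEL `predictive-behavior-analytics.Section6.kmeans_pca_customer_segmentation`);
```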

Conclusion

The comparison of silhouette scores clearly indicates that PCA + K-Means is a more effective clustering approach in this scenario. It leads to better-defined clusters with less overlap and greater internal coherence.

This method is particularly useful when the goal is to enhance the quality of clustering, even if it comes at the cost of some interpretability due to the transformation of original features into principal components.

On the other hand, if interpretability of the original features is critical, you might prefer the K-Means clustering directly on the original features, despite the lower silhouette score (0.714 versus 0.908).
