Tomomi’s AI and Machine Learning Portfolio

I’m Tomomi Tanaka, an economist deeply committed to advancing the field of artificial intelligence with a strong focus on online safety.

This blog is my space to share my journey, expertise, and innovative projects in AI and machine learning, all with a particular emphasis on creating safer, more responsible digital environments.

What sets this blog apart is my unique approach to breaking down complex concepts through practical, real-world applications, especially in the context of online safety.

Whether you’re a beginner aiming to enter the field or an experienced professional seeking advanced insights, you’ll find valuable content tailored to your needs.

Why I Created This Blog

When I was Director of Safety by Design at Match Group, I became aware of a large knowledge gap between safety experts and engineers. Safety by Design experts need to evaluate all products and features, including those built on advanced AI. However, they often lack the AI knowledge needed to do so, and many of them asked me how they could learn it. This realization was a turning point for me. I decided to leave Match Group and focus on sharing my knowledge of AI and machine learning with safety experts, empowering them to bridge this gap.

This blog is organized into five key sections, each highlighting a different aspect of AI and machine learning, with a special emphasis on safety and responsible deployment. Here’s what you can expect:

Price Prediction with Python

In this series, I delve into regression techniques to predict house prices using Python, leveraging the popular “House Prices – Advanced Regression Techniques” dataset from Kaggle. This series is not just about building accurate predictive models. It is also about ensuring these models are interpretable and explainable, which is crucial for trust and safety experts working to ensure responsible AI deployment.

What You’ll Learn

The series covers the entire machine learning pipeline, from data cleaning to model deployment, with a special emphasis on model interpretability and explainability. Here’s what you can expect:

Each post in this series provides a thorough understanding of the techniques involved, complete with Python code snippets and full GitHub repositories that illustrate key concepts.
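To give a flavor of what "interpretable" means here, the following is a minimal sketch of a one-feature linear regression fitted by ordinary least squares in plain Python. It is not code from the series itself. The function name `fit_simple_ols` and the toy numbers are assumptions made up for illustration; they do not come from the Kaggle dataset.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and co-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data (NOT taken from the Kaggle dataset):
# living area in square feet and sale price in dollars.
living_area = [1000, 1500, 2000, 2500]
sale_price = [150000, 200000, 250000, 300000]

intercept, slope = fit_simple_ols(living_area, sale_price)

# The slope is directly readable: the predicted price change per extra square foot.
print(f"Predicted price change per extra square foot: ${slope:.2f}")
print(f"Predicted price for 1,800 sq ft: ${intercept + slope * 1800:,.0f}")
```

The point of the sketch is that each fitted coefficient has a plain-language reading, which is the property a safety reviewer can check without knowing the model's internals. More flexible models usually need dedicated explanation tools to recover a comparable reading.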

User Behavior Analytics with BigQuery ML

Welcome to the “User Behavior Analytics with SQL” blog series! Designed for data analysts, marketers, and business intelligence professionals, this series equips you with the skills to perform advanced user behavior analysis and predictive modeling using SQL in BigQuery.

What’s This Series About?

Understanding user behavior is vital for business success. In this series, we’ll explore key topics ranging from basic metrics to advanced predictive modeling techniques, all within BigQuery’s scalable environment. You’ll learn how to transform raw e-commerce data into actionable insights that drive strategic decisions.

Analyzing User Behavior on an E-commerce Site
Deep Dive into User Engagement Analysis
Sales Prediction
- Logistic Regression
- Random Forest
- XGBoost
- Deep Neural Networks (DNN)
Revenue Prediction
- Linear Regression
- Ridge Regression
- Lasso Regression
- Random Forest
Identifying High-Value Customers
- Logistic Regression
- K-Means Clustering
- Random Forest
Customer Segmentation
- K-Means Clustering
- PCA + K-Means Clustering
Predicting User Conversion
- Logistic Regression Model
- Random Forest Model
- XGBoost Model
Churn Prediction
- Logistic Regression Model
- Random Forest Model
- XGBoost Model
Recommendation and Personalization
- Matrix Factorization Model
Optimizing Marketing Campaigns
- Logistic Regression
- Random Forest
- XGBoost
- Deep Neural Networks (DNN)

Each post will cover a specific aspect of user behavior analysis, from foundational metrics to advanced predictive modeling. By the end of this series, readers will have a solid understanding of how to leverage SQL and BigQuery for data-driven decision-making in the context of e-commerce.

User Behavior Analytics with Python

I’m revisiting the comprehensive analyses I initially conducted using SQL, this time harnessing the enhanced flexibility and power of Python to push the boundaries of what’s possible.

While SQL provided a strong foundation for understanding user behavior, its limitations became apparent when addressing more complex, predictive tasks.

Python, with its extensive libraries and tools, enables us to move beyond basic queries and delve deeper into sophisticated machine learning models.

Machine Learning Safety

In this critical section, I focus on ensuring the safety, reliability, and ethical use of machine learning systems. Topics include:

Fairness, Bias Detection and Mitigation (AIF360 library, Reweighing, Disparate Impact)
Model Explainability and Interpretability (SHAP, LIME, PDP)
Reliability and Robustness (Adversarial training)
Ethical Considerations (Calibrated Equalized Odds, Mitigation)
Adversarial Robustness (PGD attack, Adversarial Training with PGD, TRADES, Randomized Smoothing)
Privacy-Preserving Machine Learning
Scalable Oversight of AI Systems (Recursive Reward Modeling, Debate and Amplification Techniques, Factored Cognition Approaches, Human-AI Interaction Protocols)
Ethical AI Development (AI Development Lifecycle)

I’ll provide practical examples of implementing safety measures in Python, and discuss real-world implications of these concepts.
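As a small taste of the fairness topics listed above, here is a minimal sketch of two of the named ideas, disparate impact and reweighing, written in plain Python. It does not use the AIF360 API. The function names `disparate_impact` and `reweighing_weights`, the group labels, and the toy data are all assumptions made up for illustration.

```python
from collections import Counter


def disparate_impact(outcomes, groups, privileged):
    """Ratio of favorable-outcome rates: unprivileged group / privileged group."""
    priv = [o for o, g in zip(outcomes, groups) if g == privileged]
    unpriv = [o for o, g in zip(outcomes, groups) if g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups must be non-empty")
    priv_rate = sum(priv) / len(priv)
    if priv_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return (sum(unpriv) / len(unpriv)) / priv_rate


def reweighing_weights(labels, groups):
    """Weight per (group, label) pair = P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }


# Hypothetical toy data: 1 = favorable outcome, "a" = privileged group.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

di = disparate_impact(labels, groups, privileged="a")
weights = reweighing_weights(labels, groups)

print(f"Disparate impact: {di:.3f}")
for pair, w in sorted(weights.items()):
    print(pair, round(w, 3))
```

A disparate impact ratio near 1 means both groups receive favorable outcomes at similar rates, and a ratio below 0.8 is a commonly used warning threshold. The reweighing weights give more weight to (group, label) combinations that are under-represented, so that group and label look statistically independent in the weighted training data.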

Generative AI Safety

In this cutting-edge section, I explore the critical realm of Generative AI safety, addressing the challenges and opportunities presented by this revolutionary technology. From ChatGPT’s conversational abilities to DALL-E’s artistic creations, generative AI is reshaping our world. This series aims to equip you with the knowledge to navigate the complex landscape of generative AI safety.

Key topics include:

Each post offers in-depth analysis, practical strategies for implementing safety measures, and discussions on real-world implications. Whether you’re a developer, policymaker, or AI enthusiast, this series provides valuable insights into the critical field of generative AI safety.