Predicting Marketing-Driven Sales

December 1, 2024

Machine Learning XGBoost Python SHAP Feature Engineering

Project Summary

This project is a marketing analytics and machine learning workflow that predicts sales conversion rates and compares channel effectiveness. The original tech-industry dataset had low variance, so I re-scoped the analysis to a broader multi-industry dataset to build a more reliable model for channel-level decision-making.

💡 The Solution: Data Pivot & Feature Engineering

To address the low variance, I switched to a more diverse dataset spanning multiple industries (Digital Marketing Campaigns for SMEs) so that the model could generalize better.

Key Technical Actions:

Feature Engineering: Created interaction terms like ad_spend_engagement to capture the synergy between budget and audience behavior.
Advanced Modeling: Implemented and compared Linear Regression, Random Forest, and XGBoost.
Model Explainability: Used SHAP (SHapley Additive exPlanations) to determine which features actually drove conversions, moving beyond “black box” predictions.
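
As a rough illustration of the first two actions, the sketch below builds an interaction term and compares two of the model families on a held-out split. It runs on synthetic data, not the project dataset. The column names ad_spend, engagement_rate, and conversion_rate are hypothetical, and defining ad_spend_engagement as a simple product is an assumption about how the term was built.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "ad_spend": rng.uniform(100, 5000, n),
    "engagement_rate": rng.uniform(0.01, 0.30, n),
})

# Assumed form of the interaction term: budget multiplied by engagement.
df["ad_spend_engagement"] = df["ad_spend"] * df["engagement_rate"]

# Synthetic target driven by the interaction term plus noise.
df["conversion_rate"] = (
    0.02 + 1e-4 * df["ad_spend_engagement"] + rng.normal(0, 0.005, n)
)

features = ["ad_spend", "engagement_rate", "ad_spend_engagement"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["conversion_rate"], test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

# Fit each model and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

An xgboost.XGBRegressor could be added to the models dictionary in the same way, since it follows the scikit-learn estimator interface.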

📊 Performance Results

The tree-based models achieved a near-perfect fit on the refined dataset, measured by $R^2$:

Random Forest: $R^2$ = 0.999
XGBoost: $R^2$ = 0.997

Note: The near-perfect $R^2$ reflects the refined, multi-industry dataset selected to address low variance in the original data. Results were checked with cross-validation to assess generalizability and to look for signs of data leakage.
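
A minimal sketch of that kind of check, assuming 5-fold cross-validation with scikit-learn. The data here is synthetic, and the fold count and model settings are illustrative choices, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the campaign features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 2.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.05, 300)

# Shuffled 5-fold split; each fold is scored on data the model did not see.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
fold_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R^2 per fold:", np.round(fold_r2, 3))
print(f"mean = {fold_r2.mean():.3f}, std = {fold_r2.std():.3f}")
```

A small spread across folds suggests the score does not depend on one lucky split. Fold scores that all sit very close to 1.0 are still worth a second look, because cross-validation does not by itself expose a feature that was derived from the target.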

🎨 Visualizing the Impact

I created custom visualizations using Matplotlib and Seaborn to translate these results into actionable insights for non-technical stakeholders, including a ranking of channels by ROI.
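
The channel ranking could be produced along these lines. This is a sketch with made-up numbers: the campaigns table and its column names are hypothetical, and ROI is assumed to mean (revenue - spend) / spend. It uses Matplotlib only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical campaign-level data; the values are illustrative only.
campaigns = pd.DataFrame({
    "channel": ["Email", "Social", "Search", "Email", "Social", "Search"],
    "spend":   [1000.0, 2000.0, 1500.0, 500.0, 1000.0, 1500.0],
    "revenue": [3000.0, 3000.0, 4500.0, 1500.0, 1800.0, 4200.0],
})

# Aggregate per channel, then compute ROI on the totals.
totals = campaigns.groupby("channel")[["spend", "revenue"]].sum()
totals["roi"] = (totals["revenue"] - totals["spend"]) / totals["spend"]
ranked = totals.sort_values("roi", ascending=False)

# Horizontal bars with the best channel on top.
fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(ranked.index, ranked["roi"])
ax.invert_yaxis()
ax.set_xlabel("ROI ((revenue - spend) / spend)")
ax.set_title("Channels ranked by ROI")
fig.tight_layout()
fig.savefig("channel_roi.png")
```

Computing ROI on per-channel totals, not as the mean of per-campaign ROI values, keeps small campaigns from dominating the ranking.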

🛠️ Technologies Used