XGBoost + SHAP model explains which channels actually drive conversions
Predicting Marketing-Driven Sales
Project Summary
This project is a decision-ready marketing analytics and machine learning workflow designed to predict sales conversion rates and compare channel effectiveness. The original tech-industry dataset had low variance, so I re-scoped the analysis toward a broader multi-industry dataset to create a more reliable model for channel-level decision-making.
💡 The Solution: Data Pivot & Feature Engineering
To solve the variance issue, I pivoted the project to a more diverse dataset spanning multiple industries (Digital Marketing Campaigns for SMEs) so that the model could generalize better.
Key Technical Actions:
- Feature Engineering: Created interaction terms like `ad_spend_engagement` to capture the synergy between budget and audience behavior.
- Advanced Modeling: Implemented and compared Linear Regression, Random Forest, and XGBoost.
- Model Explainability: Used SHAP (SHapley Additive exPlanations) to determine which features actually drove conversions, moving beyond “black box” predictions.
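The three actions above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: only the name `ad_spend_engagement` comes from the project, while the columns `ad_spend`, `engagement_rate`, and `conversion_rate`, the helper function names, and all hyperparameters are assumptions made for the example. XGBoost and SHAP are imported lazily so that the scikit-learn parts still run where those packages are not installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_campaigns(n=400, seed=0):
    """Synthetic stand-in for the campaign data (column names are assumptions)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "ad_spend": rng.uniform(100, 5000, n),
        "engagement_rate": rng.uniform(0.01, 0.30, n),
    })
    # Toy target: driven by spend x engagement plus a little noise.
    noise = rng.normal(0, 0.005, n)
    df["conversion_rate"] = 2e-4 * df["ad_spend"] * df["engagement_rate"] + noise
    return df


def add_interaction(df):
    """Add the ad_spend_engagement interaction term (spend x engagement)."""
    out = df.copy()
    out["ad_spend_engagement"] = out["ad_spend"] * out["engagement_rate"]
    return out


def compare_models(df, target="conversion_rate", seed=0):
    """Fit each model on a train split; return held-out R^2, fitted models, test X."""
    X = df.drop(columns=[target])
    y = df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "LinearRegression": LinearRegression(),
        "RandomForest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    try:
        from xgboost import XGBRegressor  # optional dependency
        models["XGBoost"] = XGBRegressor(n_estimators=200, random_state=seed)
    except ImportError:
        pass  # skip XGBoost if the package is unavailable
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = r2_score(y_te, model.predict(X_te))
    return scores, models, X_te


def shap_ranking(tree_model, X):
    """Rank features by mean absolute SHAP value (requires the shap package)."""
    import shap  # imported here so the rest of the sketch runs without it

    shap_values = shap.TreeExplainer(tree_model).shap_values(X)
    importance = np.abs(shap_values).mean(axis=0)
    return pd.Series(importance, index=X.columns).sort_values(ascending=False)
```

Calling `compare_models(add_interaction(make_synthetic_campaigns()))` returns the held-out $R^2$ per model, and passing a fitted tree model with the test features to `shap_ranking` gives a feature ranking. On this toy data the scores are high only because the target is generated directly from the interaction term, so they say nothing about the real dataset.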
📊 Performance Results
The two tree-based models achieved a near-perfect fit on the refined dataset, measured by $R^2$:
- Random Forest: $R^2$ = 0.999
- XGBoost: $R^2$ = 0.997
Note: The near-perfect $R^2$ values were obtained on the refined, multi-industry dataset selected to address the low variance in the original data. Scores were checked with cross-validation to assess generalizability and to guard against data leakage.
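A cross-validation and leakage check along these lines might look like the sketch below. It is an illustration under assumptions, not the project's validation code: the helper names `cv_r2` and `suspicious_features`, the 0.98 correlation threshold, the synthetic columns, and the deliberately leaked `conversions_copy` column are all hypothetical. The correlation screen is a heuristic that can hint at leakage but cannot prove its absence.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_r2(model, X, y, n_splits=5, seed=0):
    """Return per-fold R^2 scores from shuffled k-fold cross-validation."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="r2")


def suspicious_features(X, y, threshold=0.98):
    """List features whose absolute correlation with the target exceeds threshold.

    A correlation close to 1 is only a hint that a feature may be derived
    from the target; it is a screening heuristic, not a proof of leakage.
    """
    corr = X.corrwith(y).abs()
    return corr[corr > threshold].index.tolist()


# Synthetic data for illustration only (column names are assumptions).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "ad_spend": rng.uniform(100, 5000, 300),
    "engagement_rate": rng.uniform(0.01, 0.30, 300),
})
y = 2e-4 * X["ad_spend"] * X["engagement_rate"] + rng.normal(0, 0.005, 300)

# Deliberately add a copy of the target to show what the screen flags.
X_leaky = X.assign(conversions_copy=y)

scores = cv_r2(RandomForestRegressor(n_estimators=50, random_state=0), X, y)
print("R^2 per fold:", np.round(scores, 3))
print("Flagged features:", suspicious_features(X_leaky, y))
```

Reporting the per-fold spread alongside the mean makes it easier to see whether a very high $R^2$ is stable across folds or depends on one favorable split.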
🎨 Visualizing the Impact
I created custom visualizations using Matplotlib and Seaborn to translate these complex results into actionable insights for non-technical stakeholders, specifically ranking channels based on ROI.
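As a rough illustration of the ROI ranking, the sketch below aggregates spend and revenue per channel, computes ROI as (revenue - spend) / spend, and draws a sorted horizontal bar chart with Matplotlib. The channel names, the figures in the toy table, the column names, and the output file name are all made up for the example and are not results from this project.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def rank_channels_by_roi(df):
    """Sum spend and revenue per channel, then sort by ROI = (revenue - spend) / spend."""
    totals = df.groupby("channel")[["spend", "revenue"]].sum()
    totals["roi"] = (totals["revenue"] - totals["spend"]) / totals["spend"]
    return totals.sort_values("roi", ascending=False)


def plot_roi(ranked, path="channel_roi.png"):
    """Save a horizontal bar chart with the highest-ROI channel at the top."""
    bottom_up = ranked.iloc[::-1]  # barh draws the first row at the bottom
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(bottom_up.index, bottom_up["roi"])
    ax.set_xlabel("ROI ((revenue - spend) / spend)")
    ax.set_title("Channels ranked by ROI (toy data)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Toy campaign rows, invented for illustration only.
campaigns = pd.DataFrame({
    "channel": ["Email", "Email", "Social", "Search"],
    "spend": [1000.0, 1000.0, 2000.0, 1500.0],
    "revenue": [3000.0, 3500.0, 3000.0, 4500.0],
})
ranked = rank_channels_by_roi(campaigns)
print(ranked)
```

Calling `plot_roi(ranked)` writes the chart to `channel_roi.png`. Sorting before plotting keeps the ranking readable at a glance for non-technical readers.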
🛠️ Technologies Used
Python (Pandas, NumPy, Scikit-learn, XGBoost, SHAP, Matplotlib, Seaborn) | Google Colab | Tableau