7.7M Records Across 49 States
SafeRoute Analytics: A Spatio-Temporal Accident Mitigation Suite
Project Summary
SafeRoute Analytics is a decision-ready geospatial intelligence system designed to convert 7.7 million traffic accident records into actionable insight for insurance providers, logistics firms, and urban planners. Instead of optimizing only for speed, SafeRoute focuses on safety by using machine learning and clustering to predict accident severity, identify high-liability zones, and support lower-risk routing decisions.
The Business Problem & Opportunity
- The Problem: Traffic accidents cause unpredictable liability for insurance companies and costly delivery delays for logistics firms.
- The Opportunity: A business-driven model can quantify risk based on real-time variables (weather, infrastructure density, time of day) to inform premium pricing and logistical rerouting.
Solution Architecture & Data Strategy
I engineered a rigorous data pipeline to transform raw, noisy public data into a refined safety tool:
- High-Scale Engineering: Ingested 7.7M records across 49 states.
- Quality Control: Implemented IQR-based outlier detection for environmental variables and removed features with >44% missing values to ensure model integrity.
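The two quality-control rules above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual pipeline. It assumes a column-oriented table (a dict of lists, with `None` marking a missing value) and the conventional Tukey fences (k = 1.5). The column names in the usage test are hypothetical.

```python
from statistics import quantiles

MISSING_THRESHOLD = 0.44  # drop a column when more than 44% of its values are missing


def drop_sparse_columns(table, threshold=MISSING_THRESHOLD):
    """Keep only columns whose share of missing (None) values is at most `threshold`."""
    kept = {}
    for name, values in table.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= threshold:
            kept[name] = values
    return kept


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is an assumed default."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Boolean mask marking values outside the IQR fences; missing values are never flagged."""
    present = [v for v in values if v is not None]
    low, high = iqr_bounds(present, k)
    return [v is not None and (v < low or v > high) for v in values]
```

Flagging rather than deleting keeps the decision (drop, cap, or keep) separate from detection. That choice matters for weather variables, where extreme readings can be genuine.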
- Feature Engineering: Converted raw timestamps into “Business Features” (e.g., `Is_Weekend`, `Start_Hour`) to capture commute-specific risk patterns.
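A minimal sketch of how `Is_Weekend` and `Start_Hour` can be derived from a raw timestamp. The `"%Y-%m-%d %H:%M:%S"` layout is an assumption about the raw data, not something this summary states.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the raw start timestamp


def business_features(start_time: str) -> dict:
    """Derive Is_Weekend and Start_Hour from a raw timestamp string."""
    ts = datetime.strptime(start_time, TIMESTAMP_FORMAT)
    return {
        "Start_Hour": ts.hour,                 # 0-23, exposes commute peaks
        "Is_Weekend": int(ts.weekday() >= 5),  # weekday(): Monday=0 ... Saturday=5, Sunday=6
    }
```

For example, a Monday 05:46 timestamp yields `Start_Hour=5, Is_Weekend=0`.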
The “Severity Engine” (XGBoost)
After evaluating multiple models, XGBoost was selected for its robustness.
- Performance: 66% overall accuracy.
- Recall: 90% for low-severity incidents (Class 0), essential for high-volume logistics filtering.
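Overall accuracy and per-class recall measure different things, which is how a 66% accuracy can sit alongside 90% recall on Class 0. The sketch below computes both from true and predicted labels. The two-class toy labels are hypothetical and only illustrate the arithmetic. In practice, scikit-learn's `classification_report` produces the same per-class figures.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """Recall per class: correctly predicted members / all actual members of that class."""
    actual = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / actual[cls] for cls in actual}


# Hypothetical toy labels: the model catches most Class 0 incidents but few Class 1.
y_true = [0] * 10 + [1] * 10
y_pred = [0] * 9 + [1] + [0] * 6 + [1] * 4
```

Here Class 0 recall is 9/10 = 0.9 while overall accuracy is only 13/20 = 0.65. A model can be very good at recognizing the common, low-severity class and still be mediocre overall.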
Key Business Insights & ROI
I. Risk-Prone Temporal Windows
Analysis revealed that 8 AM and 5 PM on weekdays are the highest-risk periods.
- Action: Logistics companies can use this to adjust driver shifts or implement “safety surcharges” during peak-risk windows.
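The peak-window analysis amounts to counting weekday accidents by hour and taking the top hours. This is a minimal sketch, assuming each record already carries the `Start_Hour` and `Is_Weekend` fields named in the Feature Engineering step. The records in the test are toy data.

```python
from collections import Counter


def peak_weekday_hours(records, top=2):
    """Return the `top` weekday hours with the most accidents, busiest first."""
    counts = Counter(r["Start_Hour"] for r in records if not r["Is_Weekend"])
    return [hour for hour, _ in counts.most_common(top)]
```

Hours returned here are the natural candidates for shift adjustments or a peak-window surcharge.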
II. Infrastructure & Liability (The “Signal Ratio”)
Using HDBSCAN clustering, I found that areas with lower traffic-signal ratios tend to see higher accident severity; the top cluster averages a severity of 2.64.
- Action: City planners can prioritize infrastructure spend (signals/signage) in these specific low-signal/high-severity zones to reduce public liability.
| Cluster | Accident Count | Avg. Severity | Traffic Signal Ratio |
|---|---|---|---|
| Cluster 36 | 86,124 | 2.64 | 0.188 |
| Cluster 14 | 16,587 | 2.60 | 0.092 |
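A table like the one above can be produced by aggregating per-accident cluster labels. This sketch assumes the labels come from an HDBSCAN run (which marks noise points with `-1`), and that "Traffic Signal Ratio" means the share of a cluster's accidents with a traffic signal recorded nearby. Both are interpretations, not confirmed details of this project. The clustering step itself is not shown.

```python
from collections import defaultdict


def profile_clusters(labels, severities, has_signal):
    """Summarize each cluster: accident count, mean severity, and traffic-signal ratio.

    labels: cluster id per accident (-1 marks HDBSCAN noise and is skipped)
    severities: severity score per accident
    has_signal: True if a traffic signal was recorded near the accident
    """
    groups = defaultdict(list)
    for label, sev, sig in zip(labels, severities, has_signal):
        if label != -1:
            groups[label].append((sev, sig))
    profile = {}
    for label, rows in groups.items():
        n = len(rows)
        profile[label] = {
            "count": n,
            "avg_severity": sum(s for s, _ in rows) / n,
            "signal_ratio": sum(1 for _, g in rows if g) / n,
        }
    # Highest average severity first, as in the table
    return dict(sorted(profile.items(), key=lambda kv: kv[1]["avg_severity"], reverse=True))
```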
The Product: “Accident Precaution Assistant”
The final output is an interactive Streamlit/Tableau dashboard designed for real-time operational use:
- Risk Filtering: Filter hazards by City, Day, and Hour to identify historical “hazardous corridors.”
- AI Safety Advisor: An integrated LLM-powered assistant that converts model outputs into natural language driving tips for fleet drivers.
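The City/Day/Hour filter can be sketched as a single pass over accident records, followed by a ranking of the busiest streets within the filtered slice. The field names (`City`, `Day`, `Start_Hour`, `Street`) and the street-level notion of a "corridor" are assumptions made for illustration.

```python
from collections import Counter


def filter_hazards(records, city=None, day=None, hour=None):
    """Return records matching every filter that is set (None means 'any')."""
    wanted = {"City": city, "Day": day, "Start_Hour": hour}
    return [
        r for r in records
        if all(value is None or r.get(field) == value for field, value in wanted.items())
    ]


def hazardous_corridors(records, top=3):
    """Rank streets by accident count within an already-filtered slice (hypothetical helper)."""
    counts = Counter(r["Street"] for r in records)
    return counts.most_common(top)
```

Leaving a filter as `None` widens the query, so the same function serves both "Houston at 8 AM" and "Houston, any time".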
Strategic Recommendations
- For Insurance: Implement dynamic pricing in High-Severity Clusters (36 and 14).
- For Logistics: Embed the Accident Precaution Assistant API into GPS systems to provide “Hazardous Corridor” alerts during morning rush hours.
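One way a GPS integration could trigger a "Hazardous Corridor" alert is a simple rule: the route point lies in a high-severity cluster and the time falls in the weekday morning rush. This is a hypothetical sketch, not the Assistant's actual API. The 07:00 to 09:59 window is an assumed range around the 8 AM peak, and `corridor_alert` is an illustrative name.

```python
HIGH_SEVERITY_CLUSTERS = {36, 14}  # clusters singled out in the recommendations
MORNING_RUSH = range(7, 10)        # hypothetical window: 07:00-09:59


def corridor_alert(cluster_id, hour, is_weekend):
    """Return an alert string for high-severity clusters during weekday morning rush, else None."""
    if cluster_id in HIGH_SEVERITY_CLUSTERS and hour in MORNING_RUSH and not is_weekend:
        return f"Hazardous corridor: cluster {cluster_id}, {hour:02d}:00 weekday rush"
    return None
```

Keeping the rule this explicit makes alerts easy to audit; a production system would more likely score risk with the trained model.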