Data Scientist | AI/ML + Applied Analytics | MSBA @ Emory | Ex-EY
I build end-to-end ML systems and data-driven solutions, from 30M+ row production-flavored pipelines to client-deployed operational frameworks. I graduate from Emory's MSBA program in May 2026 and bring 3+ years of enterprise consulting experience at EY, which grounds my technical work in real business context.
What I focus on:
- 📊 End-to-end ML systems — feature engineering, calibration, leakage-safe validation, model evaluation
- 🎯 Business translation — turning models into priority matrices, action agendas, and quantified ROI
- 🤖 Applied AI — GenAI workflows, LLM-driven decisioning, AI-enabled analytics
- 🏢 Stakeholder delivery — executive communication, cross-functional collaboration, client-ready outputs
Languages: Python · SQL · R
ML / Data Science: scikit-learn · LightGBM · XGBoost · CatBoost · Keras
Big Data & Cloud: Spark · Hive · MongoDB · MySQL · AWS (S3, EC2, EMR)
Visualization: Tableau · Power BI · Streamlit
AI / GenAI: Claude · ChatGPT · prompt engineering · applied AI workflows
| Project | Domain | Highlight |
|---|---|---|
| CTR Prediction Pipeline | Ad-tech / Personalization | Production-flavored ML pipeline on 32M observations · Log loss 0.382 (11.4% improvement) · feature hashing at 2²² |
| Repair Lead Time Optimization | Operations Analytics | Two-level LightGBM on 1.6M records · Holdout AUC 0.809 · 28-cell priority matrix targeting ~85K improvable cases annually |
| Quant XGBoost Stock Returns | Quantitative Finance | 25-year out-of-sample backtest · Sharpe 2.53 · alpha t-stat 13.24 · monthly alpha +4.36% |
| Walmart Big Data Architecture | Data Engineering | Multi-DB stack (MySQL + MongoDB + Hive) · Streamlit sustainability recommender |