SmritiGoyal/README.md

Hi, I'm Smriti 👋

Data Scientist | AI/ML + Applied Analytics | MSBA @ Emory | Ex-EY

I build end-to-end ML systems and data-driven solutions, from production-flavored pipelines on 30M+ rows to client-deployed operational frameworks. I graduate from Emory's MSBA program in May 2026, and 3+ years of prior enterprise consulting experience at EY ground my technical work in real business context.

What I focus on:

  • 📊 End-to-end ML systems — feature engineering, calibration, leakage-safe validation, model evaluation
  • 🎯 Business translation — turning models into priority matrices, action agendas, and quantified ROI
  • 🤖 Applied AI — GenAI workflows, LLM-driven decisioning, AI-enabled analytics
  • 🏢 Stakeholder delivery — executive communication, cross-functional collaboration, client-ready outputs

🛠️ Tech Stack

Languages: Python · SQL · R
ML / Data Science: scikit-learn · LightGBM · XGBoost · CatBoost · Keras
Big Data & Cloud: Spark · Hive · MongoDB · MySQL · AWS (S3, EC2, EMR)
Visualization: Tableau · Power BI · Streamlit
AI / GenAI: Claude · ChatGPT · prompt engineering · applied AI workflows

🎯 Featured Projects

| Project | Domain | Highlight |
| --- | --- | --- |
| CTR Prediction Pipeline | Ad-tech / Personalization | Production-flavored ML pipeline on 32M observations · Log loss 0.382 (11.4% improvement) · feature hashing at 2²² |
| Repair Lead Time Optimization | Operations Analytics | Two-level LightGBM on 1.6M records · Holdout AUC 0.809 · 28-cell priority matrix targeting ~85K improvable cases annually |
| Quant XGBoost Stock Returns | Quantitative Finance | 25-year out-of-sample backtest · Sharpe 2.53 · alpha t-stat 13.24 · monthly alpha +4.36% |
| Walmart Big Data Architecture | Data Engineering | Multi-DB stack (MySQL + MongoDB + Hive) · Streamlit sustainability recommender |
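
For readers unfamiliar with the "feature hashing at 2²²" note in the CTR row, here is a minimal sketch of the hashing trick. It is not the project's actual code: the hash function (`zlib.crc32`), the helper names `hash_feature` and `hash_row`, and the example row are all assumptions chosen for illustration. Only the bucket count of 2²² comes from the table.

```python
import zlib

# Hash space size quoted in the table: 2**22 = 4,194,304 buckets.
N_BUCKETS = 2 ** 22


def hash_feature(field: str, value: str, n_buckets: int = N_BUCKETS) -> int:
    """Map a (field, value) pair to a bucket index in [0, n_buckets).

    crc32 is an illustrative choice; any stable hash would do.
    """
    token = f"{field}={value}".encode("utf-8")
    return zlib.crc32(token) % n_buckets


def hash_row(row: dict, n_buckets: int = N_BUCKETS) -> list:
    """Return the sorted, de-duplicated bucket indices active for one row."""
    return sorted({hash_feature(k, str(v), n_buckets) for k, v in row.items()})


# Hypothetical row of categorical fields (values invented for the example).
example_row = {"site_id": "1fbe01fe", "device_type": 1, "banner_pos": 0}
indices = hash_row(example_row)
print(indices)
```

Each row becomes a short list of active indices into a fixed-width sparse vector, so high-cardinality categorical fields never need an explicit vocabulary. The trade-off is that distinct values can collide in the same bucket.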

📫 Connect

LinkedIn · smritie.goyal@gmail.com · smriti.goyal@emory.edu

Pinned

  1. ctr-prediction-pipeline (Public)

    Production-style ML pipeline for click-through rate prediction on the Avazu dataset. 32M training rows, 13M test predictions. L2-regularized logistic regression with smoothed CTR encoding, frequenc…

    Python

  2. rtat-optimization (Public)

    End-to-end ML pipeline predicting repair turn-around time for a Fortune 500 appliance manufacturer. 2.19M records, 41 leakage-audited features, 15 candidate models. Holdout AUC 0.809, MAE 4.60d (32…

    Python

  3. xgboost-quant-stock-return (Public)

    Rolling-window XGBoost cross-sectional return prediction model for US equities (1995-2024). Annualized Sharpe 2.53, monthly alpha 4.36% (t=13.24), market beta 0.73 over 300 months out-of-sample.

    Python
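
As context for the annualized Sharpe figure quoted for the XGBoost project, the sketch below shows the common convention for annualizing a Sharpe ratio from monthly excess returns (mean over sample standard deviation, scaled by the square root of 12). Whether the repository computes it exactly this way is an assumption. The function name `annualized_sharpe` and the toy return series are hypothetical and are not results from the project.

```python
import math
import statistics


def annualized_sharpe(monthly_excess_returns, periods_per_year=12):
    """Annualize a Sharpe ratio from per-period excess returns.

    Uses the sample standard deviation and the sqrt(periods) scaling,
    which assumes returns are roughly independent across periods.
    """
    mean = statistics.mean(monthly_excess_returns)
    std = statistics.stdev(monthly_excess_returns)
    return (mean / std) * math.sqrt(periods_per_year)


# Toy monthly excess returns, invented purely to exercise the function.
toy_returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.015]
sharpe = annualized_sharpe(toy_returns)
print(round(sharpe, 2))
```

The same function works for other sampling frequencies by changing `periods_per_year`, for example 252 for daily returns.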