Healthcare

Heart Stroke Prediction

Binary classification model that predicts stroke risk from patient health indicators, with a focus on high recall and clinical interpretability.

Recall on Stroke Cases: 94%
Explainability Framework: SHAP
Best Performing Model: Random Forest

Problem Statement

Stroke is a leading cause of death and disability worldwide. Early identification of at-risk patients enables preventative interventions. Clinical datasets are typically imbalanced: strokes are rare (~5% of cases), so the class imbalance needs careful handling and the model must prioritize recall to avoid missing true stroke cases.
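To see why accuracy is a misleading target at roughly 5% prevalence, consider the minimal sketch below. It uses synthetic labels (not the project's data) and two small helper functions, `recall` and `accuracy`, written here for illustration only.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (strokes) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Synthetic labels: 5 strokes among 100 patients (about 5% prevalence).
y_true = [1] * 5 + [0] * 95

# A useless classifier that never predicts a stroke.
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
rec = recall(y_true, always_negative)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

The always-negative classifier looks strong on accuracy yet misses every stroke, which is why recall on the positive class is the metric to watch here.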

Technical Approach

Model Development

Feature Engineering

Engineered 30+ features from patient health records, building on body mass index, average glucose level, age, hypertension status, heart disease history, smoking status, and work type. Created interaction features between age and glucose and between BMI and hypertension, which significantly improved model performance. The top predictors were age (the strongest signal), average glucose level, and hypertension status.
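The write-up does not say how the interaction features were constructed. One common choice is a simple product of the two columns, sketched below with pandas. The column names (`age`, `avg_glucose_level`, `bmi`, `hypertension`), the helper `add_interaction_features`, and the sample values are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical patient records; column names and values are illustrative only.
patients = pd.DataFrame(
    {
        "age": [67, 45, 80],
        "avg_glucose_level": [228.7, 95.1, 105.9],
        "bmi": [36.6, 24.0, 32.5],
        "hypertension": [0, 0, 1],
    }
)


def add_interaction_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two multiplicative interaction columns."""
    out = df.copy()
    # Age x glucose: lets the model weigh high glucose differently by age.
    out["age_x_glucose"] = out["age"] * out["avg_glucose_level"]
    # BMI x hypertension: equals BMI for hypertensive patients, 0 otherwise.
    out["bmi_x_hypertension"] = out["bmi"] * out["hypertension"]
    return out


features = add_interaction_features(patients)
print(features)
```

Because `hypertension` is a 0/1 flag, the second interaction acts as a gated copy of BMI, so it only carries signal for hypertensive patients.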

SHAP Explainability

Built an interactive SHAP dashboard for clinical use. Each prediction is accompanied by a waterfall plot showing exactly which factors pushed the risk score up or down. This transformed the model from a black box into a tool clinicians could trust and act on.
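To make concrete what a waterfall plot displays, the sketch below computes exact Shapley values by brute force for a toy risk function and prints the rows a waterfall would draw. It does not use the SHAP library or the project's Random Forest: `toy_risk`, its coefficients, the background rows, and the patient `x` are all hypothetical, and full enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one input x, enumerating all feature subsets.

    Features outside a subset are filled in from each background row,
    and the model output is averaged over those rows.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = {k: (x[k] if k in subset else row[k]) for k in names}
            total += model(mixed)
        return total / len(background)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {name}) - value(s))
        phi[name] = contrib
    return value(set()), phi


def toy_risk(p):
    """Hypothetical risk score; NOT the project's model."""
    return 0.004 * p["age"] + 0.001 * p["avg_glucose_level"] + 0.15 * p["hypertension"]


background = [
    {"age": 40, "avg_glucose_level": 90, "hypertension": 0},
    {"age": 60, "avg_glucose_level": 110, "hypertension": 0},
]
x = {"age": 70, "avg_glucose_level": 200, "hypertension": 1}

base, phi = shapley_values(toy_risk, x, background)

# Waterfall rows: start at the base value, then add each contribution.
print(f"base value: {base:.3f}")
for name, c in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>20}: {c:+.3f}")
print(f"prediction: {base + sum(phi.values()):.3f}")
```

The base value plus the per-feature contributions adds up to the model's output for that patient, which is the property that lets a clinician read a waterfall plot as a step-by-step account of one prediction.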

Key Results

Tech Stack

Python, Scikit-learn, Random Forest, SMOTE, SHAP, Pandas, Matplotlib, Jupyter