Healthcare

Medical Cost Prediction

Regression model that predicts individual insurance costs, identifies the key cost drivers, and is deployed as an interactive Streamlit dashboard.

R² Score: 0.86
Models Compared & Tuned: 4
Deployment: Streamlit

Problem Statement

Health insurance costs vary dramatically across individuals, and understanding cost drivers is essential for insurers, employers, and policyholders. The challenge: build an accurate regression model that predicts individual medical costs and, crucially, explains which factors drive those costs — enabling what-if analysis for policy design and personal health decisions.

Technical Approach

Exploratory Analysis & Feature Engineering

The dataset contained demographic and health information for 1,300+ individuals. Key features were age, sex, BMI, number of children, smoking status, and region. Exploratory analysis revealed a strong interaction effect: smoking multiplied the effect of BMI on costs, so smokers with high BMI had 4x the costs of non-smokers with the same BMI. This interaction term became the single most predictive feature in the final model.
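The interaction term described above can be sketched in Pandas as follows. This is a minimal illustration, not the project's actual code: the column names `smoker` and `bmi`, the "yes"/"no" encoding, and the derived names `smoker_flag` and `smoker_bmi` are all assumptions, and the sample rows are made up purely to show the output shape.

```python
import pandas as pd


def add_smoker_bmi_interaction(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 0/1 smoker flag and a smoker x BMI term."""
    out = df.copy()
    # Assumption: the raw smoking column holds the strings "yes" / "no".
    out["smoker_flag"] = (out["smoker"] == "yes").astype(int)
    # The interaction equals BMI for smokers and 0 for non-smokers,
    # which lets a model give BMI a different effect for smokers.
    out["smoker_bmi"] = out["smoker_flag"] * out["bmi"]
    return out


# Made-up rows for illustration only (not taken from the dataset).
sample = pd.DataFrame(
    {"bmi": [22.0, 35.0, 35.0], "smoker": ["no", "no", "yes"]}
)
features = add_smoker_bmi_interaction(sample)
```

Multiplying the flag by BMI keeps the original `bmi` column intact, so a model can still learn a baseline BMI effect for everyone and an additional one for smokers.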

Model Selection

Streamlit Deployment

Built an interactive what-if dashboard using Streamlit. Users adjust sliders for age, BMI, smoking status, and children to see real-time cost predictions. The dashboard displays feature contribution breakdowns showing how each factor affects the predicted cost. Deployed as a self-contained Python app, with no frontend code required.
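A what-if page of this kind might look roughly like the sketch below. It is not the deployed app. The widget ranges, the feature names, the model file `cost_model.joblib`, and the use of a checkbox for smoking status are hypothetical choices made for illustration, and the feature contribution breakdown is omitted.

```python
def build_input_row(age, bmi, smoker, children):
    """Collect widget values into one feature row (hypothetical feature names)."""
    smoker_flag = 1 if smoker else 0
    return {
        "age": age,
        "bmi": bmi,
        "children": children,
        "smoker_flag": smoker_flag,
        # Smoker x BMI interaction: BMI for smokers, 0 otherwise.
        "smoker_bmi": smoker_flag * bmi,
    }


def main():
    # Imported here so build_input_row can be reused without Streamlit installed.
    import joblib
    import pandas as pd
    import streamlit as st

    st.title("Medical Cost What-If")
    # Widget ranges and defaults are illustrative assumptions.
    age = st.slider("Age", 18, 64, 35)
    bmi = st.slider("BMI", 15.0, 55.0, 28.0)
    children = st.slider("Children", 0, 5, 0)
    smoker = st.checkbox("Smoker")

    # Hypothetical path to a fitted regressor saved with joblib.dump,
    # trained on columns matching build_input_row.
    model = joblib.load("cost_model.joblib")
    row = pd.DataFrame([build_input_row(age, bmi, smoker, children)])
    prediction = float(model.predict(row)[0])
    st.metric("Predicted annual cost", f"${prediction:,.0f}")
```

To serve the page, add a call to `main()` at the end of the file and launch it with `streamlit run`. Streamlit reruns the script on every widget change, which is what makes the prediction update as the sliders move.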

Key Results

Tech Stack

Python, XGBoost, Scikit-learn, Streamlit, Pandas, Seaborn, Matplotlib