Regression model that predicts individual insurance costs and identifies key cost drivers, deployed as an interactive Streamlit dashboard.
Health insurance costs vary dramatically across individuals, and understanding cost drivers is essential for insurers, employers, and policyholders. The challenge: build an accurate regression model that predicts individual medical costs and, crucially, explains which factors drive those costs — enabling what-if analysis for policy design and personal health decisions.
The dataset contained demographic and health information for 1,300+ individuals. Key features: age, sex, BMI, number of children, smoking status, and region. Exploratory analysis revealed a powerful interaction effect: smoking multiplied the effect of BMI on costs — smokers with high BMI had 4x the costs of non-smokers with the same BMI. This interaction term became the single most predictive feature in the final model.
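To make the interaction effect concrete, here is a minimal sketch of how a smoker-by-BMI term enters a linear regression. It is not the project's actual code: the data is synthetic and noise-free, the coefficient values are arbitrary placeholders, and the helper names (`add_interaction`, `fit_ols`, `solve_linear_system`) are hypothetical. A library such as scikit-learn or statsmodels would normally do the fitting; a small hand-written least-squares solver is used here only to keep the example dependency-free.

```python
def add_interaction(smoker, bmi):
    """Return the feature row [intercept, smoker, bmi, smoker * bmi]."""
    s, b = float(smoker), float(bmi)
    return [1.0, s, b, s * b]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data generated from placeholder coefficients (assumed, not fitted
# values from the project): [intercept, smoker, bmi, smoker * bmi].
TRUE_COEFS = [2000.0, 5000.0, 100.0, 900.0]
samples = [(s, b) for s in (0, 1) for b in (20, 25, 30, 35, 40)]
rows = [add_interaction(s, b) for s, b in samples]
costs = [sum(c * x for c, x in zip(TRUE_COEFS, row)) for row in rows]

intercept, b_smoker, b_bmi, b_interaction = fit_ols(rows, costs)

# The interaction coefficient is the extra cost per BMI point for smokers.
bmi_slope_nonsmoker = b_bmi
bmi_slope_smoker = b_bmi + b_interaction
```

In this formulation the cost of one additional BMI point is `b_bmi` for non-smokers and `b_bmi + b_interaction` for smokers, which is how a single extra column lets a linear model represent a BMI effect that depends on smoking status.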
Built an interactive what-if dashboard using Streamlit. Users adjust sliders for age, BMI, smoking status, and children to see real-time cost predictions. The dashboard displays feature contribution breakdowns showing exactly how each factor impacts the predicted cost. Deployed as a self-contained Python app — no frontend code required.
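The what-if flow described above might look roughly like the sketch below, assuming the underlying model is linear so that each feature's contribution is simply its coefficient times its value. The coefficients, intercept, slider ranges, and function names are illustrative placeholders rather than values from the deployed app, and sex and region are left out to match the four controls mentioned above.

```python
# Placeholder coefficients for illustration only; NOT the project's fitted model.
COEFFICIENTS = {
    "age": 250.0,
    "bmi": 100.0,
    "children": 500.0,
    "smoker": 5000.0,
    "smoker_x_bmi": 900.0,  # interaction term
}
INTERCEPT = -5000.0


def build_features(age, bmi, children, smoker):
    """Turn raw inputs into model features, including the interaction term."""
    s = 1.0 if smoker else 0.0
    return {
        "age": float(age),
        "bmi": float(bmi),
        "children": float(children),
        "smoker": s,
        "smoker_x_bmi": s * float(bmi),
    }


def contributions(features):
    """Per-feature contribution to the prediction: coefficient * value."""
    return {name: COEFFICIENTS[name] * value for name, value in features.items()}


def predict(features):
    """Predicted cost is the intercept plus the sum of all contributions."""
    return INTERCEPT + sum(contributions(features).values())


def render_dashboard():
    """Streamlit wiring; imported lazily so the functions above work without it."""
    import streamlit as st

    age = st.slider("Age", 18, 64, 40)            # ranges are assumptions
    bmi = st.slider("BMI", 15.0, 55.0, 30.0)
    children = st.slider("Children", 0, 5, 0)
    smoker = st.checkbox("Smoker")

    features = build_features(age, bmi, children, smoker)
    st.metric("Predicted annual cost", f"${predict(features):,.0f}")
    for name, value in contributions(features).items():
        st.write(f"{name}: {value:+,.0f}")
```

To try it, one would call `render_dashboard()` at the bottom of a script (say, a hypothetical `app.py`) and launch it with `streamlit run app.py`. Streamlit reruns the script whenever a widget changes, which is what makes the prediction and the breakdown update as the sliders move. For a non-linear model, the coefficient-times-value breakdown would not apply and a different attribution method would be needed.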