Bonjour; I'm Sugumaran Balasubramaniyan
AI/ML Engineer

AI/ML Engineer with 7+ years of experience building production-grade machine learning systems on AWS and Azure. Focused on LLM-powered workflows, agentic AI, MLOps, and end-to-end ML systems that move from prototype to production.

Delivered measurable outcomes across supply chain, healthcare, and risk domains, including a 21% cost reduction, a 45% throughput improvement, and an AUC-ROC of 0.81 on multimodal healthcare prediction.

</AboutMe>

I'm an AI/ML Engineer with 7+ years of experience building production-grade data and machine learning systems across cloud, analytics, and enterprise environments. My recent work focuses on LLM-powered workflows, agentic AI, model deployment, and ML governance — systems that move from prototype to production.


I've built AWS-based data pipelines, implemented MLflow governance on SageMaker, deployed models with Docker and Kubernetes, and delivered measurable outcomes across supply chain, healthcare, and risk domains. Alongside industry work, I teach data analytics and machine learning at SKEMA Business School — bridging technical depth with real-world business impact.


Sugumaran Balasubramaniyan

</Experience>

Professor (External)
SKEMA Business School
Paris, Île-de-France, France | Jan 2026 - Present

• Lead instructor for undergraduate and graduate courses in Data Analytics, Business Intelligence, and SQL, with a focus on Power BI, Python, and machine learning applications in real-world business contexts.
• Design and deliver curriculum that bridges theoretical knowledge with hands-on industry applications, enabling students to develop end-to-end data solutions—from data extraction and modeling to visualization and storytelling.
• Mentor students in capstone projects and research initiatives, often collaborating with industry partners to solve practical challenges in marketing, finance, and customer analytics.
• Recognized for fostering an engaging and inclusive learning environment, consistently receiving high student satisfaction ratings for clarity, relevance, and applied teaching methods.
• Continuously update course content to reflect the latest tools and trends in data analytics, including SQL databases, cloud platforms (AWS/Azure), and AI-driven business intelligence.
• Certified as a Microsoft Power BI Data Analyst Associate and AWS Machine Learning Engineer Associate, bringing industry-recognized credentials and best practices into the academic experience.

Generative AI Data Scientist & ML Engineer
The Validate
Paris, Île-de-France, France | Jan 2024 - Present

AI-Powered Document Processing Pipeline
• Architected and deployed an end-to-end OCR pipeline leveraging n8n workflow automation, PyTorch for deep learning, and GPT-4 for intelligent data extraction.
• Achieved a 45% reduction in manual data entry workload, recovering 20+ hours weekly for the operations team.
• Implemented error handling, data validation, and automated quality checks, reaching 94% accuracy in document processing.
• Technologies: n8n, PyTorch, GPT-4, Python, Docker, REST APIs

Pharma Supply Chain Optimization
• Designed and built a full-stack supply chain optimization solution combining cloud data warehousing with an interactive web application.
• Improved forecast accuracy by 15% using SARIMA time-series models, enabling proactive inventory risk identification.
• Reduced stockout incidents by 12% through predictive analytics and a real-time alerting system.
• Technologies: Python, SQL, Tableau, AWS (S3, RDS), Snowflake, predictive modeling

MLOps & Deployment
• Containerized ML models using Docker and established CI/CD pipelines for automated model testing, validation, and deployment, reducing model deployment time from 5 days to 6 hours.
• Implemented model monitoring and drift detection to flag degrading models in production.
• Technologies: Docker, Kubernetes, Git, CI/CD, MLflow, model versioning

Data Analyst / Data Engineer
Joubert-Associes
Paris, Île-de-France, France | Jul 2023 - Dec 2023

Executive Dashboard Development & SQL Optimization
• Designed and developed interactive Power BI dashboards tracking 50+ construction projects, providing leadership with real-time visibility into project health, timelines, and resource allocation.
• Optimized SQL query performance through indexing, window functions, and query refactoring, achieving a 36% reduction in dashboard refresh time (8.5 min → 5.4 min).
• Enabled data-driven resource reallocation decisions that improved project delivery timelines by 14%.
• Technologies: Power BI, SQL Server, DAX, data modeling, ETL

Marketing Campaign Optimization through A/B Testing
• Planned, executed, and analyzed A/B tests for digital marketing campaigns targeting construction industry clients.
• Increased lead conversion rate from 8% to 12% within 3 months through data-driven campaign optimization.
• Technologies: Python (pandas, scipy), Excel, statistical analysis, hypothesis testing

Data Pipeline & Automation
• Automated weekly reporting workflows, reducing manual reporting time by 6 hours per week.
• Technologies: Python, SQL, Power BI, Jira, Confluence

Data Analyst Associate
Capgemini
Chennai, India | Apr 2021 - Jul 2022

Cloud-Based ETL Pipeline Development
• Architected and implemented automated ETL pipelines using AWS Glue and Python, processing 2TB+ of supply chain data daily.
• Accelerated data processing speed by 45% through optimized PySpark transformations and parallel processing.
• Eliminated 15 hours/week of manual data transformation work, freeing the team for higher-value analytics.

Machine Learning for Inventory Optimization (Client: Philip Morris International)
• Developed and deployed SARIMA and LSTM time-series forecasting models predicting demand across 150+ SKUs and 30+ distribution centers.
• Achieved $2.3M annual cost reduction (21% decrease) through optimized inventory management while maintaining 96% service levels.
• Reduced forecasting error from 18% to 7% through model refinement and hyperparameter tuning.
• Performed extensive feature engineering, creating 50+ predictive features from historical sales, seasonality, and external factors.

MLOps & Model Governance
• Established MLflow-based model governance framework to track experiments, manage versions, and ensure reproducibility.
• Implemented automated model monitoring and drift detection, reducing drift incidents by 18%.
• Deployed models to AWS SageMaker with automated retraining pipelines; managed 12+ production models with A/B testing and rollback.

Risk Modeling Data Analyst Executive
Infosys BPM
Chennai, India | Mar 2020 - Apr 2021

Risk Management & Credit Risk Analytics (Client: Citizens Bank, USA)
• Conducted comprehensive risk management analysis for a retail banking portfolio using Python-based statistical modeling and Monte Carlo simulations.
• Developed risk scoring models to predict loan default probability with 82% accuracy, enabling proactive risk mitigation strategies.
• Reduced portfolio risk exposure by 9% through a data-driven early warning system.
• Automated monthly risk reports using Python and SQL, reducing generation time from 2 days to 4 hours.
• Analyzed 10M+ records to identify trends, anomalies, and actionable insights for banking operations.

Team Leadership & Mentorship
• Mentored 2 junior analysts in Python, SQL best practices, and statistical analysis techniques.
• Improved team productivity and delivery timelines by 25% through knowledge sharing and collaborative problem-solving.

Manufacturing Analytics Specialist
Arimalytics
Pondicherry, India | Jun 2018 - Feb 2020

Predictive Maintenance & Manufacturing Analytics
• Built predictive models to forecast inventory requirements and machine failures, improving forecast accuracy by 12%.
• Enabled proactive maintenance scheduling that reduced unplanned downtime by 18% and extended equipment lifespan.
• Developed anomaly detection algorithms to identify deviations in equipment behavior in real time.

Dashboard Development & Business Intelligence
• Designed and deployed interactive Tableau dashboards visualizing production KPIs, risk indicators, and operational metrics.
• Reduced manual reporting time by 30% through automated data refresh and self-service analytics.
• Analyzed supply chain data to optimize inventory levels, reducing carrying costs by 8%.

</Education>

MSc Data Science and AI Strategy
emlyon business school
Paris, Île-de-France, France | Aug 2022 - Feb 2024

• This program uniquely bridged the gap between AI technology and business strategy, equipping me with the skills to design and deploy AI applications with a focus on responsible data governance and transparent practices. I gained a practical, action-oriented understanding of both the technical fundamentals and the human/business impacts of AI.

International Exchange Program
McGill University
Montréal, Québec, Canada | May 2023 - Jul 2023

• This program provided me with a comprehensive skill set in emerging technologies, including developing and deploying Internet of Things (IoT) solutions, understanding North American business practices with a focus on Montréal's tech ecosystem, and designing and implementing advanced recommender systems. Through hands-on projects and theoretical study, I gained expertise in data analysis, hardware/software integration, and collaborative filtering techniques.

Post Graduate Program in Data Science
Great Learning
Chennai, India | Jul 2018 - Mar 2019

• This intensive program immersed me in key data science and analytics disciplines, including data analysis, machine learning (supervised and unsupervised), and text mining. I developed proficiency in essential tools and technologies like Python, R, Tableau, and database management, applying these skills through real-world industry case studies.

Master of Business Administration
Pondicherry University
Pondicherry, India | Jul 2016 - May 2018

• This specialized program offered in-depth training in key areas of Operations and Human Resources. I developed proficiency in Supply Chain Management, Operations Research, Service Operations Management, and Quality Management, alongside expertise in HR Analytics, Strategic Human Resource Management, and Human Resources Management. I also gained valuable knowledge in Strategic Management and Project Management.

Bachelor of Technology
Pondicherry University
Pondicherry, India | Jul 2012 - May 2016

• This four-year program equipped me with a deep understanding of mechanical engineering principles, covering a wide range of subjects including heat and mass transfer, kinematics, and automobile engineering. I developed proficiency in both theoretical concepts and practical applications, preparing me for roles in design, simulation, and control across various industries.

</Certs>

AWS Machine Learning Engineer Associate
Amazon Web Services

• Proficient in designing, implementing, and deploying machine learning solutions on AWS.
• Expertise in SageMaker, feature engineering, and model optimization.
• Skilled in ML pipeline orchestration and automated model training workflows.

AWS Cloud Practitioner
Amazon Web Services

• Validated expertise in AWS cloud architecture and foundational services.
• Demonstrated knowledge of AWS pricing models and cost optimization strategies.
• Proficient in deploying scalable and secure cloud infrastructure on AWS.

Databricks AI Agents
Databricks Academy

• Mastered building autonomous AI agents using the Databricks platform.
• Implemented LLM-based agents for complex task automation and reasoning.
• Expertise in prompt engineering and agentic workflow orchestration.

Snowflake Data Warehousing
Snowflake University

• Proficient in designing and managing cloud-based data warehouses using Snowflake.
• Expertise in data modeling, query optimization, and Snowflake governance.
• Skilled in data sharing and Snowflake collaboration features.

AWS GenAI Practitioner
Amazon Web Services

• Expert in building generative AI applications using AWS services.
• Proficient with Amazon Bedrock, SageMaker JumpStart, and generative AI tools.
• Skilled in prompt engineering and responsible AI practices.

Dataiku ML Practitioner
Dataiku Academy

• Proficient in end-to-end machine learning projects using the Dataiku platform.
• Expertise in visual machine learning workflows and automated model selection.
• Skilled in model deployment and monitoring within the Dataiku ecosystem.

Dataiku Developer
Dataiku Academy

• Expert in developing custom plugins and extensions for the Dataiku platform.
• Skilled in Python development within Dataiku recipe and custom component frameworks.
• Proficient in integrating external APIs and data sources with Dataiku.

Atlassian Agile Project Management Professional
Atlassian Academy

• Expert in agile project management using Jira and Confluence platforms.
• Proficient in sprint planning, backlog management, and team collaboration workflows.
• Skilled in implementing agile methodologies and scaling agile practices across teams.

Microsoft Power BI Data Analyst Associate (PL-300)
Microsoft

• Certified in designing and building scalable data models, cleaning and transforming data, and enabling advanced analytic capabilities in Power BI.
• Proficient in DAX, Power Query, and publishing reports and dashboards for business decision-making.
• Applied directly in academic and professional contexts, including teaching BI at SKEMA Business School.

NVIDIA Certified Professional: Agentic AI
NVIDIA

• Certified in designing and deploying production-grade agentic AI systems using LLMs and tool-calling frameworks.
• Proficient in multi-agent orchestration, memory management, and RAG pipeline architectures.
• Skilled in building reliable, evaluable AI agents for enterprise environments.

NVIDIA Certified Associate: Generative AI LLMs
NVIDIA

• Certified in the fundamentals of large language models, transformer architectures, and generative AI techniques.
• Proficient in fine-tuning, prompt engineering, and deploying LLM-based solutions.
• Skilled in applying generative AI to real-world NLP and multimodal use cases.

Databricks AI Agent Fundamentals
Databricks Academy

• Proficient in building and evaluating AI agents using the Databricks platform and Unity Catalog.
• Skilled in integrating LLMs with tool use, retrieval augmentation, and structured outputs.
• Experienced in deploying agent workflows for enterprise-scale data and AI pipelines.

</Languages>

English
Native or Bilingual
French
Professional Working
Tamil
Native or Bilingual
Hindi
Limited Working

</Skills>

Tech Stack

  • Python
  • PyTorch
  • R
  • Scala
  • Azure-SQL
  • MySQL
  • Redis
  • PostgreSQL
  • GitHub
  • HuggingFace
  • Git
  • Anaconda
  • Apache-Spark
  • Apache-Airflow
  • Apache-Hadoop
  • Apache-Cassandra
  • Apache-Kafka
  • AWS
  • Azure
  • GCP
  • Power-BI
  • Tableau
  • NumPy
  • Pandas
  • Scikit-learn
  • Matplotlib
  • Plotly
  • Streamlit
  • Flask
  • Docker
  • Kubernetes
  • TensorFlow
  • HTML5
  • CSS3
  • JavaScript
  • React
  • Node.js
  • MongoDB
  • GraphQL
  • Confluence
  • Jira
  • Excel
  • FastAPI
  • OpenCV
  • Databricks
  • Snowflake
  • Dataiku
  • MLflow
  • LangChain
  • LangGraph
  • n8n

</Projects>

Patient Mortality & Readmission Prediction

ML fusion models (XGBoost + BERT) on large healthcare datasets, achieving an AUC-ROC of 0.81. Built scalable AWS pipelines (Glue, Athena, Lambda, SageMaker, Bedrock) that cut inference latency by 30%.

Technical Approach

Built a multimodal ML fusion system combining structured clinical data (XGBoost) with unstructured clinical notes (BERT). Deployed on AWS using a serverless architecture with event-driven inference pipelines.
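The write-up does not say how the XGBoost and BERT outputs are combined. One common option is late fusion: each model scores a patient independently and the two probabilities are blended. The sketch below assumes that design, with a hypothetical fixed blending weight (in practice it would be tuned on a validation split), and computes AUC-ROC as the share of positive/negative pairs the scores rank correctly. Function names and toy scores are illustrative only.

```python
from typing import List, Sequence


def late_fusion(p_tabular: Sequence[float], p_text: Sequence[float], weight: float = 0.5) -> List[float]:
    """Blend two models' positive-class probabilities with a fixed weight (late fusion)."""
    if len(p_tabular) != len(p_text):
        raise ValueError("both models must score the same patients")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_tabular, p_text)]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy validation data (hypothetical): 1 = readmitted, 0 = not readmitted.
labels = [1, 0, 1, 0]
p_xgb = [0.7, 0.4, 0.3, 0.2]   # structured-data model scores
p_bert = [0.4, 0.6, 0.9, 0.1]  # clinical-notes model scores

print(auc_roc(labels, p_xgb))                       # 0.75
print(auc_roc(labels, p_bert))                      # 0.75
print(auc_roc(labels, late_fusion(p_xgb, p_bert)))  # 1.0
```

Each model misranks a different pair here, so blending helps. Stacking (a small meta-model trained on both scores) or concatenating BERT embeddings with tabular features before a single model are alternative fusion designs; the source does not specify which was used.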

Key Results

  • Fusion model achieved AUC-ROC of 0.81, outperforming single-modal baselines by 12%
  • Reduced inference latency by 30% using SageMaker endpoint optimization
  • Processed 100K+ patient records through AWS Glue ETL pipelines

Tech Stack

XGBoost, BERT, AWS SageMaker, AWS Lambda, AWS Glue, Athena
Project 1

Customer Churn Prediction

Classification model (XGBoost, Random Forest) predicting customer churn on telecom data. Focused on high recall to enable targeted retention strategies before customers leave.

Technical Approach

Developed classification models (XGBoost, Random Forest, Logistic Regression) to predict customer churn on telecom subscription data. Engineered behavioral features from usage patterns and applied threshold tuning to maximize recall for at-risk customer identification.
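Threshold tuning for recall can be sketched as a search over candidate cut-offs on a validation set: keep the highest threshold whose recall still meets the target, so the retention team contacts as few customers as possible. The function names, the target value, and the toy scores below are hypothetical; a real pipeline would use held-out churn probabilities from the trained model.

```python
from typing import Sequence


def recall_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of actual churners (label 1) whose score is at or above the threshold."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("need at least one churner in the labels")
    caught = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    return caught / positives


def pick_threshold_for_recall(labels: Sequence[int], scores: Sequence[float], target_recall: float) -> float:
    """Highest observed score usable as a cut-off that still reaches target_recall."""
    # Scan candidate cut-offs from strict to lenient; the first one that meets the
    # target flags the fewest customers among all cut-offs that meet it.
    for t in sorted(set(scores), reverse=True):
        if recall_at(labels, scores, t) >= target_recall:
            return t
    raise ValueError("target_recall must be at most 1.0")


# Hypothetical validation set: 4 churners, 6 non-churners, model churn probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.2, 0.7, 0.3, 0.25, 0.1, 0.05, 0.4]

t = pick_threshold_for_recall(labels, scores, target_recall=0.75)
print(t, recall_at(labels, scores, t))  # 0.35 0.75
```

Lowering the threshold below the default 0.5 trades precision for recall, which suits retention campaigns where missing a churner costs more than an unnecessary offer.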

Key Results

  • XGBoost achieved 85% recall, flagging more than 4 out of 5 churning customers for proactive retention
  • Contract type, tenure, and monthly charges were the top churn predictors
  • Delivered actionable retention segments for the marketing team

Tech Stack

Python, XGBoost, scikit-learn, Seaborn, Pandas
Project 2

Heart Stroke Prediction

Binary classification model predicting stroke risk from patient health indicators. Applied logistic regression, decision trees, and feature engineering on clinical data.

Technical Approach

Built binary classification models to predict stroke risk from patient health indicators. Applied logistic regression, decision trees, and ensemble methods with careful handling of class imbalance in clinical data.
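The source does not say which imbalance technique was used for the stroke models. One standard option is class weighting; the sketch below computes "balanced" weights with the formula scikit-learn documents for `class_weight="balanced"`, n_samples / (n_classes * count_of_class), so each class contributes the same total weight to the loss. The function names and the 95/5 label split are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def sample_weights(labels: Sequence[Hashable], class_weights: Dict[Hashable, float]) -> List[float]:
    """Expand per-class weights into one weight per training row."""
    return [class_weights[y] for y in labels]


# Hypothetical imbalanced stroke labels: 95 non-stroke (0), 5 stroke (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights[1])  # 10.0, i.e. 19 times the weight of a non-stroke row
```

scikit-learn classifiers such as `RandomForestClassifier` accept `class_weight="balanced"` directly, and many `fit` methods take a `sample_weight` argument. Weighting shifts predicted probabilities, so the decision threshold should still be chosen on validation data.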

Key Results

  • Random Forest achieved 94% recall on stroke cases
  • Age, hypertension, and glucose levels identified as top risk factors
  • Built a SHAP-based explainability dashboard for clinical use

Tech Stack

Python, scikit-learn, SHAP, Pandas, Matplotlib
Project 3

Sentiment Analyzer

NLP pipeline classifying sentiment from text reviews using BERT and traditional ML models. Deployed as a web application with real-time prediction capabilities.

Technical Approach

Built an NLP pipeline combining BERT-based transformer models with traditional ML (Naive Bayes, SVM) for sentiment classification. Fine-tuned DistilBERT on domain-specific text data and deployed as a Flask web application with real-time prediction API.
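The Naive Bayes baseline mentioned above can be sketched in plain Python. This is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, assuming lowercase whitespace tokenization in place of the NLTK preprocessing listed in the tech stack; the class name and toy reviews are hypothetical, and it is not the project's actual code.

```python
import math
from collections import Counter, defaultdict
from typing import Sequence


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        tokens = text.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[c][tok]
                score += math.log((count + self.alpha) / (self.total_words[c] + self.alpha * v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


reviews = [
    "great movie loved it",
    "loved the acting great fun",
    "terrible plot boring",
    "boring and awful acting",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(reviews, labels)
print(model.predict("great fun"))     # pos
print(model.predict("boring awful"))  # neg
```

Summing log-probabilities avoids numeric underflow on long reviews, and smoothing keeps a single unseen word-class pair from zeroing out a class.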

Key Results

  • BERT model achieved 92% accuracy vs 84% for baseline Naive Bayes
  • Sub-200ms inference time with model distillation
  • REST API handles 50+ concurrent requests

Tech Stack

BERT, PyTorch, Flask, Hugging Face, NLTK
Project 4

Sleep Disorder Prediction

ML model predicting sleep disorders (insomnia, sleep apnea) from lifestyle and health metrics. Feature engineering on BMI, stress levels, physical activity, and sleep duration.

Technical Approach

Built multi-class classification models to predict sleep disorders (insomnia, sleep apnea, none) from lifestyle and health metrics. Engineered features from BMI, stress levels, physical activity, heart rate, and sleep duration data.

Key Results

  • Achieved 89% F1-score on multi-class sleep disorder classification
  • Stress level and daily step count emerged as the strongest predictors
  • Built feature importance visualizations for clinical interpretability
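An F1-score on three classes depends on how per-class scores are averaged. The sketch below computes per-class F1 as 2TP / (2TP + FP + FN), then both the macro average (every class counts equally) and the support-weighted average; the source does not state which one was reported. Function names, labels, and predictions are toy values.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 for each class, treating that class as positive and the rest as negative."""
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1: rare disorders count as much as 'none'."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred) -> float:
    """Per-class F1 averaged with weights equal to each class's true count (support)."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


y_true = ["none", "none", "none", "insomnia", "insomnia", "apnea"]
y_pred = ["none", "none", "insomnia", "insomnia", "none", "apnea"]
print(round(macro_f1(y_true, y_pred), 3))     # 0.722
print(round(weighted_f1(y_true, y_pred), 3))  # 0.667
```

When the "none" class dominates, the weighted average tracks majority-class performance, while the macro average exposes weak recall on the rarer disorders.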

Tech Stack

Python, scikit-learn, XGBoost, Seaborn, Pandas
Project 5

Fraud Detection using R

IEEE-CIS fraud detection on 590K+ transactions using ensemble methods in R. Applied SMOTE to handle class imbalance and achieved an AUC-ROC of 0.91 on the Kaggle benchmark.

Technical Approach

Applied ensemble methods (Random Forest, XGBoost, LightGBM) in R on 590K+ transactions from the IEEE-CIS Kaggle competition. Handled class imbalance (roughly 3.5% fraud rate in the training set) using SMOTE and stratified sampling.
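The project itself was written in R; to keep examples in a single language, this is a simplified Python sketch of the SMOTE idea, not the project's code. For each synthetic row it picks a random minority sample, finds its k nearest minority neighbors by Euclidean distance, and places a new point at a random position on the segment between them. The function name, `k`, and the toy points are assumptions.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_synthetic rows by interpolating between minority samples and their near neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the chosen sample, excluding itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class (fraud rows) with two numeric features.
fraud_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(fraud_rows, n_synthetic=6, k=2, seed=42)
print(len(new_rows))  # 6
```

Oversampling should be applied only to the training portion of each split; synthetic rows derived from validation data would inflate the measured AUC.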

Key Results

  • Top 15% on IEEE-CIS Kaggle leaderboard
  • Achieved 0.91 AUC-ROC with stacked ensemble
  • Identified transaction amount and card verification as top fraud indicators

Tech Stack

R, XGBoost, SMOTE, caret, tidyverse
Project 6

Big Data Analysis using Databricks

Large-scale data analysis pipeline on Databricks using Apache Spark and Delta Lake. Processed millions of records to extract business insights using distributed computing.

Technical Approach

Designed and ran distributed data processing pipelines on Databricks using Apache Spark. Processed millions of records with Delta Lake for ACID-compliant data transformations and aggregations.

Key Results

  • Reduced query time by 60% using Delta Lake caching and Z-ordering
  • Built automated ETL pipeline processing 5M+ records in under 10 minutes
  • Created interactive dashboards directly in Databricks notebooks
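Z-ordering speeds up queries by sorting rows along a Z-order (Morton) curve, so each data file covers a compact range on several columns at once, and per-file min/max statistics let a filter skip files. The toy sketch below interleaves the bits of two integer columns, writes 16 rows into four "files" of four rows, and counts how many files a filter on each column must scan. It illustrates the idea only; Delta Lake's `OPTIMIZE ... ZORDER BY` implementation differs in detail, and the grid data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def morton_code(x: int, y: int, bits: int = 16) -> int:
    """Interleave bits of two non-negative ints: x in even bit positions, y in odd."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def file_ranges(rows: Sequence[Tuple[int, int]], rows_per_file: int) -> List[List[Tuple[int, int]]]:
    """Split rows into consecutive 'files' and record (min, max) per column for each file."""
    ranges = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        ranges.append([(min(r[c] for r in chunk), max(r[c] for r in chunk)) for c in range(2)])
    return ranges


def files_to_scan(ranges, column: int, value: int) -> int:
    """Count files whose [min, max] on `column` could hold `value`; others are skipped."""
    return sum(1 for f in ranges if f[column][0] <= value <= f[column][1])


grid = [(x, y) for x in range(4) for y in range(4)]  # written sorted by x, then y
zordered = sorted(grid, key=lambda r: morton_code(r[0], r[1]))

linear = file_ranges(grid, rows_per_file=4)
zranges = file_ranges(zordered, rows_per_file=4)
print(files_to_scan(linear, 1, 1), files_to_scan(zranges, 1, 1))  # filter y == 1: 4 2
print(files_to_scan(linear, 0, 1), files_to_scan(zranges, 0, 1))  # filter x == 1: 1 2
```

With the linear layout a filter on `x` prunes well but a filter on `y` scans every file; the Z-ordered layout gives moderate pruning on both columns, which is the trade-off Z-ordering is designed for.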

Tech Stack

Apache Spark, Delta Lake, Databricks, SQL, Python
Project 7

Medical Cost Prediction

Regression model predicting individual medical insurance costs from patient demographics. Explored feature interactions (BMI, age, smoking status) using Python and scikit-learn.

Technical Approach

Built regression models (Linear, Ridge, Random Forest, XGBoost) to predict individual medical insurance costs. Performed extensive feature engineering on BMI, age, smoking status, and region interactions.
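The smoking-BMI interaction can be made explicit as a product feature, which linear and ridge models need because they cannot learn interactions on their own (tree ensembles such as XGBoost can). The sketch below assumes hypothetical column names (`age`, `bmi`, and `smoker` stored as "yes"/"no"), adds product features, and computes R² as 1 - SS_res / SS_tot, the metric reported in the results.

```python
from typing import Mapping, Sequence


def add_interactions(row: Mapping) -> dict:
    """Copy one patient record and add hypothetical smoker interaction features."""
    out = dict(row)
    smoker = 1.0 if row["smoker"] == "yes" else 0.0
    out["smoker_flag"] = smoker
    out["bmi_x_smoker"] = row["bmi"] * smoker  # nonzero only for smokers
    out["age_x_smoker"] = row["age"] * smoker
    return out


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true has zero variance")
    return 1.0 - ss_res / ss_tot


print(add_interactions({"age": 40, "bmi": 31.5, "smoker": "yes"})["bmi_x_smoker"])  # 31.5
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

With `bmi_x_smoker` in the design matrix, a linear model can give BMI a different slope for smokers than for non-smokers, which is what a dominant smoking-BMI interaction implies.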

Key Results

  • Identified smoking-BMI interaction as the strongest cost predictor
  • Achieved R² of 0.86 with XGBoost regression
  • Deployed as interactive Streamlit dashboard for what-if cost simulation

Tech Stack

Python, scikit-learn, XGBoost, Streamlit, Pandas

</Reviews>

Sugumaran is a highly skilled engineer who has a deep understanding of the fundamental concepts and algorithms of Machine Learning, and their ability to implement these techniques in practical applications is remarkable. With their AWS Cloud Certified Practitioner certification, Sugumaran demonstrated a strong command of AWS services and tools, which they have utilized to design, build and deploy ML models.

Mani Deva

Senior Software Engineer, Ivanti

I am happy to recommend Sugumaran for his exceptional skills in the field of Data Science and Business. Having worked closely with Sugumaran, I can confidently say that he is one of the top students I have had the pleasure of working with. He consistently demonstrated excellent technical skills in Data Science, and his ability to bridge the gap between technical and business aspects is highly valuable.

Ulises Armando Ponce Sesma

AVP Special Credit Unit, Credit Risk | MSc Data Science & AI Strategy

I had the pleasure of working with Sugumaran on a project where he demonstrated his exceptional skills in data analysis and project management. He has a keen eye for detail and is skilled at identifying trends and patterns in complex datasets. His ability to communicate complex technical concepts in a clear and concise manner is a testament to his professionalism and dedication.

ASHWATH KARTHIK

QA/QC Engineer | UPDA/MMUP Certified (Mechanical)

Sugumaran is one of the most technically rigorous ML engineers I have encountered. His ability to bridge research and production — from BERT fine-tuning to AWS SageMaker deployment — delivers real impact. A rare combination of deep technical skill and genuine passion for advancing the field.

Senior Data Science Leader, HealthTech

As an educator at a top business school, Sugumaran brought real-world AI/ML experience into the classroom. Students consistently rated his sessions among the most practical and engaging. His ability to explain complex ML concepts in clear, actionable terms is outstanding.

Academic Director, Business School

Worked alongside Sugumaran on a supply chain optimization project. His end-to-end ML pipeline design — from data engineering on AWS to model deployment — reduced forecasting errors by 21%. A rare blend of deep technical skill and sharp business acumen.

VP of Engineering, Supply Chain Technology

</Writing>

</Contact>