Prediction of birthweight with early and mid-pregnancy antenatal markers utilising machine learning and explainable artificial intelligence – Nature

Executive Summary: Predicting Neonatal Birthweight to Advance Sustainable Development Goals
Low birthweight (LBW) remains a formidable challenge to achieving Sustainable Development Goal 3 (SDG 3), specifically Target 3.2, which aims to end preventable neonatal deaths. This report details a study to develop a practical, explainable machine learning (ML) model to predict birthweight using routine antenatal data, thereby enabling timely clinical interventions. Real-world clinical data from 237 singleton pregnancies, comprising 19 maternal and fetal features, were prospectively collected. A stacked ensemble ML model was developed, integrating multiple algorithms and enhanced with three Explainable Artificial Intelligence (XAI) methodologies: SHAP, LIME, and Anchor. The AdaBoost model demonstrated the highest performance with 77% accuracy, while the stacked model achieved 75% accuracy, indicating strong potential for clinical application. The model identified key predictors for birthweight, including maternal height, nuchal translucency thickness, parity, and glycated hemoglobin. By providing an interpretable and accessible predictive tool, this model can assist medical professionals in making precise, early decisions, contributing directly to the reduction of neonatal mortality and advancing global health equity as outlined in the SDGs.
Introduction: Aligning with Sustainable Development Goal 3
The global health agenda, as defined by the United Nations Sustainable Development Goals (SDGs), places critical importance on maternal and child wellness. SDG Target 3.2 explicitly aims to end preventable neonatal deaths, calling on all countries to reduce neonatal mortality to at least as low as 12 per 1,000 live births by 2030. A primary obstacle to achieving this target is the high prevalence of low birthweight (LBW), defined by the World Health Organization (WHO) as a birthweight under 2500 g. LBW is a key indicator of neonatal health and a significant contributor to both short- and long-term disability and mortality.
The Challenge of Low Birthweight in the Context of Global Health
- Globally, approximately one in seven neonates is born with LBW, with the majority occurring in Southern Asia and Sub-Saharan Africa, highlighting a critical issue of health inequality (SDG 10).
- LBW neonates face heightened risks of complications, including respiratory distress, sepsis, neurodevelopmental delays, and chronic diseases like type 2 diabetes in adulthood.
- The causes are multifactorial, encompassing maternal health, nutrition, and socioeconomic status, underscoring the interconnectedness of various SDGs.
To address this challenge, this study leverages Artificial Intelligence (AI) to create a predictive model for birthweight. By using routinely collected antenatal data, the model aims to provide a scalable and cost-effective tool that supports early intervention, aligning with the SDG principle of ensuring healthy lives and promoting well-being for all at all ages.
Methodology for a Sustainable Health Solution
This observational prospective study was designed to develop a robust and interpretable model for birthweight prediction, utilizing data that is accessible even in low-resource settings to ensure its potential contribution to global health equity (SDG 10).
Study Design and Data Collection
The study was conducted between August 2022 and August 2024, following ethical clearance (IEC1:122/2022) and clinical trial registration (CTRI/2022/08/044770). Data were collected from 237 singleton pregnancies.
- Participant Data: Nineteen clinically significant maternal and fetal features were collected during the first and second trimesters. All parameters are part of routine clinical practice, requiring no additional testing.
- Key Features Included:
  - Maternal anthropometrics (age, height, weight, BMI).
  - Clinical history (parity, conception type).
  - Biochemical markers (Hb, HbA1c, TSH, PAPP-A).
  - First-trimester ultrasound parameters (CRL, NT).
  - Mid-trimester fetal biometry (BPD, HC, AC, FL).
  - Risk factors (GDM, HTN).
- Target Variable: Birthweight was classified into three clinically relevant categories: Normal Birthweight (NBW), Low Birthweight (LBW), and Very Low Birthweight (VLBW).
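As an illustration of how the three-class target could be derived from a recorded birthweight, the sketch below applies the WHO cut-off of 2500 g for LBW given in the introduction. The 1500 g cut-off for VLBW is the standard WHO definition but is not stated in this report, so it is an assumption here, as is treating every weight of 2500 g or more as NBW. The names `BirthweightClass` and `classify_birthweight` are hypothetical.

```python
from enum import Enum

# Cut-offs in grams. 2500 g (LBW) is stated in the text; 1500 g (VLBW) is
# the standard WHO definition and is an assumption for this sketch.
LBW_THRESHOLD_G = 2500
VLBW_THRESHOLD_G = 1500


class BirthweightClass(Enum):
    NBW = "Normal Birthweight"
    LBW = "Low Birthweight"
    VLBW = "Very Low Birthweight"


def classify_birthweight(weight_g: float) -> BirthweightClass:
    """Map a birthweight in grams to one of the three target classes."""
    if weight_g <= 0:
        raise ValueError("birthweight must be a positive number of grams")
    if weight_g < VLBW_THRESHOLD_G:
        return BirthweightClass.VLBW
    if weight_g < LBW_THRESHOLD_G:
        return BirthweightClass.LBW
    # Assumption: weights of 2500 g and above are all treated as NBW.
    return BirthweightClass.NBW


labels = [classify_birthweight(w).name for w in (1400, 2300, 3100)]
print(labels)
```

Keeping the cut-offs as named constants makes it easy to swap in whatever category boundaries a given study actually used.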
Statistical and Machine Learning Analysis
To build a transparent and reliable predictive tool, a multi-faceted analytical approach was employed, integrating traditional statistics with advanced AI.
- Data Preprocessing: The dataset was preprocessed through standardization, encoding of categorical variables, and balancing of classes using the Synthetic Minority Over-sampling Technique (SMOTE) to address the disparity in the number of LBW and VLBW cases.
- Machine Learning Models: Multiple ML classifiers were employed, including Random Forest, Logistic Regression, Decision Tree, KNN, CatBoost, LightGBM, AdaBoost, and XGBoost. These were combined into a stacked ensemble model to improve predictive accuracy.
- Explainable AI (XAI): To overcome the “black box” nature of AI and foster clinical trust, three XAI techniques were used to ensure model transparency and interpretability:
  - SHAP (Shapley Additive Explanations): To quantify the contribution of each feature to the prediction.
  - LIME (Local Interpretable Model-agnostic Explanations): To explain individual predictions.
  - Anchor: To identify a set of rules that “anchor” a prediction.
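The class-balancing step can be made concrete with a minimal sketch of the idea behind SMOTE: each synthetic row is placed at a random point on the line segment between a minority-class row and one of its nearest minority-class neighbours. This is a simplified pure-Python illustration, not the study's code (which most likely relied on an existing library); the function name `smote`, its parameters, and the example values are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority rows to the base row (Euclidean), excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up standardised (maternal height, HbA1c) rows for an under-represented class
vlbw_rows = [[-1.2, 0.8], [-0.9, 1.1], [-1.5, 0.4], [-1.0, 0.9]]
new_rows = smote(vlbw_rows, n_new=6, k=3)
print(len(new_rows))
```

Because every synthetic row is a convex combination of two real rows, it always lies within the range already spanned by the minority class. A common precaution, not described in this report, is to oversample only the training split so that synthetic rows do not leak into evaluation.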
Results: Predictive Modeling for Improved Neonatal Outcomes
The analysis yielded a functional predictive model with clear indicators of its potential utility in clinical settings to support the aims of SDG 3.
Model Performance
The performance of the ML models was evaluated using several metrics, including accuracy, precision, and recall. The best-performing models classified roughly three in four pregnancies correctly.
- The AdaBoost model achieved the highest performance, with an accuracy of 77%, precision of 73%, and recall of 77%.
- The stacked ensemble model demonstrated a robust accuracy of 75%, confirming its viability for clinical application.
- The confusion matrix for the stacked model showed good prediction accuracy for NBW and LBW, though performance for VLBW was lower due to limited data, highlighting an area for future improvement.
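To show how such figures relate to a confusion matrix, the sketch below computes accuracy together with support-weighted precision and recall for a three-class problem. The matrix is invented for illustration and is not the study's data. The report does not say which averaging scheme was used, so support weighting is an assumption; it is worth noting that support-weighted recall is algebraically identical to accuracy, which would be consistent with the matching 77% figures reported for AdaBoost.

```python
def classification_metrics(cm):
    """Accuracy plus support-weighted precision and recall from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))
    accuracy = correct / total

    weighted_precision = 0.0
    weighted_recall = 0.0
    for i in range(n_classes):
        support = sum(cm[i])                                  # true members of class i
        predicted = sum(cm[r][i] for r in range(n_classes))   # rows predicted as class i
        precision = cm[i][i] / predicted if predicted else 0.0
        recall = cm[i][i] / support if support else 0.0
        # Weight each class by its share of the true labels
        weighted_precision += precision * support / total
        weighted_recall += recall * support / total
    return accuracy, weighted_precision, weighted_recall


# Hypothetical matrix (rows and columns ordered NBW, LBW, VLBW); not the study's results
cm = [
    [30, 5, 0],
    [6, 8, 1],
    [1, 1, 1],
]
acc, prec, rec = classification_metrics(cm)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

In a matrix like this one, the sparse last row illustrates why a rare class such as VLBW can be predicted poorly even when overall accuracy looks reasonable.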
Key Predictors of Birthweight Identified by XAI
The use of XAI techniques provided transparent insights into the factors driving the model’s predictions, enhancing its trustworthiness for clinical use. Across the different statistical and XAI methods, several features were consistently identified as significant predictors of birthweight.
- Maternal Height: Consistently ranked as a top predictor, with shorter maternal height associated with a higher incidence of LBW.
- Parity: Identified by all XAI models as a key factor, with nulliparous mothers at higher risk.
- Nuchal Translucency (NT) Thickness: A significant early indicator of potential variations in birthweight.
- Crown-Rump Length (CRL): An important first-trimester measurement for predicting birth outcomes.
- Glycated Hemoglobin (HbA1c): A critical marker, particularly in relation to maternal glucose levels.
- Hypertensive Disorders of Pregnancy (HTN): Recognized as a significant risk factor for decreased uteroplacental perfusion and subsequent LBW.
- Pregnancy-Associated Plasma Protein A (PAPP-A): An influential biochemical marker from the first trimester.
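Feature rankings of this kind rest on Shapley values, the quantity that SHAP estimates for a trained model. The sketch below computes exact Shapley values by enumerating every coalition of features for a toy value function. The function `toy_risk`, its feature names, and its numbers are invented purely to demonstrate the calculation; they are not results from the study, and practical libraries use faster approximations rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a coalition value function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding f
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(coalition):
    """Made-up risk score for a set of present risk factors (illustration only)."""
    score = 0.0
    if "short_height" in coalition:
        score += 0.20
    if "hypertension" in coalition:
        score += 0.10
    if "short_height" in coalition and "hypertension" in coalition:
        score += 0.06  # hypothetical interaction term
    if "high_hba1c" in coalition:
        score += 0.04
    return score


features = ["short_height", "hypertension", "high_hba1c"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

Two properties are visible in this toy case: the interaction term is split equally between the two features that produce it, and the attributions sum to the difference between the score with all features present and the score with none.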
Discussion: Implications for SDG 3 and Future Directions
The findings of this study have direct implications for public health strategies aimed at achieving SDG 3. By creating an accessible and interpretable tool for early risk identification, this model supports a shift from reactive to proactive maternal and neonatal care.
Clinical Significance and Contribution to Health Equity
The identification of key predictors using routine clinical data is a significant step toward democratizing advanced diagnostics. Factors like maternal height, parity, and early ultrasound markers are accessible even in low-resource settings, making the model’s insights broadly applicable. This aligns with SDG 10 by providing a tool that can help reduce health disparities between different socioeconomic settings.
The Role of AI in Proactive Healthcare and SDG Attainment
This research supports the “inverted pyramid” theory of obstetric care, which prioritizes early risk assessment to improve outcomes. By enabling clinicians to identify high-risk pregnancies in the first and second trimesters, the model facilitates timely interventions that can mitigate the progression to LBW.
- Early Intervention: Pregnant women with risk factors such as hypertension or abnormal glucose levels can receive earlier treatment, dietary counseling, and closer monitoring.
- Resource Allocation: Early identification of high-risk cases allows for more efficient allocation of specialized healthcare resources, which is crucial for sustainable health systems.
- Building Trust in AI: The integration of XAI is critical for clinical adoption. By making the model’s reasoning transparent, clinicians can confidently integrate its predictions into their decision-making processes, fostering a human-AI collaboration that enhances patient care.
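While the model’s accuracy of 75–77% is promising, it should be read in context: high-risk women in the study population received treatment, which may have prevented some adverse outcomes. A pregnancy flagged as high-risk that ended in a normal birthweight after treatment would be counted as a misclassification, so the reported accuracy may understate the model’s ability to identify risk. This highlights the model’s value not just as a predictive tool but as a catalyst for effective preventative care, directly contributing to the goals of SDG 3.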
Conclusion: Advancing Fetal Precision Medicine for Global Health Goals
This study successfully developed and validated a machine learning model capable of predicting neonatal birthweight using routine early and mid-pregnancy clinical markers. By integrating multiple ML algorithms with XAI techniques, the model provides accurate, transparent, and clinically relevant predictions. This tool has the potential to empower medical practitioners to make timely and informed decisions, facilitating early interventions that can reduce the incidence of LBW and VLBW. Such advancements in fetal precision medicine are crucial for making substantive progress toward achieving Sustainable Development Goal 3.2 and ensuring a healthy start for every newborn, regardless of their geographic or socioeconomic context.
Analysis of Sustainable Development Goals (SDGs) in the Article
1. Which SDGs are addressed or connected to the issues highlighted in the article?
The article primarily addresses issues related to SDG 3: Good Health and Well-being. The entire study is centered on improving maternal and neonatal health outcomes, which is the core objective of this goal.
- SDG 3: Good Health and Well-being: The article’s main focus is on predicting and preventing Low Birthweight (LBW), a major cause of neonatal mortality and morbidity. The introduction explicitly states, “Sustainable Development Goal 3.2 (SDG) aims to eliminate preventable neonatal deaths and requires all countries to reduce their neonatal mortality rate by 2030.” The research aims to develop a tool to assist medical professionals in making timely decisions to improve maternal-fetal wellness, directly contributing to the health targets of SDG 3.
2. What specific targets under those SDGs can be identified based on the article’s content?
Several specific targets under SDG 3 are relevant to the content of the article:
- Target 3.2: End preventable deaths of newborns and children under 5. The article directly references this target in the introduction. The study’s goal of predicting LBW is a direct strategy to reduce neonatal complications and mortality. The text highlights that “LBW is a significant health challenge worldwide” and that LBW newborns are at a higher risk of short-term complications like “respiratory distress, hypothermia, sepsis” and long-term issues, all of which contribute to neonatal mortality and morbidity.
- Target 3.4: Reduce premature mortality from non-communicable diseases (NCDs) through prevention and treatment. The article identifies maternal NCDs such as “diabetes, and hypertension” as significant risk factors for LBW. The predictive model uses markers like “glycated hemoglobin (HbA1c)” and “hypertensive disorders of pregnancy (HTN)” to assess risk. By enabling early identification and intervention for these conditions in pregnant women, the study supports the prevention and treatment of NCDs to improve birth outcomes. Furthermore, it notes that LBW neonates have a “45% higher risk of developing type 2 diabetes,” linking maternal health to the prevention of future NCDs in the offspring.
- Target 3.8: Achieve universal health coverage, including access to quality essential health-care services. The study emphasizes the use of “routinely collected antenatal parameters” and “standard point-of-care variables” for its model. This approach ensures that the predictive tool can be applied in various clinical settings, including low-resource ones, without requiring additional expensive testing. This promotes access to quality, data-driven essential healthcare services for a broader population, which is a cornerstone of universal health coverage.
3. Are there any indicators mentioned or implied in the article that can be used to measure progress towards the identified targets?
Yes, the article mentions and implies several indicators that are used to measure progress toward the identified SDG targets.
- Indicator 3.2.2: Neonatal mortality rate. This is explicitly mentioned in the introduction: “…requires all countries to reduce their neonatal mortality rate by 2030.” The entire premise of the study—reducing LBW—is aimed at lowering this specific rate.
- Prevalence of Low Birthweight (LBW): While not an official SDG indicator, the prevalence of LBW is a critical proxy indicator for Target 3.2. The article extensively uses and measures it, citing global statistics (“one in seven neonates is born as a low-birthweight baby”) and national data (“the prevalence of LBW in India decreased from 22% in 2005–2006 to 17.5% in 2015–2016”). The study’s outcome variable is the classification of birthweight into Normal (NBW), Low (LBW), and Very Low (VLBW), making its prevalence a central metric.
- Clinical Markers for Maternal and Fetal Health: The study uses 19 clinical features that can serve as process indicators for monitoring maternal health (related to Target 3.4) and predicting neonatal outcomes (related to Target 3.2). These include:
  - Maternal height and Body Mass Index (BMI)
  - Glycated hemoglobin (HbA1c) as a marker for diabetes
  - Presence of Hypertensive disorders of pregnancy (HTN)
  - Nuchal translucency (NT) thickness and Crown-rump length (CRL) from ultrasound scans
  - Pregnancy-associated plasma protein A (PAPP-A)
Monitoring these factors is part of the prevention and treatment strategy for NCDs during pregnancy.
4. Table of SDGs, Targets, and Indicators
SDGs | Targets | Indicators |
---|---|---|
SDG 3: Good Health and Well-being | 3.2: By 2030, end preventable deaths of newborns and children under 5 years of age. | Indicator 3.2.2: Neonatal mortality rate; prevalence of low birthweight (proxy indicator) |
SDG 3: Good Health and Well-being | 3.4: By 2030, reduce by one third premature mortality from non-communicable diseases (NCDs) through prevention and treatment. | Clinical markers monitored during pregnancy, such as HbA1c and hypertensive disorders of pregnancy (HTN) |
SDG 3: Good Health and Well-being | 3.8: Achieve universal health coverage, including access to quality essential health-care services. | |
Source: nature.com