December 18, 2024

Comparing Machine Learning Models for Neonatal Mortality Prediction: Insights from a Modeling Competition

A multi-team modeling competition showed that transparent logistic regression rivalled more complex ML for neonatal ICU mortality prediction when evaluated on identical data.

Brynne A. Sullivan; Alvaro G. Moreira; Ryan M. McAdams; Lindsey A. Knake; Ameena Husain; Jiaxing Qiu; Avinash Mudireddy; Abrar Majeedi; Wissam Shalish; Douglas E. Lake; Zachary A. Vesoulis

Key Findings

Simplicity can win. Logistic regression achieved the highest external test AUC of 0.818, outperforming more complex models.
Shared critical features. Gestational age, birth weight, Apgar scores, and heart rate variability remained influential across architectures.
Overfitting risks. Complex models required careful regularisation and cross-validation to avoid performance collapse.

Introduction

Predicting mortality in the neonatal ICU (NICU) is clinically important but noisy: clinical indicators vary across institutions, and subtle physiologic changes may be missed. The competition investigated whether sophisticated ML architectures outperform simpler models when every team works from identical data.

Methods

Five teams trained logistic regression, CatBoost, neural network, random forest, and XGBoost models on more than 6,000 NICU admissions to predict mortality at admission and at seven days. Performance was compared using the area under the ROC curve (AUC), alongside calibration and interpretability checks.
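The two quantitative checks named above, discrimination (AUC) and calibration, can be sketched in a few lines of plain Python. This is a minimal illustration, not the competition's evaluation code: the function names (`roc_auc`, `brier_score`, `calibration_table`) are hypothetical, the Brier score and equal-width bins are one common way to look at calibration rather than the method the teams necessarily used, and the labels and probabilities are toy values invented for the example.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_table(labels, probs, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            event_rate = sum(y for y, _ in b) / len(b)
            rows.append((mean_pred, event_rate, len(b)))
    return rows


# Toy data, invented for illustration only (1 = died, 0 = survived).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.1, 0.3, 0.35, 0.4, 0.6, 0.8, 0.2, 0.9]

print(roc_auc(labels, probs))  # 0.9375 (15 of 16 positive/negative pairs ranked correctly)
print(brier_score(labels, probs))
for mean_pred, event_rate, count in calibration_table(labels, probs):
    print(f"predicted {mean_pred:.2f}  observed {event_rate:.2f}  n={count}")
```

AUC only measures how well a model ranks patients, so a model can score well on it while systematically over- or under-stating risk. That is why a calibration check is reported alongside it.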

Results

Logistic regression achieved the highest external test AUC (0.818), edging out the tree-based and neural models. Regardless of architecture, feature importance converged on gestational age, birth weight, Apgar scores, and heart rate variability, suggesting that all of the models drew on the same underlying signal.

Discussion

The findings reinforce that model discipline, validation, and domain-informed feature design can trump algorithmic complexity in low-signal, high-stakes settings such as the NICU.
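To make the point about validation concrete, the sketch below tunes the penalty strength of an L2-regularised logistic regression with k-fold cross-validation. It rests on several assumptions: the data are synthetic (one informative feature and one pure-noise feature), the model is fitted with plain batch gradient descent, and every name (`fit_logistic_l2`, `cross_validate`, and so on) is hypothetical. It does not reproduce any team's pipeline.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(X, y, l2=0.1, lr=0.1, epochs=200):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2; the bias is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def log_loss(y, p, eps=1e-12):
    # Probabilities are clipped so the logarithm is always defined.
    total = 0.0
    for yi, pi in zip(y, p):
        total += yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
    return -total / len(y)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, l2, k=5, seed=0):
    """Mean held-out log-loss across k folds for a given penalty strength."""
    losses = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_logistic_l2([X[i] for i in train], [y[i] for i in train], l2=l2)
        preds = predict(model, [X[i] for i in held_out])
        losses.append(log_loss([y[i] for i in held_out], preds))
    return sum(losses) / k


# Synthetic data: x1 carries signal, x2 is pure noise (assumption for illustration).
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([x1, x2])
    y.append(1 if rng.random() < sigmoid(1.5 * x1 - 0.5) else 0)

for l2 in (0.0, 0.1, 1.0):
    print(f"l2={l2}: mean held-out log-loss = {cross_validate(X, y, l2):.4f}")
```

The penalty is chosen on held-out folds only, so the final test set stays untouched until a single model has been selected. The same discipline applies whether the model is a logistic regression or a gradient-boosted ensemble.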

Clinical Implications

Because neonatology teams must understand and trust predictions, transparent models with calibration and explanatory plots remain competitive options for mortality risk support.
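One reason logistic regression counts as transparent is that each coefficient converts directly into an odds ratio a clinician can read. The sketch below shows that conversion. The intercept and coefficients are invented purely for illustration and are not taken from the study, and the feature names and the helper functions `odds_ratios` and `predicted_risk` are likewise hypothetical.

```python
import math

# Hypothetical values, made up for this example; NOT results from the study.
# Features are assumed to be centred, so 0.0 means "at the cohort average".
INTERCEPT = -2.0
COEFFICIENTS = {
    "gestational_age_weeks_centered": -0.30,
    "apgar_5min_centered": -0.25,
}


def odds_ratios(coefs):
    """exp(beta): multiplicative change in the odds per one-unit increase in a feature."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


def predicted_risk(intercept, coefs, features):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


for name, ratio in odds_ratios(COEFFICIENTS).items():
    print(f"{name}: odds ratio {ratio:.2f} per unit increase")

average_infant = {name: 0.0 for name in COEFFICIENTS}
print(f"risk at average feature values: {predicted_risk(INTERCEPT, COEFFICIENTS, average_infant):.3f}")
```

An odds ratio below 1 means the feature is associated with lower odds of the outcome as it increases. A tree ensemble or neural network offers no equivalent single number per feature, which is part of why such models need separate explanation tooling.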

Conclusion

Model selection should prioritise interpretability, stability, and clinical partnership rather than algorithm novelty alone.

Future Directions

Validate across multicentre NICU cohorts, blend time-series physiologic data, and build visual explanation dashboards for bedside teams.
