Comparing Machine Learning Models for Neonatal Mortality Prediction: Insights from a Modeling Competition
A multi-team modeling competition showed that transparent logistic regression rivalled more complex machine-learning (ML) models for predicting neonatal intensive care unit (NICU) mortality when all models were evaluated on identical data.
Key Findings
- Simplicity can win. Logistic regression achieved the highest external test AUC (0.818), narrowly outperforming the more complex models.
- Shared critical features. Gestational age, birth weight, Apgar scores, and heart-rate variability ranked among the most influential features across all architectures.
- Overfitting risks. The complex models required careful regularisation and cross-validation to avoid sharp drops in performance on held-out data.
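The summary does not say how the teams tuned their models. As a minimal sketch of the general technique, assuming an L2 (ridge) penalty tuned by k-fold cross-validation on held-out log-loss, the standard-library Python below fits a plain logistic regression by gradient descent and picks the penalty strength from a small grid. The helper names (`fit_logistic`, `choose_l2`, and so on), the grid, and the synthetic data are all hypothetical, not the competition's code or cohort.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2=0.0, lr=0.5, epochs=200):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2.

    The intercept b is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def log_loss(X, y, w, b, eps=1e-12):
    # Mean negative log-likelihood on a held-out fold.
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


def k_fold_indices(n, k, seed=0):
    # Shuffle once, then deal indices into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def choose_l2(X, y, grid, k=5, seed=0):
    """Return (best_l2, mean held-out log-loss) over the candidate grid."""
    folds = k_fold_indices(len(y), k, seed)
    best = None
    for l2 in grid:
        losses = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            w, b = fit_logistic([X[i] for i in train], [y[i] for i in train], l2=l2)
            losses.append(log_loss([X[i] for i in held_out],
                                   [y[i] for i in held_out], w, b))
        mean_loss = sum(losses) / k
        if best is None or mean_loss < best[1]:
            best = (l2, mean_loss)
    return best


# Hypothetical synthetic data: 3 standardised predictors, only two informative.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]

best_l2, cv_loss = choose_l2(X, y, grid=[0.0, 0.01, 0.1, 1.0])
print(f"selected l2={best_l2}, mean CV log-loss={cv_loss:.3f}")
```

The same fold loop would apply to tree depth or neural-network weight decay; only the fitting function changes. Reusing identical folds for every candidate setting keeps the comparison paired.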
Introduction
Predicting neonatal ICU mortality is important but difficult: clinical indicators vary across institutions, and subtle physiologic changes may be missed. The competition tested whether sophisticated ML architectures outperform simpler models when all teams work under a controlled setup.
Methods
Five teams trained logistic regression, CatBoost, neural network, random forest, and XGBoost models on more than 6,000 NICU admissions to predict mortality at admission and at seven days. Models were compared on area under the receiver operating characteristic curve (AUC), supplemented by calibration and interpretability checks.
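The two headline evaluation checks can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the competition's scoring code: AUC is computed as the Mann-Whitney probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, and calibration is summarised with the Brier score and an equal-width reliability table. The function names, the five-bin choice, and the toy data are assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    # Mean squared difference between predicted risk and observed outcome (0/1).
    return sum((p - t) ** 2 for p, t in zip(probs, y_true)) / len(y_true)


def reliability_table(y_true, probs, n_bins=5):
    """Group predictions into equal-width risk bins and return, per non-empty
    bin, (mean predicted risk, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(probs, y_true):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, t))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(t for _, t in b) / len(b)
            table.append((mean_p, rate, len(b)))
    return table


# Toy example (hypothetical outcomes and predicted risks, not competition data).
y_true = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, risk))  # 0.75
print(brier_score(y_true, risk))
for row in reliability_table(y_true, risk):
    print(row)
```

The pairwise AUC loop costs O(n_pos × n_neg); for a cohort of thousands, a rank-based computation or scikit-learn's `sklearn.metrics.roc_auc_score`, `sklearn.metrics.brier_score_loss`, and `sklearn.calibration.calibration_curve` would be more practical.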
Results
Logistic regression achieved the top test AUC of 0.818, edging out the tree-based and neural models. Feature-importance rankings converged on gestational age, birth weight, Apgar scores, and heart-rate variability regardless of architecture, suggesting that all models drew on the same underlying signal.
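The summary does not state which importance measure each team used, and native importances differ across architectures. One model-agnostic way to compare rankings across logistic, tree-based, and neural models is permutation importance, sketched below under that assumption: shuffle one feature column, re-score, and record the average drop in AUC. The risk function, feature layout, and data here are hypothetical.

```python
import random


def roc_auc(y_true, scores):
    # Pairwise (Mann-Whitney) AUC; ties count as 1/2.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC when one feature column is shuffled, breaking its
    link with the outcome while leaving the other features intact."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - roc_auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data and a fixed hypothetical risk score that ignores feature 2.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [1 if 2.0 * x[0] + 0.5 * x[1] + rng.gauss(0, 1) > 0 else 0 for x in X]


def risk_score(row):
    return 2.0 * row[0] + 0.5 * row[1]


imp = permutation_importance(risk_score, X, y)
for j, value in enumerate(imp):
    print(f"feature {j}: mean AUC drop {value:.3f}")
```

Shuffling the feature the score ignores leaves every prediction unchanged, so its importance is exactly zero, while the dominant feature shows the largest AUC drop. In a real comparison the permutation would be applied to a held-out set for each fitted model.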
Discussion
The findings suggest that disciplined model development, rigorous validation, and domain-informed feature design can matter more than algorithmic complexity in low-signal, high-stakes settings such as the NICU.
Clinical Implications
Because neonatology teams must understand and trust predictions, transparent models accompanied by calibration checks and explanatory plots remain competitive options for mortality risk decision support.
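Part of logistic regression's transparency is that each coefficient has a direct clinical reading. In the standard formulation, with p the predicted probability of death and x_1, ..., x_k the predictors, each coefficient β_j converts to an odds ratio:

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
\mathrm{OR}_j = \frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j}
```

For example, a hypothetical coefficient β_j = -0.3 would give OR_j = e^{-0.3} ≈ 0.74: each one-unit increase in x_j multiplies the predicted odds of death by about 0.74, holding the other predictors fixed. The competition's fitted coefficients are not reported here, so this value is illustrative only.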
Conclusion
Model selection should prioritise interpretability, stability, and clinical partnership rather than algorithmic novelty alone.
Future Directions
Next steps include validating the models across multicentre NICU cohorts, incorporating time-series physiologic data, and building visual explanation dashboards for bedside teams.