Poster Session 3

Category: Intrapartum Fetal Assessment

(738) Fetal Scalp pH Prediction: Human Expertise Still Beats Artificial Intelligence

Thursday, February 12, 2026

10:30 AM - 12:00 PM

Coauthor(s)

Juliette Vitrou

Maternité Port-Royal, groupe hospitalier Paris Centre, AP-HP, Paris, France; Institut interdisciplinaire santé des femmes, iWISH, Université Paris Cité, Paris, France
Paris, Ile-de-France, France
Charles Garabedian, MD, PhD (he/him/his)

CHU Lille
Lille, Nord-Pas-de-Calais, France

Submitting Author and Presenting Author(s)

Aude Girault, MD, PhD (she/her/hers)

Department of Obstetrics and Gynecology, Port-Royal Maternity Hospital, AP-HP, Cochin Hospital, FHU PREMA, Paris, France
Paris, Ile-de-France, France

Coauthor(s)

Mathieu Hivert, MD

CHU Lille, Department of Obstetrics, Lille, France; Univ Lille, ULR 2694-METRICS, Lille, France.
Lille, Nord-Pas-de-Calais, France

Objective:

To compare the accuracy of fetal scalp pH prediction by midwives, residents, and a large language model (ChatGPT) against actual measurements.

Study Design:
Prospective single-center study of women in labor at term who underwent fetal scalp blood sampling for FHR II tracings. For each case, three pH predictions were obtained independently from a resident, a midwife, and ChatGPT, each based on standardized clinical data and cardiotocographic tracings. Correlation with actual pH was assessed with Spearman’s ρ, and accuracy was assessed with mean absolute error (MAE) and the rate of correct classification within predefined categories (<7.20; 7.20–7.24; >7.24). Based on prediction performance, we estimated the proportion of avoidable pH tests and the potential clinical consequences, including avoidable cesarean deliveries and missed severe acidosis.

Results:
A total of 95 fetal scalp pH measurements were analyzed. Correlation with actual pH values was weak for all predictors and highest for midwives (ρ = 0.26, p = 0.011). MAE was lower for midwives (0.042, 95% CI 0.036–0.050) and residents (0.047, 95% CI 0.038–0.056) than for ChatGPT (0.098, 95% CI 0.087–0.110). Correct categorical prediction rates were 61.0% for midwives, 59.7% for residents, and 24.7% for ChatGPT. ChatGPT systematically underestimated fetal pH (71.4% of cases), whereas midwives and residents showed more balanced under- and overestimation. Compared with acting on predictions alone, fetal scalp pH testing avoided between 1.3% (ChatGPT) and 5.2% (midwives and residents) of neonatal acidosis cases. It also prevented unnecessary cesarean deliveries in 19.5% of cases relative to midwife or resident predictions, and in up to 62.3% of cases relative to ChatGPT-based decisions.

Conclusion:
Midwives and residents demonstrated comparable accuracy in predicting fetal scalp pH, both markedly outperforming ChatGPT. While professional clinical judgment can potentially reduce unnecessary fetal blood sampling and cesarean deliveries, reliance on large language models in their current state would increase misclassification risk and unnecessary interventions.