Poster Session 4
Category: Digital Health Technologies (DHT)
Joe Haydamous, MD (he/him/his)
PGY1
Department of Obstetrics, Gynecology and Reproductive Sciences, McGovern Medical School at UTHealth Houston
Houston, Texas, United States
Laura Diab, MD (she/her/hers)
Division of Maternal-Fetal Medicine, Department of Obstetrics, Gynecology and Reproductive Sciences, McGovern Medical School at UTHealth Houston
Houston, Texas, United States
Analuisa C. Mosqueda, MD
Division of Maternal-Fetal Medicine, Department of Obstetrics, Gynecology and Reproductive Sciences, McGovern Medical School at UTHealth Houston
Houston, Texas, United States
Sabrina C. DaCosta, MD
Division of Maternal-Fetal Medicine, Department of Obstetrics, Gynecology and Reproductive Sciences, McGovern Medical School at UTHealth Houston
Houston, Texas, United States
Irene A. Stafford, MD, MPH, MS
Associate Professor
Division of Maternal-Fetal Medicine, Department of Obstetrics, Gynecology and Reproductive Sciences, McGovern Medical School at UTHealth Houston
Houston, Texas, United States
Methods:
Clinically relevant questions were compiled using CDC, ACOG, and WHO guidelines, as well as topics frequently discussed in patient forums. Responses were generated using GPT-4 and evaluated by six independent experts in obstetrics, maternal-fetal medicine, and infectious disease. Reviewers rated each response on a 5-point Likert scale across three domains: accuracy, completeness, and safety. Each question was submitted using one of two standardized prompts simulating either patient or provider communication styles. The complete list of questions and prompts appears in Figure 1. Questions were grouped into four categories: General Knowledge, Public Health/Prevention, Treatment, and Diagnostic Interpretation.
Results:
ChatGPT responses showed high performance across all domains, with mean scores of 4.38 for accuracy, 4.49 for completeness, and 4.47 for safety. Between 88% and 92% of evaluations received scores of 4 or higher. Public Health/Prevention questions achieved the highest overall scores, averaging 4.67 or above in each domain. General Knowledge items also performed well, especially in safety (4.79) and completeness (4.75). Treatment-related responses maintained strong ratings across all domains (≥4.33) and aligned well with current guidelines. Although Diagnostic Interpretation responses were accurate (4.33) and safe (4.50), completeness was lower (4.17), with reviewers noting missing nuance in complex cases. Importantly, no unsafe or misleading content was identified by any reviewer.
Conclusion:
ChatGPT produced safe, accurate, and generally complete responses to questions about syphilis in pregnancy. Its strong performance on public health and general education questions supports cautious integration into prenatal counseling workflows. Improving the depth of diagnostic interpretation should be prioritized while maintaining the observed high safety profile.