
ChatGPT-5 overestimates macular hole surgery success, study finds
Key Takeaways
- Tel Aviv investigators evaluated vitrectomy outcomes (2021–2024) using anonymized demographics, ocular history, symptom duration, preop BCVA, hole metrics, surgical details, and one foveal OCT B-scan.
- Anatomical prediction accuracy favored ChatGPT-5 (90%) versus specialists (72%–86%), but the model predicted closure in 49/50 eyes, failing to identify any of the 6 nonclosures.
Study finds ChatGPT-5 overpredicts success after macular hole surgery—beating specialists on paper but missing failures.
Artificial intelligence may be inching closer to the retina clinic, but a new study suggests it is not yet ready to tell patients how their macular hole surgery will turn out. In a retrospective analysis published in Retina, ChatGPT-5 posted headline accuracy numbers that beat 2 senior retina specialists at predicting outcomes after full-thickness macular hole (FTMH) repair—only for the investigators to conclude that the model's edge was driven almost entirely by a tendency to assume the best.
Investigators led by Aya Wattad, MD, and colleagues at Tel Aviv University evaluated 50 eyes of 50 patients who underwent pars plana vitrectomy for FTMH between 2021 and 2024. For each case, the team stripped identifiers and fed ChatGPT-5 a standardized clinical summary—age, sex, refractive and lens status, ocular history, symptom duration, preoperative best-corrected visual acuity (BCVA), hole dimensions, surgical details, and a single foveal B-scan OCT image. The model was asked to predict whether vision would improve, stay stable, or worsen at 12 months; to estimate final BCVA; and to predict anatomical closure. Two senior retina specialists reviewed the same anonymized material and made parallel predictions, and all forecasts were checked against real-world results.
What actually happened
The surgeries themselves performed broadly as expected for idiopathic FTMH, where vitrectomy with internal limiting membrane peeling routinely yields closure rates in the 90% to 100% range. At 12 months, anatomical closure occurred in 44 of 50 eyes (88%), just under that range, and mean BCVA improved significantly from 20/100 (0.7 ± 0.4 logMAR) to 20/63 (0.5 ± 0.5 logMAR; P = 0.03). Functionally, 35 eyes (70%) gained at least 2 ETDRS lines, 8 (16%) remained stable, and 7 (14%) worsened.
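For readers who want to check the acuity figures, the paired Snellen and logMAR values above follow from the standard definition of logMAR as the base-10 logarithm of the minimum angle of resolution. The sketch below is illustrative only; the function name is hypothetical and is not taken from the study.

```python
def logmar_to_snellen_denominator(logmar: float, numerator: int = 20) -> float:
    """Return the Snellen denominator for a logMAR value.

    MAR = 10 ** logMAR, and Snellen acuity = numerator / (numerator * MAR).
    """
    return numerator * 10 ** logmar


# Mean values reported in the article (the function name is an illustrative assumption).
preop_denominator = logmar_to_snellen_denominator(0.7)
postop_denominator = logmar_to_snellen_denominator(0.5)

print(f"Preoperative:  20/{preop_denominator:.0f}")   # prints "Preoperative:  20/100"
print(f"Postoperative: 20/{postop_denominator:.0f}")  # prints "Postoperative: 20/63"
```

The conversion reproduces the rounded 20/100 and 20/63 equivalents quoted for the mean logMAR values.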
Where the model fell short
On anatomical outcomes, ChatGPT-5 reached 90% accuracy, ahead of the specialists' 72% to 86%. But the model arrived there by predicting closure in 49 of 50 eyes. It was flawless in the eyes that closed and blind to the eyes that did not—precisely the high-stakes scenario in which a prognosis carries weight. The specialists, by contrast, retained the clinical judgment to anticipate at least some failures.
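The arithmetic behind that concern can be made concrete with a naive baseline. The sketch below assumes a hypothetical rule that predicts closure for every eye, using only the counts reported in the article; it is not the investigators' analysis, and the variable names are illustrative.

```python
# Counts reported in the article.
total_eyes = 50
closed_eyes = 44
nonclosed_eyes = total_eyes - closed_eyes  # 6 eyes did not close

# Hypothetical baseline (an assumption for illustration): always predict closure.
baseline_correct = closed_eyes          # right on every eye that closed
baseline_nonclosures_caught = 0         # never predicts a failure
baseline_accuracy = baseline_correct / total_eyes
baseline_nonclosure_sensitivity = baseline_nonclosures_caught / nonclosed_eyes

reported_model_accuracy = 0.90  # ChatGPT-5 anatomical accuracy from the article
gap_points = (reported_model_accuracy - baseline_accuracy) * 100

print(f"Always-closure baseline accuracy: {baseline_accuracy:.0%}")                 # 88%
print(f"Baseline sensitivity for nonclosure: {baseline_nonclosure_sensitivity:.0%}")  # 0%
print(f"Reported model accuracy exceeds baseline by {gap_points:.0f} points")        # 2 points
```

With closure occurring in 88% of eyes, a rule that never predicts failure already scores 88%, so a 90% accuracy figure says little about the ability to flag the eyes that will not close.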
The functional picture was less flattering. Overall BCVA prediction accuracy was 66% for ChatGPT-5 versus 42% to 44% for the specialists, but that figure masked an uneven performance: the model did reasonably well when vision improved (60%), poorly when it remained stable (≤13%), and failed entirely when it worsened (0%). The mean BCVA prediction error was 11.4 ± 10.8 letters, with roughly 60% of estimates landing within 2 lines of the true outcome.
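To put the letter-based error in the same units as the 2-line threshold, the sketch below applies the standard ETDRS chart conventions of 5 letters per line and 0.02 logMAR per letter. The helper names are hypothetical, and the calculation is a unit conversion of the reported mean, not a reanalysis of the study data.

```python
# Standard ETDRS chart conventions.
LETTERS_PER_LINE = 5
LOGMAR_PER_LETTER = 0.02


def letters_to_lines(letters: float) -> float:
    """Convert an ETDRS letter count to chart lines."""
    return letters / LETTERS_PER_LINE


def letters_to_logmar(letters: float) -> float:
    """Convert an ETDRS letter count to a logMAR difference."""
    return letters * LOGMAR_PER_LETTER


mean_error_letters = 11.4  # mean BCVA prediction error reported in the article

print(f"{letters_to_lines(mean_error_letters):.2f} lines")     # prints "2.28 lines"
print(f"{letters_to_logmar(mean_error_letters):.3f} logMAR")   # prints "0.228 logMAR"
```

On those conventions, the mean error of 11.4 letters is a little over 2 lines, which is the same size as the gain used to define functional improvement.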
The authors framed the takeaway in cautious terms, writing that the model's outputs "require cautious interpretation to avoid misleading confidence," and argued that AI-generated prognoses should at present serve only as supportive information. Larger prospective studies, they noted, are needed before any clinical deployment.
A familiar caution
The findings echo a recurring theme in AI evaluations across ophthalmology. Earlier reporting in Ophthalmology Times described ChatGPT-4.0 showing only suboptimal agreement with experts and a roughly 12% hallucination rate in cataract, cornea, and refractive scenarios—evidence that the technology can stumble in complex, open-ended clinical reasoning even as it handles simpler closed-set tasks. Retina has proven a particular weak spot: in a separate analysis of board-style preparation, ChatGPT answered every retina and vitreous question incorrectly.
That context matters because macular hole prognostication is itself a hard problem that has drawn sustained quantitative effort. Researchers have built OCT-based indices and machine-learning models to forecast both closure and postoperative acuity, leaning on parameters such as base diameter and preoperative vision. The Wattad analysis suggests a general-purpose language model can approximate those outputs—but without the calibrated skepticism that keeps a specialist from assuming every hole will close. For now, the message for clinicians is to keep evidence-based counseling and specialist input at the center of the conversation when discussing expected recovery after FTMH repair.
References
Wattad A, Saadi R, Bez M, Loewenstein A, Goldstein M. Predicting postoperative outcomes in full-thickness macular hole repair surgery: ChatGPT versus clinical decision. Retina. 2026;46(6):1015-1023. doi:10.1097/IAE.0000000000004779
Hillier R. Generative AI: it's only just begun. Ophthalmology Times. March 5, 2026. https://www.ophthalmologytimes.com/view/ai-it-s-only-just-begun
Study: AI cannot assist in preparing for ophthalmology board certification exams. Ophthalmology Times. February 26, 2026. https://www.ophthalmologytimes.com/view/study-ai-cannot-assist-in-preparing-for-ophthalmology-board-certification-exams
Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141(6):589-597. doi:10.1001/jamaophthalmol.2023.1144
Wong RLM, Ho M, Wong SHL, et al. A review of surgical outcomes and advances for macular holes. J Ophthalmol. 2018;2018:7389412. doi:10.1155/2018/7389412
Bajdik B, Vajas A, Kemenes G, et al. Prediction of long-term visual outcome of idiopathic full-thickness macular hole surgery using OCT parameters that estimate potential preoperative photoreceptor damage. Graefes Arch Clin Exp Ophthalmol. 2024;262(10):3107-3117. doi:10.1007/s00417-024-06500-2
Hu Y, Meng Y, Liang Y, et al. Machine learning and OCT-derived radiomics analysis to predict the postoperative anatomical outcome of full-thickness macular hole. Bioengineering (Basel). 2024;11(9):949. doi:10.3390/bioengineering11090949
