June 26, 2026

ChatGPT-5 overestimates macular hole surgery success, study finds

Author(s): Ophthalmology Times Staff Reports

Key Takeaways

Tel Aviv investigators evaluated vitrectomy outcomes (2021–2024) using anonymized demographics, ocular history, symptom duration, preop BCVA, hole metrics, surgical details, and one foveal OCT B-scan.
Anatomical prediction accuracy favored ChatGPT-5 (90%) versus specialists (72%–86%), but it predicted closure in 49/50 eyes, so it could have flagged at most 1 of the 6 nonclosures.
Functional forecasts were poorly calibrated: ChatGPT-5 achieved 66% categorical accuracy yet scored 0% when vision worsened and ≤13% when stable; mean error was 11.4±10.8 letters.
Clinical deployment remains premature; outputs warrant cautious interpretation, with AI limited to supportive context pending larger prospective validation and improved calibration for adverse outcomes.

Study finds ChatGPT-5 overpredicts success after macular hole surgery—beating specialists on paper but missing failures.

Artificial intelligence may be inching closer to the retina clinic, but a new study suggests it is not yet ready to tell patients how their macular hole surgery will turn out. In a retrospective analysis published in Retina, ChatGPT-5 posted headline accuracy numbers that beat 2 senior retina specialists at predicting outcomes after full-thickness macular hole (FTMH) repair—only for the investigators to conclude that the model's edge was driven almost entirely by a tendency to assume the best.

Aya Wattad, MD, and colleagues at Tel Aviv University evaluated 50 eyes of 50 patients who underwent pars plana vitrectomy for FTMH between 2021 and 2024. For each case, the team stripped identifiers and fed ChatGPT-5 a standardized clinical summary: age, sex, refractive and lens status, ocular history, symptom duration, preoperative best-corrected visual acuity (BCVA), hole dimensions, surgical details, and a single foveal B-scan OCT image. The model was asked to predict whether vision would improve, stay stable, or worsen at 12 months; to estimate final BCVA; and to predict anatomical closure. Two senior retina specialists reviewed the same anonymized material and made parallel predictions, and all forecasts were checked against real-world results.

What actually happened

The surgeries themselves performed close to expectations for idiopathic FTMH, where vitrectomy with internal limiting membrane peeling routinely yields closure rates in the 90% to 100% range. At 12 months, anatomical closure occurred in 44 of 50 eyes (88%), and mean BCVA improved significantly from 20/100 (0.7 ± 0.4 logMAR) to 20/63 (0.5 ± 0.5 logMAR; P = 0.03). Functionally, 35 eyes (70%) gained at least 2 ETDRS lines, 8 (16%) remained stable, and 7 (14%) worsened.
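For readers who want to check the acuity figures, the short Python sketch below converts the mean logMAR values quoted above into Snellen equivalents and expresses the change in ETDRS lines. It relies on the standard definitions (Snellen denominator = 20 × 10^logMAR, and 0.1 logMAR per ETDRS line of 5 letters). The function names are illustrative and do not come from the study.

```python
def logmar_to_snellen_denominator(logmar: float, numerator: int = 20) -> float:
    """Return the Snellen denominator for a logMAR value (MAR = 10 ** logMAR)."""
    return numerator * 10 ** logmar


def logmar_change_to_etdrs_lines(delta_logmar: float) -> float:
    """Convert a logMAR change to ETDRS lines (1 line = 0.1 logMAR = 5 letters)."""
    return delta_logmar / 0.1


# Mean values quoted in the article
pre_logmar, post_logmar = 0.7, 0.5

print(round(logmar_to_snellen_denominator(pre_logmar)))    # 100 -> 20/100
print(round(logmar_to_snellen_denominator(post_logmar)))   # 63  -> 20/63
print(round(logmar_change_to_etdrs_lines(pre_logmar - post_logmar), 1))  # 2.0
```

On these numbers, the mean change of 0.2 logMAR corresponds to about 2 ETDRS lines, or roughly 10 letters.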

Where the model fell short

On anatomical outcomes, ChatGPT-5 reached 90% accuracy, ahead of the specialists' 72% to 86%. But the model arrived there by predicting closure in 49 of 50 eyes. With only 1 predicted failure, it could have caught at most 1 of the 6 eyes that did not close, which is precisely the high-stakes scenario in which a prognosis carries weight. The specialists, by contrast, retained the clinical judgment to anticipate at least some failures.
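The article's own figures are enough to show why 90% accuracy can mislead here. The Python sketch below infers the 2×2 table from 50 eyes, 44 closures, 49 predicted closures, and 90% accuracy. The table is a reconstruction, not something reported in the study, and it assumes that accuracy was computed over all 50 eyes and that the percentages are exact. All names are illustrative.

```python
def reconstruct_confusion(n_total, n_closed, n_pred_closed, accuracy):
    """Infer (TP, FP, FN, TN) with closure as the positive class.

    Uses TP + FP = n_pred_closed, TP + FN = n_closed, and
    TP + TN = number of correct predictions, which together give
    2 * TP = n_pred_closed + n_closed + n_correct - n_total.
    """
    n_correct = round(accuracy * n_total)
    twice_tp = n_pred_closed + n_closed + n_correct - n_total
    if twice_tp < 0 or twice_tp % 2:
        raise ValueError("reported figures are mutually inconsistent")
    tp = twice_tp // 2
    fp = n_pred_closed - tp
    fn = n_closed - tp
    tn = n_correct - tp
    if min(fp, fn, tn) < 0:
        raise ValueError("reported figures are mutually inconsistent")
    return tp, fp, fn, tn


tp, fp, fn, tn = reconstruct_confusion(50, 44, 49, 0.90)
closure_recall = tp / (tp + fn)          # share of closed eyes predicted closed
nonclosure_recall = tn / (tn + fp)       # share of failures predicted as failures
balanced_accuracy = (closure_recall + nonclosure_recall) / 2
always_closure_accuracy = 44 / 50        # baseline: predict closure for every eye

print((tp, fp, fn, tn))                  # (44, 5, 0, 1)
print(round(nonclosure_recall, 3))       # 0.167
print(round(balanced_accuracy, 3))       # 0.583
print(always_closure_accuracy)           # 0.88
```

Under those assumptions, the model correctly flagged 1 of the 6 failures. A rule that predicts closure for every eye would already score 88%, so the 90% headline figure reflects the high base rate of closure more than any ability to spot the eyes at risk.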

The functional picture was less flattering. Overall BCVA prediction accuracy was 66% for ChatGPT-5 versus 42% to 44% for the specialists, but that figure masked an uneven performance: the model did reasonably well when vision improved (60%), poorly when it remained stable (≤13%), and failed entirely when it worsened (0%). The mean BCVA prediction error was 11.4 ± 10.8 letters, with roughly 60% of estimates landing within 2 lines of the true outcome.
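One way to put the 66% figure in context is to compare it with a naive rule that predicts improvement for every eye. The Python sketch below computes that baseline from the outcome counts given in this article (35 improved, 8 stable, 7 worsened). The study is not described as reporting this comparison, so it is offered only as an illustration, and the names are hypothetical.

```python
from collections import Counter

# Observed 12-month functional outcomes, as quoted in the article
observed = Counter(improved=35, stable=8, worsened=7)


def majority_baseline(counts):
    """Return the most common class and the accuracy of always predicting it."""
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


label, baseline_accuracy = majority_baseline(observed)
reported_model_accuracy = 0.66

print(label, baseline_accuracy)                     # improved 0.7
print(reported_model_accuracy < baseline_accuracy)  # True
```

On these counts, always predicting improvement would be correct in 70% of eyes while never identifying a stable or worsened outcome, which is the same pattern the investigators describe in the model.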

The authors framed the takeaway in cautious terms, writing that the model's outputs "require cautious interpretation to avoid misleading confidence," and argued that AI-generated prognoses should at present serve only as supportive information. Larger prospective studies, they noted, are needed before any clinical deployment.

A familiar caution

The findings echo a recurring theme in AI evaluations across ophthalmology. Earlier reporting in Ophthalmology Times described ChatGPT-4.0 showing only suboptimal agreement with experts and a roughly 12% hallucination rate in cataract, cornea, and refractive scenarios—evidence that the technology can stumble in complex, open-ended clinical reasoning even as it handles simpler closed-set tasks. Retina has proven a particular weak spot: in a separate analysis of board-style preparation, ChatGPT answered every retina and vitreous question incorrectly.

That context matters because macular hole prognostication is itself a hard problem that has drawn sustained quantitative effort. Researchers have built OCT-based indices and machine-learning models to forecast both closure and postoperative acuity, leaning on parameters such as base diameter and preoperative vision. The Wattad analysis suggests a general-purpose language model can approximate those outputs—but without the calibrated skepticism that keeps a specialist from assuming every hole will close. For now, the message for clinicians is to keep evidence-based counseling and specialist input at the center of the conversation when discussing expected recovery after FTMH repair.

References

Wattad A, Saadi R, Bez M, Loewenstein A, Goldstein M. Predicting postoperative outcomes in full-thickness macular hole repair surgery: ChatGPT versus clinical decision. Retina. 2026;46(6):1015-1023. doi:10.1097/IAE.0000000000004779
Hillier R. Generative AI: it's only just begun. Ophthalmology Times. March 5, 2026. https://www.ophthalmologytimes.com/view/ai-it-s-only-just-begun
Study: AI cannot assist in preparing for ophthalmology board certification exams. Ophthalmology Times. February 26, 2026. https://www.ophthalmologytimes.com/view/study-ai-cannot-assist-in-preparing-for-ophthalmology-board-certification-exams
Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141(6):589-597. doi:10.1001/jamaophthalmol.2023.1144
Wong RLM, Ho M, Wong SHL, et al. A review of surgical outcomes and advances for macular holes. J Ophthalmol. 2018;2018:7389412. doi:10.1155/2018/7389412
Bajdik B, Vajas A, Kemenes G, et al. Prediction of long-term visual outcome of idiopathic full-thickness macular hole surgery using OCT parameters that estimate potential preoperative photoreceptor damage. Graefes Arch Clin Exp Ophthalmol. 2024;262(10):3107-3117. doi:10.1007/s00417-024-06500-2
Hu Y, Meng Y, Liang Y, et al. Machine learning and OCT-derived radiomics analysis to predict the postoperative anatomical outcome of full-thickness macular hole. Bioengineering (Basel). 2024;11(9):949. doi:10.3390/bioengineering11090949
