OpenAI's ChatGPT-4o enhances ophthalmological image generation, producing realistic retinal photographs while highlighting the need for further research in training datasets.
(Image Credit: AdobeStock/Rizq)
In early 2025, OpenAI announced the launch of ChatGPT-4o Image Generation, a text-to-image generator integrated into the large language model (LLM) GPT-4o. OpenAI describes the image generation as the “most advanced image generator yet [incorporated] into GPT‑4o. The result [is] image generation that is not only beautiful but [also] useful.”1
In the past, LLMs have struggled to interpret and produce ophthalmological images.2 A recent study by Andrea Taloni, MD, and Massimo Busin, MD, from the department of translational medicine at the University of Ferrara in Ferrara, Italy, and colleagues sought to determine whether the new model could allow the generation of realistic ophthalmological images.
Authors noted that the ChatGPT memory feature was disabled prior to attempts to “avoid potential influence from previous conversations.” The authors prompted ChatGPT to “generate a realistic image of a healthy retinal fundus photograph of the posterior pole.” In turn, ChatGPT generated a seemingly realistic image of a retina (Figure 1), but upon further investigation, the authors found “hints of fabrication,” citing that the “retinal background was excessively homogeneous, lacking any sign of choroidal vascular patterns.”
In an attempt to enhance the LLM’s ability to create a realistic photo, authors uploaded a real fundus image to GPT-4o, along with a prompt to “generate a fundus photograph as similar as possible to this one.” The authentic fundus image was captured from a healthy 49-year-old woman using the Digital Fundus Camera Canon CR-2 (authors noted the consent of the patient to use the photo for the investigation).
According to the authors, the new image (Figure 2) was more realistic than the first one generated by ChatGPT, citing that “choroidal vasculature was present, and retinal vessels, although still exhibiting a pronounced axial light reflex, appeared compatible with normal retinal anatomy.”
Authors noted that since LLMs require extensive datasets of images to properly detect, classify, and grade retinal diseases, researchers have developed generative adversarial networks (GANs) that can synthesize high-resolution images aimed at augmenting real image datasets. A study by Burlina et al proposed4 several criteria for synthetic fundus images to be suitable for inclusion in training datasets, such as realism, the inability to distinguish real from generated images, and variability.
Authors concluded that while developing GANs requires technical expertise and substantial computational resources, LLM-based image generation may offer a faster, cheaper alternative. They noted that although this publicly accessible LLM can generate high-resolution, authentic-looking retinal photographs, further research is needed to determine whether such images can be used in training datasets.