
Study finds mixed real and synthetic eye images enhance ocular AI performance
Key Takeaways
- Four Roboflow/YOLOv8 instance-segmentation models differed only by training-set composition: 100 synthetic, 100 real, 100 mixed (50/50), or 200 mixed, with pixel-level labels for sclera, iris, and pupil.
- Single-domain training produced cross-domain breakdowns, including 0% pupil accuracy on a real image for synthetic-only training and 50% on a synthetic image for real-only training.
A balanced mix of AI-generated and smartphone-captured images eliminated the domain-specific failures seen with single-source training
Training a computer vision model on a 50:50 blend of synthetic and real eye images produces more reliable segmentation of the pupil, iris, and sclera than training on either image type alone, according to a cross-domain validation study published in Cureus.
The investigators, led by Krishna Keshav, a medical student at All India Institute of Medical Sciences, New Delhi, and corresponding author Deepsekhar Das, MD, assistant professor, set out to address one of the persistent bottlenecks in building ophthalmic AI: assembling large, well-annotated, ethically sourced image datasets. Generative AI offers a tempting shortcut, but the team noted that the value of synthetic eye images for training segmentation models had not been well characterized. Their question was whether mixing artificial and real images could deliver dependable performance across both domains.
How the study was designed
The researchers built 4 instance-segmentation models using the Roboflow 3.0 framework, which is based on a YOLOv8 architecture. Each model differed only in the composition of its training data: 1 trained on 100 AI-generated images, 1 on 100 real clinical images, 1 on a balanced mix of 100 images (50 synthetic, 50 real), and 1 on a balanced mix of 200 images.
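The 4 training conditions amount to different draws from 2 image pools, which can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the `build_training_sets` helper, the placeholder file names, the fixed random seed, and the 100/100 split assumed for the 200-image mix are all hypothetical.

```python
import random

def build_training_sets(synthetic, real, seed=0):
    """Return the 4 training-set compositions described in the article.

    `synthetic` and `real` are lists of image identifiers (hypothetical here).
    """
    if len(synthetic) < 100 or len(real) < 100:
        raise ValueError("need at least 100 images in each pool")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    syn = rng.sample(synthetic, 100)
    rea = rng.sample(real, 100)
    return {
        "synthetic_100": syn,
        "real_100": rea,
        "mixed_100": syn[:50] + rea[:50],  # 50 synthetic + 50 real
        "mixed_200": syn + rea,            # assumed 100 synthetic + 100 real
    }

# Placeholder identifiers; the article does not give real file names.
synthetic_pool = [f"synthetic_{i:03d}.png" for i in range(100)]
real_pool = [f"real_{i:03d}.png" for i in range(100)]

sets = build_training_sets(synthetic_pool, real_pool)
for name, images in sets.items():
    print(name, len(images))
```

Holding the architecture and settings constant while changing only this draw is what lets any performance difference be attributed to dataset composition.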
Synthetic images were generated with a publicly available generative model using a single standardized prompt, producing isolated eyes that spanned a range of races and were free of periorbital pathology. Real images came from patients at a tertiary care center in northern India and were captured with a smartphone under standardized lighting after institutional ethics approval and informed consent; eyes with surface disease, lid pathology, or significant anterior segment abnormality were excluded. Annotators manually delineated 3 structures (sclera, iris, and pupil) at the pixel level to serve as ground truth.
All 4 models were evaluated on the same held-out test set of 10 images (5 synthetic, 5 real), with detection accuracy calculated per structure and compared using repeated-measures ANOVA and paired t-tests.
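Because every model was scored on the same 10 images, the comparisons are paired: each image contributes one accuracy value per model. A minimal sketch of the paired t statistic follows, using only the Python standard library. The `paired_t` helper and the per-image accuracies are made up for illustration and are not values from the study; a p-value would additionally require the t distribution, which is left out here.

```python
import math
import statistics

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

# Made-up per-image accuracies (%) for 2 models on the same 10 test images.
# These are NOT values from the study.
model_a = [90, 85, 88, 92, 80, 86, 91, 84, 89, 87]
model_b = [88, 80, 85, 90, 60, 84, 89, 70, 86, 85]

t_stat, df = paired_t(model_a, model_b)
print(f"t = {t_stat:.2f}, df = {df}")
```

With 10 paired images the test has 9 degrees of freedom, which is one reason small test sets like this one yield wide uncertainty.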
Mixed training erased the cross-domain failures
The headline finding was that hybrid training removed the catastrophic, domain-specific breakdowns that single-source models produced when tested on the opposite image type. The pupil illustrated this most starkly: the AI-only model scored 0% on 1 real image, while the real-only model managed just 50% on 1 synthetic image. By contrast, the 100-image mixed model reached 88.5% (± 5.9%) pupil accuracy with no failures, tightening the standard deviation to under 6% compared with roughly 30% for the AI-only model.
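The link between a single missed image and a wide standard deviation can be seen with a short calculation. The sketch below uses made-up per-image accuracies, not the study's data, and a hypothetical `summarize` helper; it only illustrates why one 0% result on a small test set inflates the spread.

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of accuracies."""
    return statistics.mean(scores), statistics.stdev(scores)

# Made-up per-image pupil accuracies (%); NOT values from the study.
stable = [90, 85, 92, 88, 87]
one_failure = [90, 85, 92, 88, 0]  # same scores, but one image is missed entirely

for label, scores in (("stable", stable), ("one failure", one_failure)):
    mean, sd = summarize(scores)
    print(f"{label}: {mean:.1f}% (± {sd:.1f}%)")
```

On a test set this small, the standard deviation is therefore a useful signal of whether a model fails outright on any image, not just of ordinary scatter.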
Performance varied by structure. Scleral recognition was robust and stable across all 4 training conditions, and pupil recognition was likewise consistent, though it tended to lag the sclera. The iris was the structure most sensitive to dataset composition, with accuracy improving when real clinical images were included. The authors attributed this pattern to the iris's textural complexity and inter-individual variation, which synthetic images alone struggle to capture.
Notably, doubling the mixed dataset from 100 to 200 images conferred no measurable advantage for pupil accuracy (p = 0.95). The authors concluded that the makeup of the dataset, rather than its sheer size, was the primary driver of robustness in this setting.
Caveats and clinical takeaway
The authors flagged several limitations. The sample sizes were small, all real images came from a single center, the synthetic images were generated by 1 model with a fixed prompt, and no correction was applied for multiple comparisons. Most important for clinical relevance, no eyes with ocular pathology were included, so performance in diseased eyes remains untested.
Even so, the findings point toward hybrid synthetic-real training as a pragmatic, resource-efficient strategy for developing deployable ocular AI, particularly where large annotated clinical datasets are hard to obtain. The investigators caution that AI-generated images work well as a supplement but fall short as standalone training material for anatomically complex structures such as the iris, and they call for larger, demographically diverse datasets and validation in pathological eyes before broader clinical use.
Reference:
Keshav K, Das D, Grover S, Agrawal S, Bharti A. Mixing synthetic and real images improves artificial intelligence-based detection of the pupil, iris, and sclera: a cross-domain validation study. Cureus. 2026;18(5):e109409. doi:10.7759/cureus.109409





















