Roshan Kenia presented a poster on AI-CNet3D, a cross-attention network that improves both the performance and interpretability of glaucoma classification from 3D OCT scans.

Kenia discussed the poster at the ARVO 2025 meeting in Salt Lake City, Utah.
Editor's note: The following transcript has been lightly edited for clarity.
So I'm presenting AI-CNet3D. It's a cross-attention network for glaucoma classification. The basic idea is, when we get our OCT scans, they're condensed into the reports that ophthalmologists typically use for their glaucoma diagnosis. But there's a lot of rich structural information in the 3D data that isn't typically used, and it's very time consuming to look at. So we developed a network that's a convolutional [neural] network embedded with attention. The basic idea is that glaucoma can be a symmetric disease, where it affects both the superior and inferior nerve, or it can affect one or the other. What we do is split the volume in half based on the superior and inferior nerve, and then we use cross-attention to attend one of those halves to the other half, and vice versa. We embed that into a convolutional network, and what we see is that we get very good performance compared to other types of networks.
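The split-and-cross-attend design described above can be sketched in a few lines. This is not the published AI-CNet3D code; the layer sizes, the axis assumed to separate the superior and inferior halves, and the use of PyTorch's nn.MultiheadAttention are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class CrossHalfAttention(nn.Module):
    """Illustrative sketch: split an OCT volume into superior and inferior halves,
    extract convolutional features from each, and cross-attend each half to the
    other. Shapes and layer sizes are assumptions, not the published model."""

    def __init__(self, in_ch=1, feat_ch=32, heads=4):
        super().__init__()
        # Shared 3D convolutional feature extractor applied to both halves
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Cross-attention: queries from one half, keys/values from the other
        self.attn_sup = nn.MultiheadAttention(feat_ch, heads, batch_first=True)
        self.attn_inf = nn.MultiheadAttention(feat_ch, heads, batch_first=True)
        self.classifier = nn.Linear(2 * feat_ch, 2)  # glaucoma vs. healthy

    def _tokens(self, half):
        # (B, C, D, H, W) -> (B, D*H*W, C) token sequence for attention
        feats = self.encoder(half)
        return feats.flatten(2).transpose(1, 2)

    def forward(self, volume):
        # Split along the axis assumed here to separate the superior and
        # inferior halves of the volume.
        superior, inferior = volume.chunk(2, dim=3)
        sup_tok, inf_tok = self._tokens(superior), self._tokens(inferior)
        # Each half attends to the other half, and vice versa
        sup_ctx, _ = self.attn_sup(sup_tok, inf_tok, inf_tok)
        inf_ctx, _ = self.attn_inf(inf_tok, sup_tok, sup_tok)
        pooled = torch.cat([sup_ctx.mean(1), inf_ctx.mean(1)], dim=1)
        return self.classifier(pooled)

# Example: a batch of two single-channel OCT volumes (D x H x W = 64 x 128 x 64)
logits = CrossHalfAttention()(torch.randn(2, 1, 64, 128, 64))
```

In this sketch, each half's token sequence provides the queries while the other half supplies the keys and values, which is the "attend one half to the other, and vice versa" idea from the transcript.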
And then the next step was that we wanted our model to be interpretable, so we developed something called CARE, which is a channel attention representation. What that allows us to do is visualize what the attention layers are using to make their diagnosis. We can also use Grad-CAM to visualize what the convolutional layers are using to make their diagnosis, and we can see here that it's very spread out and not very informative. What we can do is enforce a consistency between the two so that the network is more interpretable, and then the output heat map itself is also very interpretable. So we train our model further to enforce consistency between these two, and as a result, what we see is that the performance increases even more. What we think is happening is that, in the machine learning world, when we do this, we're regularizing the model, and so when we evaluate it on more test data, it performs better because it's just more regularized.
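The consistency idea described here ties the attention-derived (CARE) heat map to the Grad-CAM heat map during training. Below is a minimal sketch of such a consistency term; the actual maps and loss used in AI-CNet3D are not specified in the transcript, so the min-max normalization, the mean-squared-error comparison, and the names care_map, cam_map, and lambda_consistency are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_loss(attention_map, gradcam_map):
    """Illustrative sketch: penalize disagreement between an attention-derived
    heat map and a Grad-CAM heat map, both shaped (B, 1, D, H, W)."""
    # Resize the Grad-CAM map to the attention map's spatial size if needed
    if gradcam_map.shape[-3:] != attention_map.shape[-3:]:
        gradcam_map = F.interpolate(
            gradcam_map, size=attention_map.shape[-3:],
            mode="trilinear", align_corners=False,
        )

    # Min-max normalize each map to [0, 1] so they are on a comparable scale
    def _norm(m):
        m = m - m.amin(dim=(-3, -2, -1), keepdim=True)
        return m / (m.amax(dim=(-3, -2, -1), keepdim=True) + 1e-8)

    return F.mse_loss(_norm(attention_map), _norm(gradcam_map))

# Hypothetical training objective: classification loss plus the consistency term,
# which acts as the regularizer described in the transcript.
# total_loss = ce_loss + lambda_consistency * consistency_loss(care_map, cam_map)
```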