Natural language processing pairs with big data curation

Leng,Theodore;

June 30, 2021

Digital Edition

Ophthalmology Times: June 15, 2021

Volume 46

Issue 10

Author(s):

Theodore Leng, MD, MS

Knowledge of tools used in data interpretation helps clinicians trust accuracy of findings.

Special to Ophthalmology Times®

Artificial intelligence (AI) is permeating society, directing everything from the “products you may like” portion of an e-commerce site to a GPS suggesting a faster route to your destination.

One of the fastest growing areas for AI is medicine, and ophthalmology is helping to lead the way in AI evolutions.

For example, deep-learning AI programs that interpret fundus photographs of patients with diabetes may be used to improve screening for diabetic retinopathy.¹

In some cases, AI relies on natural language processing (NLP) to gather and interpret language-based data.

In ophthalmology, NLP can process electronic health record (EHR) information from the American Academy of Ophthalmology (AAO) Intelligent Research in Sight (IRIS) Registry, which houses data from 367 million patient encounters with more than 65 million unique patients.

Verana Health, the data curation and analytics partner of the academy, organizes data from the registry to prepare it for interpretation via NLP.

Natural language processing
Although the phrase “natural language processing” may be new to some readers, many individuals frequently (and perhaps unknowingly) interact with it.

A common encounter with NLP occurs when interfacing with document-scanning technology that converts text into digital data.

Optical character recognition, an early NLP method, identifies letters, words, and phrases from static documents and converts them to data points.

Imagine a scenario in which you are tasked with entering your passport information into a portal. You have 2 options to enter data such as your given name, surname, and nation of origin.

You can manually enter your information in each field or you can use your phone to snap a picture of your passport’s relevant pages and allow NLP to populate the fields.

The latter option, which extracts language from a photograph and places it in the appropriate areas of the portal, is quicker.

Machine learning applied to NLP has broader uses that suit it for analyzing text-heavy data in the IRIS Registry.

If NLP is used to interpret the hundreds of millions of EHR data points in the registry, it may produce information on real-world treatment outcomes, disease prevalence data, and treatment patterns.

CLINICAL FINDINGS IN OPHTHALMOLOGY
Two examples of how AI could be used to examine IRIS Registry data illustrate the potential of drawing insights from large databases via NLP analysis.

Grading severity
NLP could be used to search IRIS Registry data for a series of words or phrases in patient records. Searching for prespecified phrases or words may confirm the accuracy of coding data.
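
As a minimal sketch of such a phrase search, the Python below scans free-text notes for a prespecified list of phrases. The phrase list, record IDs, and note text are invented for illustration and are not drawn from the IRIS Registry; a production clinical NLP pipeline would also need to handle negation, abbreviations, and misspellings.

```python
import re

# Hypothetical phrase list; a real study would define these clinically.
PHRASES = ["severe glaucoma", "advanced glaucoma", "end-stage glaucoma"]

# One case-insensitive pattern; \b keeps matches on word boundaries.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PHRASES) + r")\b",
    re.IGNORECASE,
)

def find_phrases(note: str) -> list[str]:
    """Return each prespecified phrase found in the note, lowercased."""
    return [m.group(1).lower() for m in PATTERN.finditer(note)]

# Invented example notes keyed by a made-up record ID.
notes = {
    "rec-001": "Assessment: Advanced glaucoma OD, IOP 24 mm Hg.",
    "rec-002": "Cataract evaluation. No glaucoma history.",
}

hits = {rec_id: find_phrases(text) for rec_id, text in notes.items()}
for rec_id, found in hits.items():
    print(rec_id, found)
```

Records with a non-empty result could then be compared against the diagnosis codes entered for the same encounter.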

For the sake of illustration, consider glaucoma.

Clinicians use various qualitative (eg, types of procedures undergone, medication history, whether cataract surgery has occurred) and quantitative (eg, cup-to-disc ratios, IOP measurements, visual acuity, visual field data) data points to classify a patient’s glaucoma severity.

No single data point leads to a diagnosis of mild, moderate, or severe glaucoma, and patients with similar quantitative profiles may be classified differently based on qualitative data points.

ICD-10 provides codes for various degrees of glaucoma severity. In a perfect world, clinicians would accurately code each case during every visit.

However, due to many factors, the stage of glaucoma may not be updated in the EHR to reflect the current clinical state of the patient.

By using NLP, investigators can confirm that the coded diagnosis reflects the qualitative and quantitative measurements of a patient encounter.

Suppose investigators needed to determine the number of patients categorized as having severe glaucoma.

After defining severe glaucoma with a combination of qualitative and quantitative parameters (relying on definitions from the AAO and the American Glaucoma Society designed to reduce subjectivity in stage diagnoses), investigators could perform a customized search of IRIS Registry records using NLP to confirm that the number of severe cases coded in a given time frame matches the number of severe cases as defined by details in the encounter.

An NLP data analysis makes the real-world data housed in the IRIS Registry more accurate. Automating these quality checks saves time and increases the quality of the overall body of data at investigators’ disposal.
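
The comparison described above, coded severity versus severity derived from encounter details, can be sketched as follows. This is an illustration under stated assumptions: the `Encounter` fields, the record IDs, and especially the rule in `derived_stage` are placeholders and are not the AAO or American Glaucoma Society definitions. It also assumes the chart findings have already been extracted from the note by an earlier NLP step.

```python
from dataclasses import dataclass

@dataclass
class Encounter:
    record_id: str                    # made-up identifier
    coded_stage: str                  # stage implied by the diagnosis code
    cup_to_disc: float                # quantitative finding from the chart
    field_loss_both_hemifields: bool  # qualitative finding from the note

def derived_stage(enc: Encounter) -> str:
    """Placeholder rule, NOT a clinical definition of severe glaucoma."""
    if enc.cup_to_disc >= 0.9 and enc.field_loss_both_hemifields:
        return "severe"
    return "not severe"

def reconcile(encounters: list[Encounter]):
    """Count coded vs. derived severe cases and list the disagreements."""
    coded = sum(e.coded_stage == "severe" for e in encounters)
    derived = sum(derived_stage(e) == "severe" for e in encounters)
    mismatched = [
        e.record_id
        for e in encounters
        if (e.coded_stage == "severe") != (derived_stage(e) == "severe")
    ]
    return coded, derived, mismatched

# Invented encounters for illustration.
encounters = [
    Encounter("rec-001", "severe", 0.95, True),
    Encounter("rec-002", "moderate", 0.90, True),  # stage never updated
    Encounter("rec-003", "mild", 0.40, False),
]

coded, derived, mismatched = reconcile(encounters)
print("coded severe:", coded)
print("derived severe:", derived)
print("records to review:", mismatched)
```

Listing the mismatched records, rather than only comparing totals, matters because two offsetting errors would leave the counts equal while both records remain miscoded.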

Reconciling coding data with real-world prevalence
Understanding the prevalence of particular diseases in ophthalmology may be limited by coding behavior after a patient encounter.

For example, a patient presenting to a cataract surgeon for preoperative evaluation may also have early age-related macular degeneration (AMD) in addition to their cataract.

It is possible that this encounter will be coded only as a cataract for purposes of reimbursement and that the ICD-10 code for AMD will not be entered.

This patient’s AMD would be undetected by investigators leveraging coding data to estimate disease prevalence.

However, an NLP-based analysis of IRIS Registry data could detect the presence of underreported or unreported disease in patient charts, thereby generating a more robust picture of real-world disease rates.
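
A rough sketch of that idea is shown below: each record's free-text note is checked for a non-negated mention of AMD, and the result is compared with the codes entered for the encounter. The records are invented, the negation handling is deliberately crude, and the code strings and the `H35.3` prefix are used for illustration and should be checked against the current ICD-10-CM code set.

```python
import re

# Terms that signal AMD in free text (illustrative, not exhaustive).
AMD_TERM = re.compile(r"\b(AMD|age-related macular degeneration)\b", re.IGNORECASE)
# Crude negation cues; real clinical NLP uses far richer rules or models.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)

def note_mentions_amd(note: str) -> bool:
    """True if the note has at least one AMD mention that is not negated."""
    for match in AMD_TERM.finditer(note):
        # Look only at the text between the sentence start and the term.
        sentence_start = note.rfind(".", 0, match.start()) + 1
        if not NEGATION.search(note[sentence_start:match.start()]):
            return True
    return False

def is_coded_for_amd(codes: set[str]) -> bool:
    """True if any entered code starts with the assumed AMD prefix."""
    return any(code.startswith("H35.3") for code in codes)

# Invented records for illustration.
records = [
    {"id": "rec-001", "codes": {"H25.11"},
     "note": "Pre-op cataract evaluation. Early AMD with small drusen OU."},
    {"id": "rec-002", "codes": {"H25.12"},
     "note": "Cataract OS. No evidence of AMD."},
    {"id": "rec-003", "codes": {"H35.3110"},
     "note": "Follow-up for AMD."},
]

coded = [r["id"] for r in records if is_coded_for_amd(r["codes"])]
uncoded = [
    r["id"] for r in records
    if note_mentions_amd(r["note"]) and not is_coded_for_amd(r["codes"])
]

print("Coded for AMD:", coded)
print("AMD in note but not coded:", uncoded)
```

In this toy data, a prevalence estimate based on codes alone would count one patient with AMD, while adding the note-based check would count two.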

Trusting the algorithm
The more clinicians know about how NLP-based analyses make determinations, and the more transparent the models are, the more willing clinicians may be to accept the results of AI reports.

All NLP algorithms require a degree of explainability, which allows investigators to understand to what degree an algorithm values particular pieces of data.

If NLP determines that, for example, a certain percentage of patients of a certain age have AMD, then investigators can examine the algorithm’s methods to ensure that a legitimate medical reason exists for this conclusion.
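
One simple way to picture explainability is an additive, logistic-regression-style model, in which each input contributes a visible amount to the final score. The sketch below is an assumption made for illustration: the article does not say which model types are applied to IRIS Registry data, and the feature names, weights, and bias are hypothetical values, not learned or clinically validated ones.

```python
import math

# Hypothetical weights; a real model would learn these from data.
WEIGHTS = {
    "drusen_mentioned": 2.0,
    "age_over_75": 1.0,
    "family_history": 0.5,
}
BIAS = -3.0  # hypothetical baseline log-odds

def explain(features: dict[str, int]):
    """Return the predicted probability and each feature's contribution."""
    contributions = {
        name: weight * features.get(name, 0) for name, weight in WEIGHTS.items()
    }
    logit = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    # Largest absolute contribution first, so reviewers see what mattered most.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked

prob, ranked = explain(
    {"drusen_mentioned": 1, "age_over_75": 1, "family_history": 0}
)
print(f"probability of AMD flag: {prob:.2f}")
for name, value in ranked:
    print(f"{name}: {value:+.1f}")
```

An investigator reading the ranked contributions can check whether the inputs driving a determination are medically plausible. Deep-learning models generally do not expose their reasoning this directly and need separate explanation methods.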

Instances may arise in which AI detects disease that is either imperceptible by human evaluation or is linked to heretofore unknown anatomic manifestations.

Machine learning–based algorithms, in which AI platforms learn to detect patterns from massive data sets, have been shown to accurately estimate the age, gender, smoking status, and systolic blood pressure of patients based on fundus photographs alone.²

How or why those algorithms make their determinations is not yet understood, but their results nonetheless show the potential of AI to change the landscape of medicine.

WHAT’S NEXT?
NLP may be one of the most important tools for extracting meaningful insights from real-world data in the IRIS Registry.

The better we understand how IRIS Registry data are curated and analyzed, the more we can embrace the results of AI data analyses.

--

Theodore Leng, MD, MS
Email: vision.md@gmail.com
Leng is the director of research at the Byers Eye Institute at Stanford University in California and a medical adviser to Verana Health.

References
1. Lu L, Ren P, Lu Q, et al. Analyzing fundus images to detect diabetic retinopathy (DR) using deep learning system in the Yangtze River delta region of China. Ann Transl Med. 2021;9(3):226. doi:10.21037/atm-20-3275

2. Poplin R, Varadarajan AV, Blumer K, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2(3):158-164. doi:10.1038/s41551-018-0195-0
