Clinical Natural Language Processing (NLP)

Decoding the Genetic Secrets of Lung Cancer with the Help of Technology

Lung cancer claims close to two million lives worldwide every year, and there’s no single cure. But what if we could gain deeper insights into the genetic makeup of a population of more than 30,000 lung cancer patients? Thanks to our Natural Language Processing (NLP) algorithm and a team of dedicated researchers, we now have a clearer picture of the non-small cell lung cancer (NSCLC) patient population and the specific genetic mutations that drive tumor growth. In this blog, we’ll share our AI-driven findings from PALGA, the Dutch nationwide registry of histo- and cytopathology. Read on to discover how AI data processing with 95.9% accuracy can propel the progress of cancer treatment.

Introduction: Personalized therapies in lung cancer and the need for better data insights

NSCLC is the most common form of lung cancer, accounting for roughly 85% of cases. Like any other cancer, it results from genetic mutations in a patient’s cells. However, every case is different, which is why personalized therapies are emerging as some of the most effective treatments. To apply a personalized cancer therapy successfully, it is crucial to first identify the genetic alterations driving the cancer.

A substantial subset of NSCLC cases is driven by mutations in the epidermal growth factor receptor (EGFR) gene. Targeted therapies such as tyrosine kinase inhibitors (TKIs) have shown promising results against tumors with EGFR mutations. However, patients can develop resistance to these drugs, leading to disease progression and lower survival rates.

To gain a better understanding of the NSCLC patient population and disease-specific characteristics, we need to explore real-world evidence (RWE). However, even with a wealth of patient data available, such as the Dutch nationwide registry of histo- and cyto-pathology (PALGA), it can be challenging to extract meaningful insights from the unstructured narrative reports.

Goals: Revealing crucial NSCLC characteristics with data mining

With this project, we aimed to gain population-level genetic insights into NSCLC. By identifying the most prevalent mutations leading to TKI resistance, researchers can pave the way for more personalized treatments that target the specific molecular mechanisms driving each patient's disease.

But it's not just about the end goal. We also value the opportunity to improve our understanding of the NSCLC patient population and its disease-specific characteristics, which can help us identify gaps in our current knowledge and inform new research questions.

We also aimed to demonstrate the value of RWE in cancer research. By showing how insights from large-scale patient registries can be used to guide treatment decisions, we can help to raise awareness of the importance of RWE in advancing cancer care.

Methods: The roadmap of our AI-powered PALGA data structuring and analysis

The PALGA registry is an invaluable resource that houses unstructured and semi-structured patient reports from over 90 hospitals. Our research team harnessed the power of LynxCare’s NLP technology to automatically mine a staggering 148,000+ records, representing more than 30,000 patients. This treasure trove of data was collected between 2019 and 2020, offering a rich snapshot of the NSCLC patient population during that period.
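The post does not describe how LynxCare’s NLP works internally. Purely to illustrate what “mining” free-text pathology reports means, here is a toy rule-based sketch, assuming hypothetical English-language report snippets and a deliberately tiny regular expression (`EGFR_VARIANT`, `Mention` and `extract_egfr_mentions` are illustrative names, not part of any real system). It finds “EGFR” followed, within the same sentence, by one of four variant notations.

```python
import re
from dataclasses import dataclass

# Hypothetical toy pattern: "EGFR" followed, within the same sentence, by one
# of a few variant notations. A production clinical NLP system needs far more:
# synonyms, negation handling, typos and the language of the reports.
EGFR_VARIANT = re.compile(
    r"\bEGFR\b[^.]*?\b(L858R|T790M|exon\s+19\s+deletion|exon\s+20\s+insertion)",
    re.IGNORECASE,
)


@dataclass
class Mention:
    report_id: str
    variant: str  # normalized: whitespace collapsed, lower-cased


def extract_egfr_mentions(reports: dict[str, str]) -> list[Mention]:
    """Scan free-text reports and return one Mention per pattern match."""
    mentions = []
    for report_id, text in reports.items():
        for match in EGFR_VARIANT.finditer(text):
            variant = " ".join(match.group(1).split()).lower()
            mentions.append(Mention(report_id, variant))
    return mentions


# Illustrative snippets, not real PALGA reports.
reports = {
    "r1": "Adenocarcinoma. EGFR exon 19 deletion detected.",
    "r2": "No EGFR mutation detected.",
    "r3": "EGFR: L858R present; KRAS wild type.",
}
for m in extract_egfr_mentions(reports):
    print(m.report_id, m.variant)
```

Report `r2` is skipped only because it names no variant; a sentence such as “no EGFR L858R detected” would still match, which is exactly the kind of negation a real NLP model has to handle.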

The NLP recognized 33 data points concerning mutations in NSCLC. The extracted data were then structured into a data warehouse based on the OMOP Common Data Model (CDM). From there, our researchers selected all EGFR mutations found in the patients’ data and performed statistical analyses on the tested population.
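To make the “structure into OMOP, then filter EGFR” step concrete, here is a minimal sketch using an in-memory SQLite database. It assumes, for illustration only, that each extracted mutation becomes one row in a heavily simplified OMOP `measurement` table with the raw variant text in `measurement_source_value`; this is one plausible mapping, not necessarily the one used in the study, and the helper functions are hypothetical.

```python
import sqlite3

# Simplified subset of the OMOP CDM MEASUREMENT table; the real table has
# more columns (units, value_as_concept_id, visit links, ...).
SCHEMA = """
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,
    measurement_concept_id INTEGER NOT NULL,
    measurement_date TEXT NOT NULL,
    measurement_source_value TEXT
)
"""

# OMOP uses concept_id 0 for "no matching concept"; a real ETL would map each
# variant to a standard vocabulary concept instead.
UNMAPPED_CONCEPT_ID = 0

EGFR_FILTER = "measurement_source_value LIKE 'EGFR %'"


def build_warehouse(rows: list[tuple[int, str, str]]) -> sqlite3.Connection:
    """Load (person_id, date, variant text) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO measurement (person_id, measurement_concept_id, "
        "measurement_date, measurement_source_value) VALUES (?, ?, ?, ?)",
        [(pid, UNMAPPED_CONCEPT_ID, date, text) for pid, date, text in rows],
    )
    return conn


def egfr_patients_per_variant(conn: sqlite3.Connection) -> dict[str, int]:
    """Count distinct patients per EGFR variant, ignoring non-EGFR rows."""
    query = (
        "SELECT measurement_source_value, COUNT(DISTINCT person_id) "
        f"FROM measurement WHERE {EGFR_FILTER} GROUP BY measurement_source_value"
    )
    return dict(conn.execute(query).fetchall())


def egfr_positive_patients(conn: sqlite3.Connection) -> int:
    """Count distinct patients with at least one EGFR variant."""
    query = f"SELECT COUNT(DISTINCT person_id) FROM measurement WHERE {EGFR_FILTER}"
    return conn.execute(query).fetchone()[0]


# Illustrative rows, NOT data from the study.
conn = build_warehouse([
    (1, "2019-03-01", "EGFR L858R"),
    (2, "2019-06-12", "EGFR exon 19 deletion"),
    (2, "2020-01-20", "EGFR T790M"),
    (3, "2020-02-02", "KRAS G12C"),
    (4, "2020-05-05", "EGFR L858R"),
])
print(egfr_patients_per_variant(conn))
print(egfr_positive_patients(conn))
```

Counting `DISTINCT person_id` matters because one patient can contribute several reports and several mutations; population statistics should be computed per patient, not per record.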

To ensure the accuracy and reliability of the results, a team of highly trained (bio)medical researchers manually performed additional data validation and annotation of mutations. Importantly, only aggregated and fully anonymized insights were gathered and shared.

Results: Outstanding NLP Performance Brings Us Novel Insights on NSCLC Genetics

The LynxCare data processing technology demonstrated remarkable efficiency, processing data 48 times faster than manual PALGA data processing. This translates into a 16-fold cost reduction, highlighting the transformative potential of AI and NLP in the field.

The manual validation and correction process by (bio)medical researchers at the University Medical Center Groningen (UMCG) revealed an overall data extraction accuracy of 95.9%: approximately 95.9% of the EGFR mutations in the analyzed dataset were identified. Although approximately 4.1% of EGFR-mutant patients were not identified, most of the missed mutations were likely variants of unknown significance.
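The 95.9% figure, as described here (the share of manually confirmed EGFR mutations that the NLP also found), is what is usually called recall or sensitivity; 4.1% is its complement, the miss rate. A minimal sketch of that calculation, assuming the manual annotations and the NLP output are both available as sets of hypothetical `(patient_id, variant)` pairs:

```python
def recall_and_miss_rate(gold: set, predicted: set) -> tuple[float, float]:
    """Recall: share of manually confirmed mutations the NLP also found."""
    if not gold:
        raise ValueError("gold standard is empty; recall is undefined")
    found = len(gold & predicted)
    recall = found / len(gold)
    return recall, 1.0 - recall


# Illustrative (patient_id, variant) pairs, NOT data from the study.
gold = {(1, "L858R"), (2, "exon 19 deletion"), (3, "G719X"), (4, "L858R")}
predicted = {(1, "L858R"), (2, "exon 19 deletion"), (4, "L858R")}
recall, miss = recall_and_miss_rate(gold, predicted)
print(f"recall={recall:.1%}, missed={miss:.1%}")
```

Recall alone says nothing about false positives; precision would be computed analogously as `len(gold & predicted) / len(predicted)`.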

A detailed analysis of the extracted data revealed the prevalence of classical and non-classical EGFR mutations in patients diagnosed with NSCLC. Among patients with an EGFR mutation, 73% had classical EGFR mutations, while non-classical EGFR mutations were detected in the remaining 27%. This new, real-world knowledge can help guide research efforts towards novel therapeutics.
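The post does not list which variants it counted as classical. In the literature, “classical” (or “common”) EGFR mutations usually refer to exon 19 deletions and the exon 21 L858R substitution, so the sketch below assumes that rule. It also assumes, hypothetically, that a patient counts as classical if any of their EGFR mutations is classical; how compound mutations were handled in the study is not stated.

```python
from collections import Counter


def is_classical(exon: int, variant: str) -> bool:
    """Simplified assumption: exon 19 deletions and exon 21 L858R are classical."""
    return (exon == 19 and variant == "deletion") or (exon == 21 and variant == "L858R")


def class_shares(patients: dict[int, list[tuple[int, str]]]) -> dict[str, float]:
    """Label each patient classical if ANY of their EGFR mutations is classical."""
    if not patients:
        raise ValueError("no EGFR-mutant patients to classify")
    labels = Counter(
        "classical" if any(is_classical(e, v) for e, v in muts) else "non-classical"
        for muts in patients.values()
    )
    total = sum(labels.values())
    return {label: labels[label] / total for label in ("classical", "non-classical")}


# Illustrative patients (id -> list of (exon, variant)), NOT data from the study.
patients = {
    1: [(19, "deletion")],
    2: [(21, "L858R"), (20, "T790M")],
    3: [(20, "insertion")],
    4: [(19, "deletion")],
}
print(class_shares(patients))
```

Because the denominator is the number of EGFR-mutant patients, the two shares always sum to 1, just as 73% and 27% do in the results above.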

Conclusions: Real-World Evidence Illuminates the Path Forward for Lung Cancer Research

Our approach streamlines the process of annotating and processing clinical databases like PALGA, unlocking crucial insights with unprecedented efficiency. The successful implementation of the LynxCare NLP technology serves as a testament to the transformative power of AI-driven algorithms for data extraction.

This groundbreaking study not only showcases the power of our AI model, but also uncovers the complex genetic landscape of the NSCLC patient population. The RWE we obtained offers invaluable insights, potentially serving as a guiding light for validating current treatments and shaping the future of NSCLC research.

Our findings emphasize the importance of characterizing both classical and non-classical EGFR mutations in NSCLC. This highlights the diverse nature of the disease and the need for continued exploration.

With this newfound potential, researchers can now focus on translating these discoveries into improved patient care and outcomes, ultimately making a lasting impact in the world of NSCLC treatment and research.

