Abstract: Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through large volumes of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high sensitivity (extraction sensitivity: 95.5%-98.4%; classification sensitivity: 83.5%-87%).
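
The abstract does not describe the extraction method in detail, so the following is only a minimal illustrative sketch of what extracting a TNM mention from narrative text can look like, and of how sensitivity is computed. It uses a plain regular expression rather than the machine learning approach the study reports. The pattern, the function names `extract_tnm` and `sensitivity`, and the sample sentence are all assumptions made for illustration and do not come from the study.

```python
import re

# Hypothetical pattern (an assumption, not the study's method):
# optional prefix (c, p, y, or two of them), then the T, N, and M categories,
# optionally separated by whitespace or a comma.
TNM_PATTERN = re.compile(
    r"\b(?P<prefix>[cpy]{0,2})"
    r"T(?P<t>is|[0-4][a-d]?|x)\s*,?\s*"
    r"N(?P<n>[0-3][a-c]?|x)\s*,?\s*"
    r"M(?P<m>[01][a-c]?|x)\b",
    re.IGNORECASE,
)


def extract_tnm(text):
    """Return one dict per TNM mention found in the text."""
    return [
        {
            "prefix": match.group("prefix"),
            "T": match.group("t"),
            "N": match.group("n"),
            "M": match.group("m"),
        }
        for match in TNM_PATTERN.finditer(text)
    ]


def sensitivity(true_positives, false_negatives):
    """Sensitivity (recall): TP / (TP + FN); 0.0 if there are no positives."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


if __name__ == "__main__":
    # Made-up sentence for illustration only.
    note = "Pathology: pT2N0M0 adenocarcinoma. Clinically staged cT3 N1 MX."
    for mention in extract_tnm(note):
        print(mention)
```

A rule of this kind would miss mentions written in free prose (for example, "stage T2 with no nodal involvement"), which is one reason a learned model may be preferred for real registry text.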

Learning Objective 1: Demonstrate a practical application of natural language processing and machine learning to information extraction from the clinical text of central cancer registry records


Abdulrahman AAlAbdulsalam, University of Utah
Jennifer Garvin, University of Utah
Andrew Redd, University of Utah
Marjorie Carter, University of Utah
Carol Sweeny, University of Utah
Stephane Meystre (Presenter), Medical University of South Carolina
