Abstract: Alongside the emergence of clinical data research networks (CDRNs), such as the Observational Health Data Sciences and Informatics (OHDSI) consortium,[1] computational phenotyping from electronic health record (EHR) data has become a major area of interest because of its potential uses in precision medicine, population health management, and the understanding of complex diseases such as heart failure.[2] Researchers typically approach phenotyping with diverse workflows, which can lead to irreproducible results and hinder widespread adoption. To advance medical care and preserve scientific validity, data scientists need to perform these tasks in a structured manner that is consistent from one analyst to the next. To address this shortcoming, we designed and implemented a modular, executable phenotyping pipeline in the form of an easily portable data science notebook, and we demonstrated it on use cases for phenotype discovery and phenotype assignment in heart failure patients. Our study presents a proof of concept of an open-source, scalable, and modular phenotyping framework that uses the lab notebook form factor with which many medical data scientists are familiar. It is the first such implementation of a data science analytics environment for performing phenotyping tasks on datasets available in the OMOP common data model, and it should motivate future initiatives to release shareable, actionable code samples that can be easily customized to any medical data scientist’s preferences.

Learning Objective 1: To understand the importance of repeatable scientific results and how lab notebooks support them, and to walk through a demo of a notebook that performs computational phenotyping in Python.


Robert Chen (Presenter)
Georgia Institute of Technology

Jon Duke, Georgia Institute of Technology
Jimeng Sun, Georgia Institute of Technology
