Abstract: In this study, we compare and evaluate the ability of forty-five machine learning models to distinguish stroke patients from patients with other forms of cerebrovascular disease. By combining multiple case definitions, control definitions, and classifier types, we determined that for this phenotyping question, models trained on a manually curated set of stroke cases did not perform better than models trained on a set mined from billing codes, and that the type of control was the most important factor for classifier success.

Learning Objective 1: Understand that the selection and comparison of cases, controls, and machine learning model types are essential for successful EHR phenotyping.


Phyllis Thangaraj (Presenter)
Columbia University

Joseph Romano, Columbia University
Fernanda Polubriaginof, Columbia University
Nicholas Giangreco, Columbia University
Mitchell Elkind, Columbia University
Nicholas Tatonetti, Columbia University
