Abstract: We studied a large retrospective electronic health record (EHR) dataset (IBM Explorys, 15.6 million patients over 3 years) to identify factors that predict 30-day emergency department (ED) visits, using rich clinical information (2,638 features). This is the largest such study to date. Logistic regression and random forest models, trained on a 249-node Apache Spark cluster, performed well on test datasets, which suggests the promise of large-scale machine learning for addressing critical population health issues.

Learning Objective 1: Build a framework to predict the risk of a 30-day Emergency Department visit following any face-to-face encounter, using very large EHR data in a distributed computing environment
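The prediction target in the objective above can be illustrated with a minimal sketch. It is not the presenters' implementation. It assumes each encounter is a (patient_id, encounter_date, encounter_type) tuple, that the hypothetical type name "ED" marks an emergency department visit, and that "30-day" means an ED visit strictly after the index encounter and at most 30 days later. The function name label_30_day_ed_visits is also hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date, timedelta


def label_30_day_ed_visits(encounters, window_days=30):
    """Return one (patient_id, encounter_date, label) row per encounter.

    label is 1 if the same patient has an ED encounter strictly after the
    index encounter date and at most `window_days` days later, else 0.
    The record layout and the "ED" type name are assumptions for this sketch.
    """
    # Collect and sort each patient's ED visit dates once.
    ed_dates = defaultdict(list)
    for patient_id, when, kind in encounters:
        if kind == "ED":
            ed_dates[patient_id].append(when)
    for dates in ed_dates.values():
        dates.sort()

    rows = []
    for patient_id, when, _kind in encounters:
        dates = ed_dates.get(patient_id, [])
        # Index of the first ED date strictly after the index encounter.
        i = bisect_right(dates, when)
        hit = i < len(dates) and dates[i] - when <= timedelta(days=window_days)
        rows.append((patient_id, when, int(hit)))
    return rows
```

Each labeled row would then be joined with that encounter's features before model training. At the scale described in the abstract, the same grouping and windowing logic would have to run as a distributed job rather than in a single process.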


Yifan Xu, IBM
Michael Dusenberry, IBM Spark Technology Center
Wei Yao, IBM
Amanda Yoho (Presenter)
