Abstract: Using electronic medical records (EMR) from a random sample of 30,000 sepsis patients, we identified the medications administered within the first 24 hours of hospitalization. We applied Latent Dirichlet Allocation (LDA) to generate 10 topics based on medication co-occurrence and frequency. Adding patients' medication topic composition to a logistic regression model of hospital mortality improved the c-statistic from 0.81 to 0.83 (p<0.01), explaining 23% of the variability. Topic modeling of detailed EMR data identified distinct clinical phenotypes of sepsis.
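
The c-statistic reported above is the concordance index for a binary outcome, which equals the area under the ROC curve. It is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The risk values are hypothetical toy numbers, not study data, and the function name c_statistic is illustrative only.

```python
from itertools import product


def c_statistic(risks, died):
    """Concordance (c-statistic) for a binary outcome such as hospital death.

    Compares every (death, survivor) pair: scores 1 if the patient who died
    has the higher predicted risk, 0.5 for a tie, and 0 otherwise.
    """
    events = [r for r, d in zip(risks, died) if d]
    non_events = [r for r, d in zip(risks, died) if not d]
    if not events or not non_events:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted mortality risks for six patients from two models:
# a baseline model and the same model with topic covariates added.
died = [1, 0, 1, 0, 0, 1]
base = [0.30, 0.20, 0.10, 0.40, 0.05, 0.60]
with_topics = [0.35, 0.15, 0.25, 0.30, 0.05, 0.70]
print(round(c_statistic(base, died), 3), round(c_statistic(with_topics, died), 3))
# 0.667 0.889
```

The pairwise loop is O(deaths × survivors), which is fine for illustration. For a cohort of 30,000 patients, a rank-based computation or a library ROC AUC routine would be more practical.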

Learning Objective 1: Apply the unsupervised topic modeling method Latent Dirichlet Allocation to detailed electronic medical record data to identify distinct clinical phenotypes among patients with similar diagnoses.
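
To make the objective concrete, here is a minimal LDA sketch using collapsed Gibbs sampling in pure Python. Each patient is treated as a "document" and each administered medication as a "word". A drug given more than once appears more than once, so both co-occurrence and frequency shape the topics. The abstract does not say which LDA implementation or hyperparameters were used, so the function lda_gibbs, the priors alpha and beta, the iteration count, and the medication lists below are all hypothetical assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical illustrative helper).

    docs: one list of medication tokens per patient.
    Returns (theta, phi): per-patient topic proportions and
    per-topic medication probability distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]      # patient-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-medication counts
    nk = [0] * K                       # total tokens per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):     # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [{vocab[v]: (nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)}
           for k in range(K)]
    return theta, phi


# Hypothetical first-24-hour medication lists for five patients.
docs = [
    ["vancomycin", "piperacillin", "norepinephrine", "norepinephrine", "vasopressin"],
    ["ceftriaxone", "azithromycin", "acetaminophen"],
    ["vancomycin", "norepinephrine", "vasopressin", "hydrocortisone"],
    ["ceftriaxone", "acetaminophen", "acetaminophen"],
    ["piperacillin", "norepinephrine", "hydrocortisone"],
]
theta, phi = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print("topic", k, top)
```

Each row of theta is a patient's topic composition, which is the kind of covariate the abstract describes adding to the mortality model. Because each row sums to 1, one topic would typically be omitted as the reference category so that the proportions are not collinear with the regression intercept.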


Alison Fohner (Presenter)
Kaiser Permanente

John Greene, Kaiser Permanente
Jonathan Chen, Stanford University
Gabriel Escobar, Kaiser Permanente
Vincent Liu, Kaiser Permanente
