Abstract: The Health Insurance Portability and Accountability Act (HIPAA) allows for the exchange of de-identified patient data, but its definition of de-identification is essentially open-ended, leaving the onus on dataset providers to ensure patient privacy. The Patient Centered Outcomes Research Network (PCORnet) builds a de-identification approach into queries, but we have noticed various subtle problems with this approach. We censor aggregate counts below a threshold (i.e., <11) to protect patient privacy. However, we have found that thresholded numbers can at times be inferred, and some key numbers are not thresholded at all. Furthermore, PCORnet’s approach of thresholding low counts introduces a selection bias that slants the data towards larger health care sites and their corresponding demographics. We propose a solution: instead of censoring low counts, introduce Gaussian noise to all aggregate counts. We describe this approach and the freely available tools we created for this purpose.
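
The contrast between the two strategies can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' released tools: the function names, the noise standard deviation (`sigma=3.0`), and the choice to round and clamp at zero are all assumptions made for the example. Only the threshold of 11 comes from the abstract.

```python
import random


def censor_low_counts(count, threshold=11):
    """Thresholding: hide any count below the threshold.

    Returns None for a censored count. The default threshold of 11
    matches the "<11" rule mentioned in the abstract.
    """
    return None if count < threshold else count


def add_gaussian_noise(count, sigma=3.0, rng=None):
    """Noise-based alternative: perturb every count, large or small.

    sigma is a hypothetical value chosen for illustration; the abstract
    does not state which standard deviation the authors use.
    """
    rng = rng or random.Random()
    noisy = round(count + rng.gauss(0.0, sigma))
    # Assumption: a reported count should never be negative, so clamp at 0.
    return max(0, noisy)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is repeatable
    for true_count in (3, 10, 11, 250):
        censored = censor_low_counts(true_count)
        noisy = add_gaussian_noise(true_count, rng=rng)
        print(true_count, censored, noisy)
```

Under thresholding, small sites contribute nothing for low counts, whereas the noisy version reports an approximate value for every count. Note that clamping at zero slightly biases very small counts upward; a real implementation would need to decide how to handle that.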

Learning Objective 1: After completing this activity, the participant will be better able to protect their patients’ data in their data warehouse.


Jeffrey Klann (Presenter)
Harvard University School of Medicine

Matthew Joss, Partners Healthcare
Rohan Shirali, Boston Children's Hospital
Marc Natter, Boston Children's Hospital
Sebastian Schneeweiss, Brigham and Women's Hospital
Kenneth Mandl, Boston Children's Hospital
Shawn Murphy, Partners Healthcare
