Oral Presentations

Exploring Hadoop-Based Data Lakes for Research

11:42 AM–12:00 PM Mar 15, 2018 (US - Pacific)

Mission

Description

Abstract: Research data requirements often fall outside conventional analytic patterns. Researchers need broader and deeper data, in source format rather than in predefined analytic schemas; they have advanced analysis and programming skills; they prefer a “self-service” model; and their requirements evolve over time. Apache Hadoop®-based data lakes that implement lift-and-shift, late-binding, self-service architectures are a natural fit for research analytics. This presentation discusses our experience exploring and implementing data lake architectures for research at NYU Langone.

Learning Objective 1: Describe the unique nature of data requirements for research analytics and the challenges in satisfying such requirements.

Learning Objective 2 (Optional): Compare the different architectural alternatives for provisioning data for research analytics.

Learning Objective 3 (Optional): Formulate an approach to creating an enterprise data lake in an academic medical center and other healthcare settings.

Learning Objective 4 (Optional): Describe the features of the Apache Hadoop platform and explain how they support research analytics.

Authors:

Rajan Chandras (Presenter), NYU Langone Health

Michael Cantor, NYU Langone Health
Jeff Shein, NYU Langone Health

Keywords