Abstract: Research data requirements often fall outside conventional analytic patterns. Researchers need broader and deeper data, in its source format rather than in predefined analytic schemas; they have advanced analysis and programming skills; they prefer a “self-service” model; and their requirements evolve over time. Apache Hadoop®-based data lakes that implement lift-and-shift, late-binding, self-service architectures are a natural fit for research analytics. This presentation discusses our experience exploring and implementing data lake architectures for research at NYU Langone.
Learning Objective 1: Describe the unique nature of data requirements for research analytics and the challenges in satisfying such requirements.
Learning Objective 2 (Optional): Compare the different architectural alternatives for provisioning data for research analytics.
Learning Objective 3 (Optional): Formulate an approach to creating an enterprise data lake in an academic medical center and other healthcare settings.
Learning Objective 4 (Optional): Describe the features of the Apache Hadoop platform and explain how they support research analytics.
Rajan Chandras (Presenter)
NYU Langone Health
Michael Cantor, NYU Langone Health
Jeff Shein, NYU Langone Health