Abstract Body: Clinical data is a valuable resource for research and other secondary uses. However, due to its nature, clinical data resides in distributed data repositories in various forms (e.g., written reports, structured data, semi-structured data such as genomic tests, imaging). Finding, selecting, and integrating the research data for a given research question requires a set of data curation activities including data access, query, extraction, transformation, cleaning, aggregation, and sharing. Each of these steps in the data lifecycle affects the scope and coverage of the resulting curated data set. For reproducible research, the research data curation workflow should be clearly documented, in a machine-interpretable way if possible, and should remain accessible beyond the lifetime of the data curation process.
In this interactive session, panel members will work with attendees in small groups to create high-level workflow descriptions of one or more data curation activities (data access, query, extraction, transformation, cleaning, aggregation, and sharing). The panel moderator will present the current status of work in this area and describe the small-group hands-on session, including the tasks for each group to complete. Panel members will then break out into small groups with attendees to complete the tasks and prepare a short presentation of each group's progress. The attendees will then reconvene as a single group, where each group reports its results from the breakout session. The session will close with a group discussion of the small-group results, focusing on the interdependence of the curation activities and on how the individual workflows could be integrated into a generalized model for reproducible data creation.
Leslie McIntosh (Presenter)
Rensselaer Polytechnic Institute
Umit Topaloglu (Presenter)
Wake Forest Baptist Medical Center
Bernie LaSalle (Presenter)
University of Utah
Firas Wehbe (Presenter)
Casey Taylor (Presenter)
Johns Hopkins University