This week we had our paper titled "A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes." presented in Second International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH'16), co-located with 42 nd International Conference on Very Large Data Bases (VLDB 2016). Sep. 2016.
Abstract: Medical research use cases are population centric, unlike the clinical use cases which are patient or individual centric. Hence the research use cases require accessing medical archives and data source repositories of heterogeneous nature. Traditionally, in order to query data from these data sources, users manually access and download parts or whole of the data sources. The existing solutions tend to focus on a specific data format or storage, which prevents using them for a more generic research scenario with heterogeneous data sources where the user may not have the knowledge of the schema of the data a priori.
In this paper, we propose and discuss the design, implementation, and evaluation of Data Café, a scalable distributed architecture that aims to address the shortcomings in the existing approaches. Data Café lets the resource providers create biomedical data lakes from various data sources, and lets the research data users consume the data lakes efficiently and quickly without having a priori knowledge of the data schema.