Moving to a Weekly Refresh for the Virtual Data Warehouse
virtual data warehouse, weekly refresh
Background/Aims: Increasingly, conducting high-quality research requires access to current data. Previously, supplying current data was expensive because it required custom data extracts from source systems. In addition, these custom extracts were less robust because of frequent changes in health systems’ data structures, and they could only be validated within the context of a single project. Thus, information gleaned from them was rarely fed back to the wider research institution and was not informing more general knowledge of health-system data. To address these missed opportunities, and to maximize ways the virtual data warehouse (VDW) can support research, Kaiser Permanente Northwest’s Center for Health Research (CHR-NW) investigated how to change the VDW refresh rate from monthly to weekly.
Methods: We used a variety of data-management tools to change our processes and infrastructure in order to regularly deliver reliable data weekly, including: 1) We use SmartBatch to schedule weekly SAS jobs and SQL Server Agent to schedule SQL Server Integration Services (SSIS) jobs. 2) We moved to incremental builds of many data files; instead of executing a complete pull every week, we only extract data back to the point in time that it stops changing. 3) We no longer maintain a mirror copy of the VDW on an older operating system because of the cost/benefit of this task. 4) To help our transformation programs run more efficiently, we developed SSIS packages that pull source data to a local server. 5) We improved and automated our monitoring processes. 6) We established tolerance tests for when to accept or postpone data from source systems.
Results: We have been able to maintain the CHR-NW VDW on a weekly refresh cycle since June 2014, with no missed refreshes. The extract, transform and load programs that refresh our VDW run 4.4 times faster than they did just months before the transition. Our monitoring programs allow us to provide information back to our parent health system about source data.
Discussion: Changing our data systems to achieve a weekly refresh cycle was less burdensome than we first anticipated in terms of staff time. The added monitoring allows us to provide more stable data to our researchers.
Bauck AE, Cleveland C, Leitch S, Brandes J, Ackerson B. Moving to a Weekly Refresh for the Virtual Data Warehouse. J Patient Cent Res Rev 2015;2:117. http://dx.doi.org/10.17294/2330-0698.1143