Automating Research Data Using Metadata-Driven Processes to Dynamically Create Loader Program
Virtual Data Warehouse, dynamic programming
Background/Aims: The growing complexity and heterogeneity of source data systems makes maintaining a research data warehouse challenging, especially with the increasing need for current data with shorter refresh cycles and consistent results. A stable extract-transform-load (ETL) environment is essential to maintain consistent results. To address inconsistency, high maintenance costs and the need to rapidly adapt to data changes, Kaiser Permanente Northwest’s Center for Health Research (CHR-NW) undertook an effort to automate the ETL loading process.
Methods: CHR-NW developed a dynamic loader program to generate, validate and execute on-demand programs that load data into the production version of the Virtual Data Warehouse. It was developed in SQL Server Integration Services, and SQL Server Agent is used for scheduling and execution. Transform programs add entries to a task table upon completion, which the dynamic loader uses to identify staging tables with new data and then generate a loader program. Programs are generated using metadata about the staging tables, allowing adaptation to changes in the data structures and rapid deployment of new content.
Results: The dynamic loader approach enables CHR-NW to generate loader code without manual intervention. Its use of metadata in the staging environment provides the mechanism to adapt to changes in the data structures. It has reduced the effort needed to maintain the load step of the ETL process by reducing the number of programs from 55 to 1. As a reference, in the first half of 2015, those 55 loader programs were updated an average of four times each and each program change took approximately 2 hours to change, review and test. That effort has largely been eliminated. The load process also runs more consistently, reducing the error rate from 0.1 to 0.005. Business logic and new and developing best practices can be applied in one place and are automatically implemented. In one recent example, CHR-NW was able to reduce load times for large tables by 50% (34 minutes to 15).
Conclusion: Developing the dynamic loader has increased programmer productivity and the consistency of data load results, allowing CHR-NW to rapidly apply techniques to improve computing processing efficiency.
Leitch S, Bauck AE, Allison M, Cleveland C, Hornbrook M. Automating research data using metadata-driven processes to dynamically create loader program. J Patient Cent Res Rev. 2016;3:228.