Article Title

Increasing Efficiency in Creating High-Level Site Comparisons and Populating Cohort Flow Diagrams: The “Cohort Diagram” Macro

Publication Date



quality assurance, programming efficiencies


Background/Aims: Understanding how a population is selected for an analysis is necessary for understanding potential sources of bias. Searching logs to determine counts for key steps of the process is cumbersome, and re-ordering a flow diagram after all data have been collected is problematic. We developed two tools at Kaiser Permanente Colorado (KPCO) for capturing counts during cohort creation, one of which was refined through a Center for Effectiveness and Safety Research study. These tools are used to compare data flowing from multiple health plan sites, to support quality assurance, and to populate cohort diagrams for presentations and manuscripts.

Methods: A macro was created with parameters that define a source data set, an output table to hold counts, and a step description. A complementary method of collecting counts was also developed: a table is built at the individual level with a Boolean flag for each criterion used in cohort selection, and this table is then summarized into an n-way de-identified frequency table.
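
As a rough illustration of the count macro's interface, the sketch below mimics its three parameters (source data set, output table, step description) plus the option, described in the Results, of counting rows or distinct values of a variable. It is written in Python for illustration only. The function name record_count, its parameter names, and the sample records are all hypothetical and are not taken from the authors' macro.

```python
from typing import Optional


def record_count(source, counts, step, distinct_var: Optional[str] = None):
    """Append one count for `step` to the output table `counts`.

    source       -- list of dict records (stands in for the source data set)
    counts       -- list that accumulates one row per call (the output table)
    step         -- free-text description of the step being counted
    distinct_var -- if given, count distinct values of this variable;
                    otherwise count rows
    """
    if distinct_var is None:
        n = len(source)
        unit = "rows"
    else:
        n = len({record[distinct_var] for record in source})
        unit = "distinct " + distinct_var
    # Store the call order so the table can later be read as a flow.
    counts.append({"order": len(counts) + 1, "step": step, "unit": unit, "n": n})
    return n


# Hypothetical sample data: one record per encounter.
encounters = [
    {"person_id": 1, "age": 34},
    {"person_id": 1, "age": 34},
    {"person_id": 2, "age": 17},
    {"person_id": 3, "age": 52},
]

diagram_counts = []  # output table for the cohort diagram
record_count(encounters, diagram_counts, "All encounters")
record_count(encounters, diagram_counts, "Unique members", distinct_var="person_id")

adults = [r for r in encounters if r["age"] >= 18]
record_count(adults, diagram_counts, "Adult members", distinct_var="person_id")

for row in diagram_counts:
    print(row["order"], row["step"], row["unit"], row["n"])
```

Because the output table is passed in as an argument, a second list (for example, one reserved for quality assurance counts) could be filled by the same function without affecting the cohort diagram table.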

Results: The count macro method lets users collect any count at any point in the process. Users also specify whether to count rows of a table or distinct values of a specific variable. Multiple output data sets may be used to capture different data flows, such as one data set for a cohort diagram and another for other key quality assurance counts. Tables returned from sites can readily be combined and compared to assess heterogeneity across sites. The n-way frequency table method, while CPU-intensive for large cohort processes, allows maximum flexibility in the ordering of a cohort diagram. It also adds an extra layer of quality assurance, particularly when paired with a cohort diagram table built using the count macro.
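
The flexibility of the n-way frequency table method can be sketched as follows: once individual-level Boolean flags are collapsed into counts per flag combination, the diagram steps can be derived in any order from the summary alone. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The criterion names (adult, enrolled, has_dx), the function names, and the sample data are hypothetical.

```python
from collections import Counter


def flag_frequency_table(people, criteria):
    """Collapse individual-level flags into counts per flag combination."""
    return Counter(tuple(bool(p[c]) for c in criteria) for p in people)


def attrition(freq, criteria, order):
    """Derive cumulative diagram counts for the criteria applied in `order`."""
    positions = [criteria.index(c) for c in order]
    steps = []
    for k in range(len(positions) + 1):
        applied = positions[:k]
        # Sum every combination that satisfies all criteria applied so far.
        n = sum(count for combo, count in freq.items()
                if all(combo[i] for i in applied))
        label = "Starting population" if k == 0 else "Meets " + order[k - 1]
        steps.append((label, n))
    return steps


criteria = ["adult", "enrolled", "has_dx"]

# Hypothetical individual-level table: one row per person, one flag per criterion.
people = [
    {"adult": True,  "enrolled": True,  "has_dx": True},
    {"adult": True,  "enrolled": True,  "has_dx": True},
    {"adult": True,  "enrolled": True,  "has_dx": False},
    {"adult": True,  "enrolled": False, "has_dx": True},
    {"adult": False, "enrolled": True,  "has_dx": True},
    {"adult": True,  "enrolled": False, "has_dx": False},
]

freq = flag_frequency_table(people, criteria)  # de-identified summary

by_age_first = attrition(freq, criteria, ["adult", "enrolled", "has_dx"])
by_dx_first = attrition(freq, criteria, ["has_dx", "enrolled", "adult"])

for label, n in by_age_first:
    print(label, n)
```

Both orderings end at the same final cohort size, while the intermediate counts differ. The final count can also be checked against a diagram table built step by step, which is one way the two methods could cross-check each other.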

Discussion: These tools have been shared within the KPCO analytic team to aid in data collection and cohort diagram development. They are simple to implement, applicable in distributed data environments and reusable across projects. Reviewing the tables prior to requesting further data can prevent data reruns. Other sites and projects will benefit from increased efficiency by using these tools.