Article Title

Physician Documentation Behaviors in Electronic Health Records as a Potential Source of Noise for Early Detection of Heart Failure

Publication Date



electronic health record, machine learning


Background/Aims: Electronic health records (EHRs) are a potentially rich source of data for developing predictive models for early detection of heart failure. However, EHR data vary with both patient health and differences in clinical practice and behavior among physicians. The “noise” contributed by variable physician behaviors, such as differences in documentation frequency, could confound predictive models for detection of heart failure. In this study, we characterized the documentation behaviors of primary care physicians (PCPs) to better understand this potential source of noise.

Methods: We used longitudinal EHR data on a stratified random sample of 5,187 patients who were: 1) 50 years of age or older, and 2) without a history of heart failure. PCPs (N = 144) were identified and paired with the patients whom they had treated for a minimum of 6 months. We derived 28 measures to characterize PCP behaviors: the documentation frequencies of assertions and denials of selected Framingham heart failure signs and symptoms (FHFSS) in office visit encounter notes, adjusted for patient comorbidities. Hierarchical clustering analyses were performed on these PCP documentation behaviors.
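As an illustration of the clustering step, the following is a minimal sketch of agglomerative hierarchical clustering over per-physician documentation measures. The abstract does not specify the distance metric, linkage rule, or software used, so the Euclidean distance, average linkage, z-score standardization, function names, and synthetic data (16 hypothetical physicians with 4 measures each, rather than 144 PCPs with 28 measures) are all assumptions made for illustration.

```python
import math
import random


def zscore_columns(rows):
    """Standardize each measure (column) to mean 0 and SD 1 so that no single
    measure dominates the distance. Assumed preprocessing, not from the study."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard SD == 0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def agglomerative_clusters(rows, n_clusters):
    """Average-linkage agglomerative clustering on Euclidean distances.
    Returns one integer cluster label per row."""
    clusters = [[i] for i in range(len(rows))]  # start with singletons
    dist = [[math.dist(a, b) for b in rows] for a in rows]

    def average_linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Synthetic (hypothetical) data: three groups of physicians whose adjusted
# documentation frequencies centre on 0.8, 0.5, and 0.2.
random.seed(0)
true_sizes = (6, 6, 4)
centers = (0.8, 0.5, 0.2)
measures = [
    [random.gauss(center, 0.02) for _ in range(4)]
    for center, size in zip(centers, true_sizes)
    for _ in range(size)
]

labels = agglomerative_clusters(zscore_columns(measures), n_clusters=3)
print(labels)
```

In practice a library routine (for example, SciPy's hierarchical clustering functions) would likely replace the hand-written loop; the pure-Python version is used here only to keep the sketch self-contained.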

Results: PCPs were clustered into three groups with distinct documentation behaviors. Group 1 PCPs (n = 63) documented 10 of 15 assertions and 11 of 13 denials of FHFSS significantly more frequently than Group 3 PCPs (n = 20); Group 2 PCPs (n = 61) documented denials significantly more frequently than the other two groups. No significant differences were found among the groups in patients’ chronic, episodic and cardiometabolic chronic disease counts (comorbidities) (P > 0.05).
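A comparison of comorbidity counts across the three PCP groups could be sketched with a rank-based test. The abstract does not name the statistical test used, so the Kruskal-Wallis test below is an assumption, as are the function name and the made-up counts. For exactly three groups the chi-square reference distribution has 2 degrees of freedom, so the approximate P value reduces to exp(-H/2); this approximation is rough for samples as small as those shown.

```python
import math


def kruskal_wallis_3groups(g1, g2, g3):
    """Kruskal-Wallis H test for three independent groups, with tie correction.
    Returns (H, approximate P value from a chi-square distribution with 2 df)."""
    groups = [g1, g2, g3]
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0

    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h = max(h / correction, 0.0)  # clamp tiny negative rounding error
    p_value = math.exp(-h / 2)    # chi-square survival function for 2 df
    return h, p_value


# Hypothetical comorbidity counts per patient, by PCP group.
h_same, p_same = kruskal_wallis_3groups([1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4])
h_diff, p_diff = kruskal_wallis_3groups([0, 1, 1], [5, 6, 6], [10, 11, 12])
print(f"similar groups:   H = {h_same:.3f}, P = {p_same:.3f}")
print(f"different groups: H = {h_diff:.3f}, P = {p_diff:.3f}")
```

A P value above the chosen threshold, as in the first call, would be consistent with no detectable difference in patient complexity among groups.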

Conclusion: This study identified PCP groups with distinct documentation behaviors unrelated to patient complexity. This source of noise is a potential confounder and should be taken into account in predictive modeling.