A Novel Method for Estimating Transgender Status Using EMR Data

Keywords: disparities, transgender


Background/Aims: “Transgender” broadly describes individuals who either do not identify with their biological sex or do not conform to binary gender categories. Studies of transgender persons are impeded by the lack of consistent terminology and coding for identifying this population. As a result, transgender prevalence, the direction of transition (male-to-female [MTF] or female-to-male [FTM]), and health outcomes in this population are not well known. We describe a novel algorithm for identifying a cohort of transgender persons from diverse electronic medical record (EMR) data sources.

Methods: Provider visits for Kaiser Permanente Georgia members enrolled between 2006 and 2014 were collected from EMR databases. To build the cohort, a SAS algorithm scanned these databases for relevant ICD-9 codes and for specific keywords in digitized provider notes. The algorithm worked in two steps: the first identified possible transgender status (e.g., the keywords “transgender” or “gender identity”), and the second, applied only once transgender status was confirmed, determined natal sex (e.g., the keywords “ovaries” or “testes”). Trained reviewers confirmed transgender status and, if it was confirmed, determined natal sex from a limited, focused review of the text strings containing each keyword (± 50–100 characters around the keyword, stripped of protected health information). Accuracy of identification (the proportion of flagged members confirmed as transgender) and the proportions of MTF and FTM subjects were computed with 95% confidence intervals (CI).
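The two-step screen described above can be sketched in Python. This is an illustrative sketch, not the authors' SAS program: the `Member` record, the helper names, the 75-character context window (picked from within the stated 50–100 range), and the ICD-9-CM codes (gender identity disorder and trans-sexualism codes; the abstract does not list the codes actually used) are all assumptions. Stripping protected health information from the snippets and the manual review itself are not shown.

```python
import re
from dataclasses import dataclass

# Example keywords taken from the abstract; a real keyword list would be longer.
TRANSGENDER_KEYWORDS = ["transgender", "gender identity"]
NATAL_SEX_KEYWORDS = ["ovaries", "testes"]
# ASSUMPTION: illustrative ICD-9-CM codes (gender identity disorder, trans-sexualism);
# the abstract does not list the codes the study actually used.
ICD9_CODES = {"302.6", "302.85", "302.50", "302.51", "302.52", "302.53"}


def keyword_pattern(words):
    """Case-insensitive, whole-word regex matching any of the given keywords."""
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)


TG_PATTERN = keyword_pattern(TRANSGENDER_KEYWORDS)
NS_PATTERN = keyword_pattern(NATAL_SEX_KEYWORDS)


@dataclass
class Member:
    """Hypothetical per-member record: coded diagnoses plus free-text notes."""
    member_id: str
    icd9_codes: set[str]
    notes: list[str]


def snippets(text, pattern, window=75):
    """Return each keyword hit with up to `window` characters of context on each side."""
    out = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        out.append(text[start:end])
    return out


def flag_member(member, window=75):
    """Step 1: flag possible transgender status.

    Returns (source, snippets), where source is "keywords only", "ICD-9 only"
    or "both", or None if neither codes nor keywords match.
    """
    has_code = bool(member.icd9_codes & ICD9_CODES)
    hits = [s for note in member.notes for s in snippets(note, TG_PATTERN, window)]
    if has_code and hits:
        source = "both"
    elif hits:
        source = "keywords only"
    elif has_code:
        source = "ICD-9 only"
    else:
        return None
    return source, hits


def natal_sex_snippets(member, window=75):
    """Step 2 (only after reviewer confirmation): context around natal-sex keywords."""
    return [s for note in member.notes for s in snippets(note, NS_PATTERN, window)]


if __name__ == "__main__":
    m = Member("A1", {"302.85"}, ["Patient identifies as transgender; imaging of ovaries normal."])
    print(flag_member(m))
    print(natal_sex_snippets(m))
```

In this sketch the code only assembles candidate members and short context snippets; confirming transgender status and assigning natal sex stay with the trained reviewers, which is what keeps the review limited to a few hundred snippets rather than full charts.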

Results: Of 823,104 members, 271 were flagged as possibly transgender by the SAS algorithm: 137 through keywords only, 25 through ICD-9 codes only, and 109 through both ICD-9 codes and keywords. Of these 271, 185 (68%, 95% CI: 62–74) were confirmed as transgender: 62 of 137 (45%, 95% CI: 37–54) flagged by keywords only, 14 of 25 (56%, 95% CI: 35–75) flagged by ICD-9 codes only, and 109 of 109 (100%, 95% CI: 96–100) flagged by both. Of the 185 confirmed transgender persons, 95 (51%, 95% CI: 44–59) were MTF and 76 (41%, 95% CI: 34–49) were FTM; natal sex remained unknown for the remaining 14 (8%, 95% CI: 4–13).
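The proportions and intervals above can be recomputed from the reported counts. The abstract does not state which interval method was used; this sketch assumes the exact (Clopper-Pearson) binomial interval, computed with the standard library by bisection, so its bounds may differ by about a percentage point from the reported ones. The function names are hypothetical.

```python
from math import comb


def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); direct summation, adequate for n up to several hundred."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def _solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for p with f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """ASSUMED method: exact two-sided (1 - alpha) CI for the binomial proportion x/n."""
    if x == 0:
        lower = 0.0
    else:
        # Lower bound solves P(X >= x | p) = alpha / 2.
        lower = _solve_increasing(lambda p: tail_ge(x, n, p), alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # Upper bound solves P(X <= x | p) = alpha / 2, i.e. P(X >= x + 1 | p) = 1 - alpha / 2.
        upper = _solve_increasing(lambda p: tail_ge(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Counts as reported in the Results: (confirmed, flagged) by flag source.
    counts = {
        "overall": (185, 271),
        "keywords only": (62, 137),
        "ICD-9 only": (14, 25),
        "both": (109, 109),
    }
    for label, (x, n) in counts.items():
        lo, hi = clopper_pearson(x, n)
        print(f"{label}: {x}/{n} = {x / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

The exact interval is a reasonable default here because several subgroups are small (25 members) or at the boundary (109 of 109), where a normal-approximation interval would be unreliable or collapse to zero width.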

Conclusion: This method for supporting research on transgender persons is low-cost, rapid, and valid, saving the substantial time and cost that a manual review of nearly 1 million medical records would have required. The overall approach, including the SAS algorithm, can be readily transferred to other health care systems.