Clin Epidemiol. 2022 Mar 22;14:369-384.
doi: 10.2147/CLEP.S323292. eCollection 2022.
Unraveling COVID-19: A Large-Scale Characterization of 4.5 Million COVID-19 Cases Using CHARYBDIS
Kristin Kostka, Talita Duarte-Salles, Albert Prats-Uribe, Anthony G Sena, Andrea Pistillo, Sara Khalid, Lana Y H Lai, Asieh Golozar, Thamir M Alshammari, Dalia M Dawoud, Fredrik Nyberg, Adam B Wilcox, Alan Andryc, Andrew Williams, Anna Ostropolets, Carlos Areia, Chi Young Jung, Christopher A Harle, Christian G Reich, Clair Blacketer, Daniel R Morales, David A Dorr, Edward Burn, Elena Roel, Eng Hooi Tan, Evan Minty, Frank DeFalco, Gabriel de Maeztu, Gigi Lipori, Hiba Alghoul, Hong Zhu, Jason A Thomas, Jiang Bian, Jimyung Park, Jordi Martínez Roldán, Jose D Posada, Juan M Banda, Juan P Horcajada, Julianna Kohler, Karishma Shah, Karthik Natarajan, Kristine E Lynch, Li Liu, Lisa M Schilling, Martina Recalde, Matthew Spotnitz, Mengchun Gong, Michael E Matheny, Neus Valveny, Nicole G Weiskopf, Nigam Shah, Osaid Alser, Paula Casajust, Rae Woong Park, Robert Schuff, Sarah Seager, Scott L DuVall, Seng Chan You, Seokyoung Song, Sergio Fernández-Bertolín, Stephen Fortin, Tanja Magoc, Thomas Falconer, Vignesh Subbian, Vojtech Huser, Waheed-Ul-Rahman Ahmed, William Carter, Yin Guan, Yankuic Galvan, Xing He, Peter R Rijnbeek, George Hripcsak, Patrick B Ryan, Marc A Suchard, Daniel Prieto-Alhambra
- PMID: 35345821
- PMCID: PMC8957305
- DOI: 10.2147/CLEP.S323292
Abstract
Purpose: Routinely collected real-world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD.
Patients and methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts: 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services.
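To illustrate what "non-mutually exclusive" means for the three cohorts, the following is a minimal sketch in Python. It is not the CHARYBDIS analytical package and does not use the OMOP CDM schema: the `Patient` record, its field names, the `cohort_counts` helper, and the toy data are all hypothetical, and the cohort rules are simplified assumptions. The point is only that one individual can be counted in more than one cohort, so cohort sizes should not be summed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical, simplified record; not the OMOP CDM schema.
    patient_id: int
    diagnosed_or_positive: bool
    hospitalized: bool
    intensive_services: bool


def cohort_counts(patients):
    """Count membership in three overlapping (non-mutually exclusive) cohorts."""
    counts = {"diagnosed": 0, "hospitalized": 0, "intensive": 0}
    for p in patients:
        if p.diagnosed_or_positive:
            counts["diagnosed"] += 1
        # Assumption: hospitalization is flagged independently of diagnosis.
        if p.hospitalized:
            counts["hospitalized"] += 1
        # Assumption: the intensive-services cohort is a subset of the hospitalized.
        if p.hospitalized and p.intensive_services:
            counts["intensive"] += 1
    return counts


# Toy data, invented for illustration only.
patients = [
    Patient(1, True, False, False),  # diagnosed only
    Patient(2, True, True, False),   # diagnosed and hospitalized
    Patient(3, True, True, True),    # member of all three cohorts
]

counts = cohort_counts(patients)
print(counts)
```

In this toy example the three cohort sizes add up to more than the number of distinct patients, which is why the abstract's three counts describe overlapping groups rather than a partition of one population.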
Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women than men were diagnosed but more men than women were hospitalized, and most diagnosed cases were between 25 and 60 years of age whereas most hospitalized cases were between 60 and 80 years of age. South Korea differed, with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data were more often available in hospitalized cohorts than in diagnosed cohorts.
Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. By characterising baseline variability across patients and geography, our work provides critical context for differences that might otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.
Keywords: OHDSI; OMOP CDM; descriptive epidemiology; open science; real world data; real world evidence.