medRxiv preprint
doi: https://doi.org/10.1101/2021.09.07.21263228
this version posted September 13, 2021
Authors: Fritz Obermeyer1,8✝, Stephen F. Schaffner1,3,4, Martin Jankowiak1,8, Nikolaos Barkas1, Jesse D. Pyle1, Daniel J. Park1, Bronwyn L. MacInnis1,4,5, Jeremy Luban1,5,6, Pardis C. Sabeti1,3,4,5,7*, Jacob E. Lemieux1,2*,✝
Abstract
Repeated emergence of SARS-CoV-2 variants with increased transmissibility necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative transmissibility of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to transmissibility. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase transmissibility, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization.
One-sentence summary: A Bayesian hierarchical model of all viral genomes predicts lineage transmissibility and identifies associated mutations.
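To make the idea of multinomial logistic regression over lineages concrete, the following is a minimal illustrative sketch and not the PyR0 implementation. It assumes that each lineage's logit grows linearly in time and that a lineage's growth rate is the sum of per-mutation effects; both assumptions, along with all function names, mutation labels, and numerical values, are hypothetical. The sketch omits the hierarchical structure across regions and all Bayesian inference, and only shows how fixed coefficients map to expected lineage proportions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lineage_growth_rates(mutation_effects, lineage_mutations):
    """Growth rate of each lineage as the sum of its mutations' effects.

    Additivity of mutation effects is an assumption made for this sketch.
    """
    return [sum(mutation_effects[m] for m in muts) for muts in lineage_mutations]


def lineage_proportions(intercepts, growth_rates, t):
    """Expected lineage proportions at time t: softmax(intercept + rate * t)."""
    logits = [a + b * t for a, b in zip(intercepts, growth_rates)]
    return softmax(logits)


# Hypothetical toy inputs: three lineages, two mutations.
mutation_effects = {"mut_A": 0.05, "mut_B": 0.02}        # per-day log-growth effects
lineage_mutations = [[], ["mut_A"], ["mut_A", "mut_B"]]  # mutations carried by each lineage
intercepts = [0.0, -3.0, -6.0]                           # initial log-odds of each lineage

rates = lineage_growth_rates(mutation_effects, lineage_mutations)
for t in (0, 100, 200):
    props = lineage_proportions(intercepts, rates, t)
    print(t, [round(p, 3) for p in props])
```

In this toy setting, a lineage that starts rare but carries mutations with larger summed effects eventually overtakes the others, which is the qualitative behavior a model of relative transmissibility is meant to capture.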