Kubik S, Marques AC, Xing X, Silvery J, Bertelli C, De Maio F, Pournaras S, Burr T, Duffourd Y, Siemens H, Alloui C, Song L, Wenger Y, Saitta A, Macheret M, Smith EW, Menu P, Brayer M, Steinmetz LM, Si-Mohammed A, Chuisseu J, Stevens R, Constantoulakis P, Sali M, Greub G, Tiemann C, Pelechano V, Willig A, Xu Z
Clin Microbiol Infect - (-) - [2021-04-01; online 2021-04-01]
SARS-CoV-2 genotyping has been instrumental in monitoring viral evolution and transmission during the pandemic. The quality of the sequence data obtained from these genotyping efforts depends on several factors, including the quantity and integrity of the input material, the technology, and the laboratory-specific implementation. The current lack of guidelines for SARS-CoV-2 genotyping leads to the inclusion of error-containing genome sequences in genomic epidemiology studies. We aimed to establish clear and broadly applicable recommendations for reliable virus genotyping. To process clinical samples and synthetic viral genomes genotyped with an amplicon-based approach, we established a sequencing data analysis workflow that reliably identifies and removes technical artifacts that can result in miscalls when using alternative pipelines. We evaluated the impact of experimental factors, including viral load and sequencing depth, on correct sequence determination. We found that at least 1000 viral genomes are necessary to confidently detect variants in the SARS-CoV-2 genome at frequencies of 10% or higher. The broad applicability of our recommendations was validated in over 200 clinical samples from six independent laboratories. The genotypes we determined for clinical isolates of sufficient quality cluster by sampling location and period. Our analysis also supports the rise in frequency of 20A.EU1 and 20A.EU2, two recently reported European strains whose dissemination was facilitated by travel during the summer of 2020. We present much-needed recommendations for reliable determination of SARS-CoV-2 genome sequences and demonstrate their broad applicability in a large cohort of clinical samples.
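The dependence of variant detection on the number of input genomes can be illustrated with a simple sampling calculation. The sketch below is not the authors' workflow: it assumes, purely for illustration, that the genomes entering a reaction are an independent random draw from the sample (a binomial model) and ignores amplification and sequencing errors. The function names `sampling_sd` and `prob_variant_absent` are hypothetical.

```python
import math


def sampling_sd(true_freq: float, n_genomes: int) -> float:
    """Standard deviation of the variant fraction among n_genomes
    randomly drawn genomes, under a binomial sampling assumption."""
    return math.sqrt(true_freq * (1.0 - true_freq) / n_genomes)


def prob_variant_absent(true_freq: float, n_genomes: int) -> float:
    """Probability that none of the n_genomes drawn carries the variant."""
    return (1.0 - true_freq) ** n_genomes


if __name__ == "__main__":
    # Illustrative input sizes; 0.10 is the 10% frequency named in the abstract.
    for n in (10, 100, 1000, 10000):
        sd = sampling_sd(0.10, n)
        absent = prob_variant_absent(0.10, n)
        print(f"{n:>6} genomes: sd of observed frequency = {sd:.4f}, "
              f"P(variant not sampled) = {absent:.3g}")
```

Under these assumptions, the spread of the observed frequency of a 10% variant is about one percentage point at 1000 input genomes and roughly ten times larger at 10 input genomes, which is one reason low viral loads make minor variants hard to call reliably.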
Data generated with clinical samples are available on request (not suitable for public sharing).
Datasets generated with synthetic SARS-CoV-2 genome: SRA accession PRJNA681574