Is the human virome a potential factor of infection susceptibility?

What is this research project about?

Bioinformatics workflow for discovering viral sequences in primary sequencing data using a high-performance computing cluster.


We study the genetic diversity of the virosphere at different scales, both across eukaryotes and within the human virome. Using a virus discovery approach that we developed for high-performance computing, we screen sequencing data for known and unknown viral sequences, including highly divergent viruses with no close relatives in reference databases. We apply this approach to very large amounts of published sequencing data and to unpublished data from patient cohorts in order to explore inter-individual differences in virome composition between diseased and healthy persons.

Furthermore, we seek to extend our virus discovery approach to animal viruses, aiming to identify primates and other vertebrate species that harbor unknown relatives of human-pathogenic viruses, which may form the basis for establishing new animal infection models.
In addition, we are involved in project A1, in which we study genetic determinants of severe infection with the human respiratory syncytial virus (RSV) in infants.

What’s the current status?

We have developed a high-performance computing workflow for discovering viral sequences in unprocessed next-generation sequencing (NGS) data. It was originally designed to search for novel RNA viruses in published NGS data from the Sequence Read Archive (SRA) repository. So far, we have screened about 500,000 SRA datasets covering the full spectrum of available eukaryotic transcriptomes and discovered numerous sequences from known and unknown RNA viruses. In addition, we have analyzed about 76,000 human SRA experiments with available tissue/organ annotation, which led us to identify numerous known and novel anelloviruses, among others. Notably, we frequently detected the genomes of dozens of viruses in the same sample, suggesting that viral communities may exist within an individual person. These studies, in particular the analyses aiming to uncover possible associations with health and disease, are ongoing.
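
The observation that several viruses can occur in the same sample can be illustrated with a small sketch. It is not the workflow described above; it only assumes that a screen yields a list of (sample, virus) detections. The function names, the `min_viruses` threshold, and the example data are all hypothetical.

```python
from collections import defaultdict


def viruses_per_sample(hits):
    """Group detected virus names by sample.

    `hits` is an iterable of (sample_id, virus_name) pairs; this input
    format is an assumption made for illustration only.
    """
    by_sample = defaultdict(set)
    for sample_id, virus in hits:
        by_sample[sample_id].add(virus)  # sets ignore repeated detections
    return dict(by_sample)


def samples_with_communities(hits, min_viruses=2):
    """Return samples in which at least `min_viruses` distinct viruses were found."""
    by_sample = viruses_per_sample(hits)
    return {
        sample: sorted(viruses)
        for sample, viruses in by_sample.items()
        if len(viruses) >= min_viruses
    }


# Made-up detections, not real results.
hits = [
    ("sample_A", "anellovirus_1"),
    ("sample_A", "anellovirus_2"),
    ("sample_A", "anellovirus_1"),  # duplicate detection of the same virus
    ("sample_B", "anellovirus_3"),
]
communities = samples_with_communities(hits)
print(communities)
```

In this toy input only `sample_A` contains more than one distinct virus, so it is the only sample reported.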

Phylogenetic tree of human and animal anellovirus ORF1 proteins.

What are the project goals?

We aim to explore the human virome as a potential factor of infection susceptibility as well as a possible cause of other diseases, including primary immunodeficiencies. To this end, we seek to determine the number and diversity of viruses associated with humans, as comprehensively and with as much tissue specificity as possible. We hypothesize that identifying inter-individual differences in virome composition will provide novel insights into susceptibility to disease or its course. We also seek to discover animal and other eukaryotic viruses related to human viruses in order to study how viruses evolve, adapt and jump into new hosts.
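
One simple way to put a number on inter-individual differences in virome composition is a set-based distance between the viruses detected in two individuals. The sketch below uses the Jaccard distance as an illustrative choice; it is an assumption, not a measure named by the project, and the virus labels are made up.

```python
def jaccard_distance(virome_a, virome_b):
    """Jaccard distance between two collections of virus names.

    Returns 0.0 for identical viromes and 1.0 for viromes that share no
    virus. Two empty viromes are treated as identical.
    """
    a, b = set(virome_a), set(virome_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Made-up viromes of two hypothetical individuals.
person_1 = {"virus_1", "virus_2", "virus_3"}
person_2 = {"virus_2", "virus_3", "virus_4"}
distance = jaccard_distance(person_1, person_2)
print(distance)
```

Here the two individuals share two of four distinct viruses, so the distance is 0.5. The measure uses presence and absence only and ignores viral abundance.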

How do we get there?

We will expand our screen of the SRA repository to analyze many more of the millions of available human experiments. Complementary to this, we have initiated collaborations within RESIST to analyze the virome of patients with primary immunodeficiencies and of premature infants. To improve the sensitivity of our virus discovery approach, we have started developing a new method based on artificial neural networks. By incorporating both sequence information and secondary and tertiary protein structure information predicted by methods like AlphaFold, we expect to be able to identify highly divergent viral sequences in NGS data that remained undetected in previous analyses based on sequence homology alone.
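
As a rough illustration of combining sequence and structure information in one model input, the sketch below concatenates amino acid composition with the fractions of three secondary-structure states. This is a deliberately simple stand-in and not the method under development: the feature choice, the function names, and the three-state string (H = helix, E = strand, C = coil, assumed to have been derived beforehand from a predicted structure) are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
SS_STATES = "HEC"  # helix, strand, coil (assumed three-state alphabet)


def composition(sequence, alphabet):
    """Fraction of each alphabet symbol in the sequence."""
    length = len(sequence)
    if length == 0:
        return [0.0] * len(alphabet)
    return [sequence.count(symbol) / length for symbol in alphabet]


def combined_features(protein_seq, ss_string):
    """Concatenate sequence-derived and structure-derived features.

    `ss_string` must give one secondary-structure state per residue.
    The result has 20 + 3 = 23 entries and could serve as input to a
    classifier such as a neural network.
    """
    if len(protein_seq) != len(ss_string):
        raise ValueError("sequence and structure strings must have equal length")
    return composition(protein_seq, AMINO_ACIDS) + composition(ss_string, SS_STATES)


# Made-up five-residue example.
features = combined_features("MKVLA", "CHHHC")
print(len(features))
```

A practical model would use far richer representations of both sequence and structure; the point here is only that both kinds of information end up in a single feature vector.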

Comparison of protein structures of viral RNA polymerase of SARS-CoV-2 and remotely related Ball python nidovirus.


Project title: Computational Virology

Prof. Dr. Chris Lauber

Project: A6

Project A6 Publications

Publications 2022

Opportunities and Challenges of Data-Driven Virus Discovery. Lauber C, Seitz S. Biomolecules. 2022 Aug 4;12(8):1073.

Publications 2021

HBV evolution and genetic variability: Impact on prevention, treatment and development of antivirals. Glebe D, Goldmann N, Lauber C, Seitz S. Antiviral Research 2021.

Conservation of the HBV RNA element epsilon in nackednaviruses reveals ancient origin of protein-primed reverse transcription. Beck J, Seitz S, Lauber C, Nassal M. Proceedings of the National Academy of Sciences 2021.

Initial HCV infection of adult hepatocytes triggers a temporally structured transcriptional program containing diverse pro- and anti-viral elements. Tegtmeyer B, Vieyres G, Todt D, Lauber C, Ginkel C, Engelmann M, Herrmann M, Pfaller CK, Vondran FWR, Broering R, Vafadarnejad E, Saliba AE, Puff C, Baumgärtner W, Miskey C, Ivics Z, Steinmann E, Pietschmann T, Brown RJP. Journal of Virology 2021.

Bioinformatics of virus taxonomy: foundations and tools for developing sequence-based hierarchical classifications. Gorbalenya AE, Lauber C. Current Opinion in Virology 2021.

Deep mining of the Sequence Read Archive reveals bipartite coronavirus genomes and inter-family Spike glycoprotein recombination. Lauber C, Vaas J, Klingler F, Mutz P, Gorbalenya AE, Bartenschlager R, Seitz S. bioRxiv 2021.

Publications 2020

Liver-expressed Cd302 and Cr1l limit hepatitis C virus cross-species transmission to mice. Brown RJP, Tegtmeyer B, Sheldon J, Khera T, Anggakusuma, Todt D, Vieyres G, Weller R, Joecks S, Zhang Y, Sake S, Bankwitz D, Welsch K, Ginkel C, Engelmann M, Gerold G, Steinmann E, Yuan Q, Ott M, Vondran FWR, Krey T, Stroeh LJ, Miskey C, Ivics Z, Herder V, Baumgaertner W, Lauber C, Seifert M, Tarr AW, McClure CP, Randall G, Baktash Y, Ploss A, Loan Dao Thi V, Michailidis E, Saeed M, Verhoye L, Meuleman P, Goedecke N, Wirth D, Rice CM, Pietschmann T. Science Advances 2020.
