\makeatletter
\@ifundefined{HCode}
{\documentclass[CRBIOL,Unicode,screen,biblatex]{cedram}
\addbibresource{crbiol20260085.bib}
\newenvironment{noXML}{}{}
\let\citep\parencite
\let\citet\textcite
\def\xcitealp#1#2{\citeauthor{#1}, \citelink{#1}{#2}}
\newcommand*{\citelink}[2]{\hyperlink{cite.\therefsection @#1}{#2}}
\def\defcitealias#1#2{}
\let\citepalias\parencite
\let\citetalias\textcite
\newenvironment{Table}{\begin{table}}{\end{table}}
\def\thead{\noalign{\relax}\hline}
\def\endthead{\noalign{\relax}\hline}
\def\tabnote#1{\vskip4pt\parbox{.92\linewidth}{#1}}
\def\tsup#1{$^{{#1}}$}
\def\tsub#1{$_{{#1}}$}
\RequirePackage{etoolbox}
\def\jobid{crbiol20260085}
%\graphicspath{{/tmp/\jobid_figs/web/}}
\graphicspath{{./figures/}}
\newcounter{runlevel}
\let\MakeYrStrItalic\relax
\def\dedication#1{} 
\def\refinput#1{}
\def\back#1{}
\def\hyphen{\text{-}}
\def\0{\phantom{0}}
\skip\footins 11pt
\def\botline{\\\hline}
\DOI{10.5802/crbiol.193}
\datereceived{2026-02-03}
\daterevised{2026-04-01}
\dateaccepted{2026-04-01}
\ItHasTeXPublished
\def\og{\guillemotleft}
\def\fg{\guillemotright}
\makeatletter
\g@addto@macro{\UrlBreaks}{\UrlOrds}
\gappto{\UrlBreaks}{\UrlOrds}
\usepackage{hyperref}
\makeatother
\def\bcaption#1#2#3{\caption{Caption continued on next page.}\end{figure*}\setcounter{figure}{#2}\begin{figure*}\vspace*{-1pc}\caption{(\textbf{cont}.)\space#1\space #3}}
}
{\documentclass[crbiol]{article}
\def\CDRdoi{10.5802/crbiol.193}
\let\newline\break
\def\selectlanguage#1{}
\usepackage[T1]{fontenc}
\let\tabonesplittabular\relax
\let\tabtwosplittabular\relax
\def\citelink#1#2{\citeyear{#1}}
\def\xcitealp#1#2{\citealp{#1}}
\PassOptionsToPackage{authoryear}{natbib} 
\def\href#1#2{\url[#1]{#2}}
\def\bcaption#1#2#3{\caption{#1#3}}
\makeatletter
\newcommand\@coi{}
\newcommand\COI[1]{\gdef\@coi{#1}}
\newcommand\printCOI{\ifx\@coi\@empty\else%
\section*{Declaration of interests}
\@coi\fi
}
}
\makeatother

\usepackage{upgreek}

\COI{The authors do not work for, advise, own shares in, or receive
funds from any organization that could benefit from this article, and
have declared no affiliations other than their research organizations.}

\begin{document}

%\dateposted{2026-02-16}

\begin{noXML}

\CDRsetmeta{articletype}{review}	

\editornote{Article submitted by invitation}
\alteditornote{Article soumis sur invitation}

\title{Organization and evolution of the virosphere and the replicator
space}

\alttitle{Organisation et \'{e}volution de la virosph\`{e}re et de
l'espace des r\'{e}plicateurs}

\author{\firstname{Mart} \lastname{Krupovic}\CDRorcid{0000-0001-5486-0098}\IsCorresp\IsEqualContrib}
\address{Institut Pasteur, Universit\'{e} Paris Cit\'{e}, Cell Biology
and Virology of Archaea Unit, Paris, France}
\email[M. Krupovic]{mart.krupovic@pasteur.fr}

\author{\firstname{Eugene V.} \lastname{Koonin}\CDRorcid{0000-0003-3943-8299}\IsCorresp\IsEqualContrib}
\address{Computational Biology Branch, Division of Intramural Research,
National Library of Medicine, National Institutes of Health, Bethesda,
MD 20894, USA}
\email[E. V. Koonin]{koonin@ncbi.nlm.nih.gov}

\keywords{\kwd{Virosphere}\kwd{Evolution and origins of
viruses}\kwd{Viral hallmark genes}\kwd{Virus--host
coevolution}\kwd{Virus megataxonomy}}

\altkeywords{\kwd{Virosph\`{e}re}\kwd{\'{E}volution et origines des
virus}\kwd{G\`{e}nes caract\'{e}ristiques des
virus}\kwd{Co\'{e}volution virus--h\^{o}te}\kwd{M\'{e}gataxonomie des
virus}}

\dedicatory{\hfill This article is dedicated to the memory of the outstanding
virologist Dennis H. Bamford (1948--2025)\break\vspace*{.8pc}}

\begin{abstract} 
Viruses are obligate symbionts of cellular life forms that can
replicate only within host cells and typically form virions (virus
particles) to spread among host organisms. Virions numerically dominate
the biosphere, exceeding the number of cells several-fold, and also
comprise the main reservoir of genetic diversity on Earth. Nearly all
organisms host multiple, diverse viruses. Unlike cellular organisms,
viruses have genomes (genetic information carriers incorporated into
virions) that consist of all forms of RNA and DNA, suggesting an
evolutionary connection between extant viruses and the primordial
replicator pool. Lately, extensive mining of metagenomes and
metatranscriptomes has dramatically expanded the world of viruses
(virosphere), revealing an unsuspected and unprecedented diversity.
Viruses share no universal genes and have multiple origins. However,
about 15 viral hallmark genes each bring together multiple, diverse
groups of viruses, and many other genes are shared within such groups.
Evolution of viruses is inextricably intertwined with the evolution of
their hosts. A key aspect of virus--host coevolution is the arms race
resulting in accelerated evolution on both sides, especially of host
defenses and viral counter-defenses. A complementary, prominent feature
of this coevolution is exaptation, whereby viral genes are coopted by
the hosts for antiviral defense and other roles, and conversely,
viruses capture host genes for diverse functions in virus replication,
virion morphogenesis and virus--host interaction. In this review, we
attempt a synthesis of the current understanding of the global
organization of the virosphere, the major trends and events in the
evolution of viruses, and the high-level taxonomy of viruses.
\vspace*{-2pt}
\end{abstract}

\begin{altabstract}
Les virus sont des symbiotes obligatoires des formes de vie cellulaires
; ils ne peuvent se r\'{e}pliquer qu'\`{a} l'int\'{e}rieur des cellules
h\^{o}tes et forment g\'{e}n\'{e}ralement des virions (particules
virales) pour se propager parmi les organismes h\^{o}tes. Les virions
dominent num\'{e}riquement la biosph\`{e}re, d\'{e}passant de plusieurs
fois le nombre de cellules, et constituent \'{e}galement
le principal r\'{e}servoir de diversit\'{e} g\'{e}n\'{e}tique sur
Terre. Presque tous les organismes h\'{e}bergent de nombreux virus
diff\'{e}rents. Contrairement aux organismes cellulaires, les virus
poss\`{e}dent des g\'{e}nomes (porteurs d'information g\'{e}n\'{e}tique
incorpor\'{e}e dans les virions) compos\'{e}s de toutes les formes
d'ARN et d'ADN, ce qui sugg\`{e}re un lien \'{e}volutif entre les virus
actuels et le pool de r\'{e}plicateurs primordiaux. R\'{e}cemment,
l'analyse approfondie des m\'{e}tag\'{e}nomes et des
m\'{e}tatranscriptomes a consid\'{e}rablement \'{e}largi le monde des
virus (la virosph\`{e}re), r\'{e}v\'{e}lant une diversit\'{e}
insoup\c{c}onn\'{e}e et sans pr\'{e}c\'{e}dent. Les virus ne partagent
aucun g\`{e}ne universel et ont des origines multiples. Cependant, une
quinzaine de g\`{e}nes viraux embl\'{e}matiques permettent de
rassembler chacun de multiples groupes de virus tr\`{e}s divers, et de
nombreux autres g\`{e}nes sont partag\'{e}s au sein de ces groupes.
L'\'{e}volution des virus est inextricablement li\'{e}e \`{a} celle de
leurs h\^{o}tes. Un aspect cl\'{e} de la co\'{e}volution
virus--h\^{o}te est la course aux armements qui entra\^{i}ne une
\'{e}volution acc\'{e}l\'{e}r\'{e}e des deux c\^{o}t\'{e}s, notamment
des d\'{e}fenses de l'h\^{o}te et des contre-d\'{e}fenses virales. Une
caract\'{e}ristique compl\'{e}mentaire et importante de cette
co\'{e}volution est l'exaptation, par laquelle les g\`{e}nes viraux
sont coopt\'{e}s par les h\^{o}tes pour la d\'{e}fense antivirale et
d'autres fonctions, et inversement, les virus capturent des g\`{e}nes
de l'h\^{o}te pour diverses fonctions li\'{e}es \`{a} la
r\'{e}plication virale, \`{a} la morphogen\`{e}se des virions et \`{a}
l'interaction virus--h\^{o}te. Dans cette revue, nous proposons une
synth\`{e}se des connaissances actuelles sur l'organisation globale de
la virosph\`{e}re, les principales tendances et \'{e}v\'{e}nements de
l'\'{e}volution des virus, et la taxonomie des virus \`{a} haut niveau.
\end{altabstract}

%\input{CR-pagedemetas}

\maketitle

\twocolumngrid

\end{noXML}

\dedication{This article is dedicated to the memory of the outstanding
virologist Dennis H.\ Bamford (1948--2025).}

\defcitealias{30}{ibid.}
\defcitealias{55}{ibid.}
\defcitealias{115}{ibid.}
\defcitealias{5}{ibid.}

\section{Introduction}\label{sec1}

Viruses are best known as agents of human, animal and plant diseases,
some of these widespread and devastating.\ Suffice it to mention
numerous epidemics of smallpox prior to the introduction of broad
vaccination, the 1918--1920 pandemic of Spanish flu, the catastrophic
epidemics of poliomyelitis in the 1930s and 1940s, eventually curtailed
by vaccines, and obviously, the recent COVID-19 pandemic \citep{1}.
Other viruses, such as certain strains of human papillomaviruses or
\mbox{Epstein--Barr} virus, are known to cause cancer \citep{2,3}.
Also relatively well known are viruses causing disease in plants, such
as tobacco mosaic virus, the very first virus to be discovered, in the
late 19th century \citep{4}. It is much less commonly appreciated,
however, that viruses causing diseases in animals and plants,
numerous and diverse as they are, make up only a tiny part of the vast
virus world, also known as the virosphere. The myriad other viruses
comprising the virosphere infect virtually all other organisms on the
planet, in many cases killing the host, but in many others, causing no
harm or even being beneficial. 

Viruses are defined as mobile genetic elements (MGE) encoding at least
one protein that is a major component of virions (virus particles)
encasing the genome of the virus \citep{5}. Viruses are obligate
intracellular symbionts that reproduce inside cells of all organisms,
with the sole possible exception of some of the most drastically
reduced intracellular bacterial symbionts of eukaryotes. Moreover, most
organisms, from bacteria to humans, are hosts to a broad variety of
viruses with a wide range of genome sizes, diverse cycles of genome
replication and expression, virion structures and modes of interaction
with the hosts. When describing viruses, we purposefully use the
umbrella term ``symbionts'' rather than the more commonly used
``parasites'' because virus--host interactions span the entire spectrum
of symbiotic relationships with the hosts, from aggressive parasitism
to commensalism to mutualism \citep{6}.

The emergence of MGE, including viruses, is an intrinsic, fundamentally
inevitable, and essential feature of the evolution of life \citep{7}.
There are strong theoretical arguments and overwhelming empirical
evidence that MGE, including, most likely, bona fide viruses, have been
associated with the evolution of cellular life on Earth ever since its
earliest stages, that is, for more than 4~billion years \citep{8,9,10,11}.
In themselves, viruses cannot be legitimately considered life forms
inasmuch as they are not reproducers, that is, they lack physical
continuity across generations, and strictly depend on the host cells
for their replication \citep{12,13}. Instead, viruses are replicators,
maintaining genetic continuity and autonomy, and following evolutionary
trajectories that are distinct from, even if entangled with, those of
the hosts \citep{5}. 

In the last decade, our understanding of the world of viruses has been
transformed through the combination of the explosive growth of
metagenomic and metatranscriptomic sequence databases, and sustained
efforts on the discovery of novel viruses and the elucidation of the
evolutionary relationships within the virosphere \citep{14,15,16}. These
developments, which take advantage of advanced methods for sequence and
protein structure analysis, have been codified in the new, comprehensive
evolutionary taxonomy of viruses that has been formally adopted by the
International Committee on Taxonomy of Viruses (ICTV) \citep{17}. In
this review article, we undertake to present a synthesis of these
revolutionary advances, outlining the global structure of the
virosphere along with the concepts, trends and specific scenarios of
the evolution of the major groups of viruses. We further explore the
connections between viruses and other types of MGE, such as plasmids
and transposons, and the position of the virosphere within the
replicator space.\looseness=-1

\section{The dimensions of the virosphere}\label{sec2}

Before discussing our current understanding of the evolution of the
virosphere, we outline its key characteristics and dimensions. Unlike
cellular life forms, which all share the same fundamental scheme of
genetic information storage and transmission, with genomes consisting
of enormous double-stranded (ds) DNA molecules, viruses explore the
entire space of possible routes of nucleic acid-based information
storage and transmission \citep{18,19,20}. All forms of DNA and RNA
are used as genomes, that is, the information carriers incorporated
into virions, by different groups of viruses. The nature of the genome
defines the viral replication-expression cycle, and accordingly viruses
are traditionally divided into 7 so-called Baltimore classes (first
introduced in David Baltimore's seminal 1971 paper{\break} \citep{18})
(Figure~\ref{fig1}A). Within different Baltimore classes, viral genomes
can be either circular or linear and either continuous or segmented.
The size of viral genomes also depends on the Baltimore class. Viruses
of 6 of the 7 classes, with the exception of class I (dsDNA genomes),
have (relatively) small genomes, within the range of 1 to about 60
kilobases (kb). By contrast, viral dsDNA genomes span 3 orders of
magnitude from about 5~kb to more than 4~megabases (Mb)
(Figure~\ref{fig1}B,C). Notably, the largest genomes of the so-called
giant viruses exceed in size and the number of genes the genomes of
numerous bacteria and archaea (and some parasitic eukaryotes as well),
defying the notion of viruses as minuscule entities and obliterating
the boundary between viruses and cellular life forms with regard to
genetic complexity \citep{21,22}. \looseness=1
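
As a quick check of these ranges, using only the approximate bounds
quoted above (the calculation is illustrative and not part of the cited
analyses):
\[
\log_{10}\frac{4000\,\mathrm{kb}}{5\,\mathrm{kb}} = \log_{10} 800 \approx 2.9,
\qquad
\log_{10}\frac{60\,\mathrm{kb}}{1\,\mathrm{kb}} \approx 1.8,
\]
that is, dsDNA genomes span about three orders of magnitude, whereas
the genomes of the other six classes span fewer than two.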

\begin{figure*}
\includegraphics{fig01}
\caption{\label{fig1}The Baltimore classes and realms of viruses. (A)
Replication and expression of viral genomes in the 7 Baltimore classes.
Each of the classes is defined according to the form of the nucleic
acid incorporated into virions (the viral genome). (B) Sankey diagram
showing the correspondence between the Baltimore classes and viral
realms. (C) Range of genome sizes (kb) in each viral realm. The genome
size data for classified representatives of each realm were retrieved
from the NCBI GenBank database \citep{330}.}
{\vspace*{-.3pc}}
\end{figure*}

The diversity of the molecular structures and the wide size range of
viral genomes are complemented by the diversity of sizes, shapes and
molecular organizations of virions. Most virions are small, but, as in
the case of the genomes, the particles of giant viruses are larger than
many bacterial and archaeal cells. In the great majority of virions,
the core structure is the capsid, the proteinaceous shell encasing the
genome. The most common capsids adopt one of a few shapes, primarily
spherical or filamentous, built on icosahedral or helical symmetry,
respectively \citep{23,24}. There is, however, a broad variety of
comparatively rare, odd capsid shapes as well as several groups of
viruses that lack a typical capsid while retaining a glycoprotein-rich
lipid envelope encasing the genomic nucleic acid (see below).
Furthermore, some MGE that are traditionally considered viruses lack
genes for structural proteins and do not package their genomes at all,
while having clearly evolved from typical, capsid-encoding viruses
\citep{25} (see also discussion below). In some viruses, virions are
multilayered, consisting of more than one icosahedral shell, whereas in
other viruses, the protein capsid is enveloped by a lipid membrane
and/or contains an internal membrane surrounding the genome. 

A central theme in the study of virus evolution is the origin of
functional and structural features by divergence from common ancestors
{vs.}\ independent, convergent emergence.\ In the discussion that
\mbox{follows}, we shall see that neither the Baltimore classes of
viruses nor the basic shapes and organizations of virions are
monophyletic, convergence being a major trend in the evolution of the
virosphere. To a large extent, evolutionary convergence is underpinned
by physical constraints. As a major case in point, the number of basic
capsid shapes is limited to only a few, with the great majority of
capsids being either spherical or filamentous (often described as
rod-shaped in the case of rigid filaments), which are two simple,
symmetrical, thermodynamically favorable architectures. Capsids of other,
odd shapes are varied but rare in the virosphere. Importantly,
convergence at the level of capsid shapes or morphologies does not
translate to convergence of the actual molecular solutions to capsid
building. Indeed, both icosahedral and helical capsids are constructed
from capsid proteins with multiple distinct structural folds
\citep{26}. Conversely, there is no evidence that structurally similar
capsid proteins convergently evolved from structurally distinct
ancestral proteins. Thus, structural similarity among capsid proteins
within a particular virus group is interpreted as evidence of common
ancestry and is one of the cornerstones of megataxonomy. In a more
general context, although viruses are clearly polyphyletic, origin of
new types of viruses is rare and is typically followed by extensive
evolutionary diversification, as discussed below.

There is a popular meme that virus particles are the most abundant
biological entities on Earth. 
The number of virions present on the planet at any given time has been
estimated at the hyper-astronomical value of about
10\tsup{31}, with a virus/microbe ratio (VMR) of 10 or
higher \citep{27,28}. However, these striking VMR values have been
calculated largely by using epifluorescence microscopy or flow
cytometry to count extracellular virions, primarily, in marine
environments. Both these techniques are prone, on the one hand, to
missing intracellular viruses (including prophages in lysogens), but on
the other hand, to erroneously reporting non-viral particles as virions
\citep{29}. More reliable estimates for DNA viruses have been obtained
recently by quantification of the major capsid protein genes in
metagenomes compared to host hallmark genes (namely, ribosomal protein
genes). These analyses yielded a more nuanced picture whereby VMR was
generally lower than previously thought, about 2 on average, and
broadly varied across environments \citep{30}. The highest VMR, 3--4 on
average, was observed in metagenomes from aquatic environments, whereas
in animal-associated metagenomes, the typical VMR was much lower, with
the mean value around 1 \citepalias{30}, in agreement with previous
estimates for bacterial \citep{31,32} and archaeal \citep{33} viruses.
Intermediate VMR values were reported for soils, sediments, and
microbial mats. These findings would bring down the estimates of virus
abundance in the biosphere by about an order of magnitude, but still
support the \mbox{excess} of virus particles over cells on both the
planetary scale and in most habitats. Virion abundance does not scale
linearly with the microbial density in a given environment, so the VMR
is not constant. Analysis of thousands of estimates from marine samples
demonstrated that the dependence of virion abundance on microbial
density is best described by power-law functions with exponents below
unity, that is, the ratio of virions
to cells decreases with increased microbial population density
\citep{34}. At least in part, this dependency is explained by the
piggyback-the-winner model for temperate bacteriophages whereby
lysogeny becomes advantageous for a virus compared to lytic growth at
high host density \citep{35}.
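
The sublinear scaling can be illustrated with a minimal sketch. Assume,
for illustration only, that virion abundance $V$ depends on microbial
cell abundance $N$ as a power law with a prefactor $c$ and an exponent
$\alpha$ (these symbols are introduced here and are not taken from the
cited analyses):
\[
V = cN^{\alpha}, \qquad \mathrm{VMR} = \frac{V}{N} = cN^{\alpha-1}.
\]
For $0<\alpha<1$, the exponent $\alpha-1$ is negative, so the VMR
declines as $N$ grows even though $V$ itself continues to increase; a
constant VMR would require $\alpha=1$.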

Although a comprehensive census of the virosphere remains a major
challenge for the future, simple, ballpark estimates can give us an
idea of the total number of virus species and viral genes in the
biosphere \citep{36}. The bulk of the diversity in the virosphere is
accounted for by tailed bacteriophages with large dsDNA genomes, with
the other phages, archaeal and eukaryotic viruses adding relatively
little. There are at least 10\tsup{6}--10\tsup{7} species of bacteria
in the biosphere, with some estimates suggesting orders of magnitude
more. A typical bacterial species is the host to multiple viruses. For
example, for the \textit{Escherichia coli} strain K12, over 100
distinct viruses are currently known \citep{37}, whereas for a single
strain of \textit{Mycobacterium smegmatis}, more than 2000 viruses
have been identified following extensive sampling \citep{38,39,40}. In all
likelihood, these numbers underestimate the actual diversity of the
viromes of the respective bacterial species. Conservatively assuming
10--100 viral species per bacterial host species, the global virome can
be estimated to include at least 10\tsup{7}--10\tsup{9} virus species.
The upper bound of this range should be considered the more realistic
estimate, given the conservative assumptions used. Currently, only
about 17\,000 virus species are formally recognized \citep{17}, and even
taking into account numerous new ones that are waiting to be
established based on the results of metagenome and metatranscriptome
mining, only a tiny part of the virosphere has been explicitly
described so far. However, as we shall see below, this small sampling
is likely to be representative of the large, distinct regions of the
virosphere.
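
The arithmetic behind this ballpark estimate can be written out
explicitly. Denoting by $N_{\mathrm{host}}$ the number of bacterial
species and by $v$ the assumed number of viral species per host species
(notation introduced here for illustration), and neglecting viruses
shared between host species:
\[
N_{\mathrm{virus}} \approx N_{\mathrm{host}} \times v, \qquad
10^{6}\times 10 = 10^{7} \;\leq\; N_{\mathrm{virus}} \;\leq\;
10^{7}\times 10^{2} = 10^{9}.
\]
Against the upper value, the roughly 17\,000 formally recognized species
amount to about $1.7\times10^{4}/10^{9} \approx 2\times10^{-5}$ of the
estimated total.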

Based on the simple calculations above, the genetic diversity of the
virosphere, that is, the total number of unique genes in the global
virome, can be roughly estimated as well \citep{36}. The large dsDNA
genomes of typical bacteriophages that quantitatively dominate the
virosphere encompass many lineage-specific genes that are mostly
involved in counteracting the host antivirus defenses \citep{41}.
Conservatively assuming 10 such unique genes per virus species and about
10\tsup{9} virus species, the plausible lower bound of the virosphere
diversity can be estimated at
about 10\tsup{10} genes, which is likely one to two orders of magnitude
greater than the number of unique genes in cellular life forms. Thus,
the virosphere is the main reservoir of genetic novelty on our planet.
These estimates are generally consistent with the more formal attempt
to estimate the potential virus genome and protein space by simulating
the process of virus discovery in viral metagenomic studies. Using the
power function, which was found to best fit the increasing trends of
virus diversity, it was estimated that there are at least 8.2
${\times}$ 10\tsup{8} viral operational taxonomic units and 1.6
${\times}$ 10\tsup{9} viral protein clusters on Earth \citep{42}. These
estimates suggest that less than 3\% of the viral genetic diversity has
been uncovered thus far.
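
The ballpark figure of about 10\tsup{10} genes given above can be
spelled out as follows, taking the upper value of about $10^{9}$ virus
species from the preceding species estimate together with $g = 10$
unique genes per species (the symbol $g$ is introduced here for
illustration):
\[
N_{\mathrm{genes}} \approx g \times N_{\mathrm{virus}} \approx 10 \times 10^{9} = 10^{10}.
\]
With the lower value of $10^{7}$ species, the same calculation would
give $10^{8}$ genes, so the result is sensitive mainly to the assumed
number of virus species.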

The general message from this brief survey of the virosphere dimensions
is that, although viruses are in general tiny compared to cellular life
forms and fully depend on the hosts for their replication, the
virosphere is vast and enormously diverse, in some dimensions,
exceeding the diversity of cellular organisms. Understanding the
evolution of the biosphere without deep exploration of the virosphere
would be a hopeless undertaking. 

\section{The metagenomic revolution and{\hfill\break} virosphere
expansion} \label{sec3}

\looseness=-1
In the last decade, the knowledge of virus diversity, which can be
translated into reconstruction of the virosphere evolution, took a
major leap forward thanks to the advances of metagenomics and
metatranscriptomics \citep{14}. In the past, the study of viruses
required propagation in the laboratory, either in the actual host
organisms or in cell culture. Accordingly, the exploration of the
virosphere diversity had been limited to the relatively few species of
well-studied animals and plants and several model protist, bacterial
and archaeal species. However, it is well known from the surveys of
microbial diversity by 16S rRNA sequencing that only about 0.1\% of
microbes currently can be grown in the laboratory \citep{43,44}. Thus,
the traditional approaches only scratch the proverbial surface of the
global microbiome and, consequently, of the global virome. The fast
progress of metagenomic and metatranscriptomic approaches has radically
changed this situation \citep{45}. Indeed, already in 2017, metagenome
and metatranscriptome mining became by far the most important source of
new virus discovery \citep{14}. It is difficult to come up with exact
estimates, but clearly, the number of viruses representing putative
distinct virus species discovered by metagenome and metatranscriptome
mining now exceeds that identified by traditional laboratory methods by
several orders of magnitude. The increasingly detailed understanding of
the global organization and evolution of the virosphere that we discuss
below could not have been possibly attained without extensive analysis
of metagenomic and metatranscriptomic sequence databases. Importantly,
acknowledging the dominant role of metagenomics and metatranscriptomics
in the exploration of the virosphere, the ICTV in 2017 formally
recognized the legitimacy of viral species and higher taxa established
based solely on sequence analysis, without isolation of the actual
virus \citep{14,46}. 

\section{The global structure and evolution of the{\hfill\break}
virosphere and viral megataxonomy}\label{sec4}
\subsection{Viral hallmark genes}\label{ssec41}

A fundamental fact that needs to be emphasized before even starting to
discuss the structure and evolution of the virosphere is that viruses
do not share a single common origin, in sharp contrast to cellular life
forms. About 100 genes are strictly universal among cellular organisms
and can be used to build the ``Tree of life'', individually or more
commonly, in different combinations \citep{47,48}. Viruses, however,
lack any universal genes, and accordingly, there can be no single tree
of viruses, in principle \citep{8}. Put another way, unlike cellular
life forms, which all can be traced back to the Last Universal
\mbox{Cellular} Ancestor (LUCA), viruses are polyphyletic, that is, had
multiple origins. That said, although the total number of independent
viral origins is hard to estimate, there are relatively few vast
assemblages of viruses for which common ancestry{\break} is traceable. 

The evolutionary connections between diverse large groups of viruses
were traced by comparative and phylogenetic analysis of shared viral
hallmark genes (VHGs) \citep{49,50,51,52}. The VHGs are difficult to
define formally because of an apparent circularity inherent in that
definition: VHGs are conserved across diverse groups of viruses which
are themselves defined through shared VHGs. Informally, however, VHGs
are readily identifiable, and their identities are not surprising:
these are genes encoding major virion morphogenesis (primarily, capsid)
proteins and key components of viral replication machineries
(Table~\ref{tab1}). In viruses with small genomes, the VHGs represent a
large fraction of the genes, and in some cases, all of the genes. By
contrast, in viruses with large genomes, the VHGs can comprise less
than 1\% of the genes. Nevertheless, in all viruses, the VHGs are
central both to virus replication and morphogenesis, and to the study
of the evolution of the virosphere.

%tab1
\begin{table*}
\caption{\label{tab1}Virus hallmark proteins}
\tabcolsep2.9pt\fontsize{9.8}{11.8}\selectfont
\begin{tabular}{cccc}
\thead
Protein & Structural fold & Function & 
\parbox[t]{9pc}{\centering Distribution in the virosphere}\vspace*{2pt} \\
\endthead

\parbox[t]{9pc}{\centering RNA-dependent RNA polymerase} & RRM &
\parbox[t]{9pc}{\centering Replication and expression of RNA genomes}\vspace*{6pt} & 
\textit{Orthornavirae} \\ 

\parbox[t]{9pc}{\centering Reverse transcriptase} & RRM & 
\parbox[t]{9pc}{\centering Replication of
all RNA and DNA genomes with a reverse transcription stage in the
replication cycle}\vspace*{6pt} & \textit{Pararnavirae} \\ 

\parbox[t]{9pc}{\centering HUH superfamily rolling circle replication endonuclease} & RRM &
\parbox[t]{9pc}{\centering Initiation of rolling circle DNA replication} & 
\parbox[t]{9pc}{\centering \textit{Efunaviria},
\textit{Volvereviria}, \textit{Floreoviria}, \textit{Pleomoviria}, some
\textit{Duplodnaviria}, \textit{Varidnaviria} and \textit{Adnaviria}}\vspace*{6pt} \\

\parbox[t]{9pc}{\centering Family B DNA polymerase} & RRM & 
\parbox[t]{9pc}{\centering Replication of dsDNA genomes} & 
\parbox[t]{9pc}{\centering Many \textit{Duplodnaviria},
\textit{Varidnaviria}, \textit{Naldaviricetes}, \textit{Bidnaviridae},
several families of archaeal viruses}\vspace*{6pt} \\ 

\parbox[t]{9pc}{\centering Archaeo-eukaryotic primase} & RRM & 
\parbox[t]{9pc}{\centering Priming of DNA synthesis} & 
\parbox[t]{9pc}{\centering Many \textit{Duplodnaviria}, \textit{Varidnaviria},
\textit{Naldaviricetes}}\vspace*{6pt} \\ 

\parbox[t]{9pc}{\centering Superfamily 3 helicase} & 
\parbox[t]{9pc}{\centering P-loop NTPase (AAA$+$ ATPase class)} & 
\parbox[t]{9pc}{\centering Unwinding of RNA and DNA genomes during replication} & 
\parbox[t]{9pc}{\centering Many \textit{Riboviria},
\textit{Duplodnaviria}, \textit{Varidnaviria}, \textit{Floreoviria},
\textit{Naldaviricetes}}\vspace*{6pt} \\

FtsK-like ATPase & P-loop NTPase & 
\parbox[t]{9pc}{\centering ATP-dependent genome packaging} &
\parbox[t]{9pc}{\centering Most \textit{Varidnaviria}, \textit{Singelaviria}, \textit{Efunaviria}}\vspace*{6pt} \\ 

\parbox[t]{9pc}{\centering Terminase, large subunit} & 
\parbox[t]{9pc}{\centering P-loop NTPase (RecA class)---RNase H-fold nuclease} & 
\parbox[t]{9pc}{\centering ATP-dependent genome packaging and processing}\vspace*{6pt} &
\textit{Duplodnaviria} \\ 

\parbox[t]{9pc}{\centering Single jelly-roll capsid protein} & 
\parbox[t]{9pc}{\centering Jelly-roll (8-stranded ${\upbeta}$-barrel)} & 
\parbox[t]{9pc}{\centering Capsid formation: major and minor capsid proteins} &
\parbox[t]{9pc}{\centering \textit{Volvereviria}, \textit{Floreoviria}, \textit{Orthornavirae},
\textit{Singelaviria}, \textit{Varidnaviria}}\vspace*{6pt} \\ 

\parbox[t]{9pc}{\centering Double jelly-roll capsid protein} & 
\parbox[t]{9pc}{\centering Double jelly-roll (2 tandem 8-stranded ${\upbeta}$-barrels)} & 
\parbox[t]{9pc}{\centering Capsid formation: major capsid protein} & 
\parbox[t]{9pc}{\centering Majority of \textit{Varidnaviria}}\vspace*{6pt} \\ 

\parbox[t]{9pc}{\centering HK97-like capsid protein} & HK97-fold & 
\parbox[t]{9pc}{\centering Capsid formation: major capsid protein}\vspace*{6pt} & 
\textit{Duplodnaviria} \\ 

Portal protein & 
\parbox[t]{9pc}{\centering Unique, mostly ${\upalpha}$-helical, ``portal'' fold} &
\parbox[t]{9pc}{\centering Capsid assembly and genome packaging}\vspace*{2pt} & 
\textit{Duplodnaviria}
\botline
\end{tabular}
\end{table*}

The VHGs define the distinct domains of the virosphere that correspond
to viral realms, the top rank in virus taxonomy \citep{53}. In accord
with the fundamental principles of evolutionary taxonomy, each realm is
supposed to be monophyletic, that is, all viruses in a realm are
assumed to share a common origin \citep{15,54}. In actuality, this
common origin can be limited to sharing only a few or even a single
VHG. Nevertheless, given the central roles of the VHGs in virus
replication, this approach is productive for reconstructing the
evolution of the virosphere and delineating higher taxa of viruses to
develop the taxonomic system that is informally known as megataxonomy
\citep{15,55}. By the conventions of taxonomy, any virus can belong to
only one realm although, when some viruses combine VHGs characteristic
of different realms, an apparent conflict emerges. Below we discuss
some such cases.

Initially, four major realms were established, \textit{Riboviria},
\textit{Monodnaviria}, \textit{Varidnaviria}, and
\textit{Duplodnaviria}, each including a vast diversity of viruses, and
two much smaller realms were added shortly thereafter. However,
subsequent in-depth evolutionary analysis led to the split of
\textit{Varidnaviria} into two realms, and a split of
\textit{Monodnaviria} into four realms, with the new realms approved by
ICTV. More splitting of realms based on a stricter approach to
monophyly can be expected, as discussed below. 

\subsection{Riboviria: the expanding viral RNA World}\label{ssec42}

The viruses in the realm \textit{Riboviria} share a single VHG, namely,
the gene encoding RNA-directed RNA polymerases (RdRPs) or RNA-directed
DNA polymerases (reverse transcriptases, RTs), the homologous enzymes
responsible for viral genome replication and expression
(Figure~\ref{fig2}A,B). This realm covers 5 Baltimore classes, including
all three classes of viruses with RNA genomes that replicate without a
DNA intermediate as well as viruses with RNA or DNA genomes whose
replication involves a reverse transcription stage. The study of the
large-scale evolution of this virus realm is, out of necessity, based
on comparison and phylogenetic analysis of RdRP/RT, the only VHG that
is conserved across the realm. Notably, no other genes come even close
to RdRP/RT with respect to the conservation among ribovirians (ending
``virians'' refers to all members of a realm \citep{55}): the most
common of these, such as the single jelly-roll (SJR) capsid proteins
(Figure~\ref{fig2}C), the principal building block of icosahedral
ribovirian virions, are shared only by subsets of the viruses in the
realm \citep{51}. Thus, comparative analysis of these genes can shed
light on the evolution of particular groups of ribovirians but not of
the entire realm. Indeed, virions of ribovirians are highly diverse
with respect to morphology and complexity, and are built from many
structurally unrelated major capsid proteins (Figure~\ref{fig2}C,D).
Thus, if virion morphogenesis proteins of ribovirians were chosen for
realm definition, over a dozen realms would have to be created for the
classification of all RNA and RT viruses. 

\begin{figure*}
\includegraphics{fig02}
\bcaption{\label{fig2}}{1}{Megataxonomy of viruses: realm \textit{Riboviria}.
(A) Taxonomic structure: from realm down to classes. The taxonomic tree
was retrieved from the ICTV website \citep{331}. (B) Structural models
and topology diagrams of the hallmark proteins of \textit{Riboviria},
RNA-dependent RNA polymerase (RdRP; PDB: 1ra7) and reverse
transcriptase (RT; PDB: 7o0h). (C,D) Virion architectures and
structural models of major capsid proteins (CP or CA) and nucleocapsid
(N) proteins in the kingdoms \textit{Orthornavirae} (C) and
\textit{Pararnavirae} (D). The proteins were selected to represent the
diversity of structural folds in \textit{Riboviria}. Virion diagrams
were obtained from ViralZone \citep{332}. The structures are denoted
with the corresponding PDB accession numbers. Abbreviations: SJR CP,
single jelly-roll capsid protein; C, core protein; PsV-F, Penicillium
stoloniferum virus F; TYMV, turnip yellow mosaic virus; VEEV,
Venezuelan equine encephalitis virus; WNV, West Nile virus; SARS-CoV-2,
severe acute respiratory syndrome coronavirus 2; BDV, Borna disease
virus; TMV, tobacco mosaic virus; WMV, watermelon mosaic virus; RVFV,
Rift Valley fever virus; CCHFV, Crimean-Congo hemorrhagic fever virus;
HBV, hepatitis B virus; ACNDV, African cichlid nackednavirus; HIV-1,
human immunodeficiency virus 1; PFV, prototype foamy
virus.\vspace*{-.45pc}}
\end{figure*}

The realm \textit{Riboviria} is currently divided into two kingdoms,
\textit{Orthornavirae}, viruses that encode RdRP and have no DNA stage
in their replication cycles, and \textit{Pararnavirae}, viruses that
encode RTs and replicate via alternating RNA and DNA stages
(Figure~\ref{fig2}A). The two kingdoms share no conserved genes other
than RdRP and RT. All RdRPs and RTs share a common structural fold
(Figure~\ref{fig2}B), and their sequences can be confidently aligned to
produce a single phylogenetic tree that is not fully resolved but does
contain major, robust clades (see below). Moreover, among the
polymerases sharing a conserved core fold, known as the Palm domain or
RNA recognition motif (RRM), RdRPs and RTs are relatively close,
forming a single clade \citep{56}. Thus, beyond doubt, all RdRPs and
RTs evolved from a single ancestral enzyme that possessed polymerase
activity, either RdRP or RT, or both. However, this homology does not
actually mean that the two kingdoms share a common viral ancestry. All
known RdRP-encoding replicators are viruses or derivatives of viruses
that have lost genes for virion proteins (see below). Most likely, the
common ancestor of orthornaviraens (ending ``viraens'' refers to all
members of a kingdom \citep{55}) was a simple virus encoding the RdRP
and an SJR capsid protein. Orthornaviraens are ubiquitous in eukaryotes
and, as recently demonstrated by metatranscriptome analysis, are also
common in bacteria \citep{57,58,59,60}. Thus, the ancestral
orthornaviraens most likely infected bacterial hosts, having emerged at
an early stage of the evolution of life. Notably, despite considerable effort,
orthornaviraens and more generally ribovirians have not been thus far
detected in archaea, the causes of this potential exclusion remaining
\mbox{unclear}. 

\looseness=-1
Pararnaviraens have a completely different provenance, apparently
descending from group~II introns, which are, in effect, prokaryotic
retrotransposons \citep{61,62}. Pararnaviraens evolved only in
eukaryotes as a result of the capture of the (nucleo)capsid proteins
\citep{63}. Thus, there is no evidence of common origin of
orthornaviraens and pararnaviraens from a viral ancestor. Furthermore,
there are strong indications that two orders within
\textit{Pararnavirae}, \textit{Ortervirales} and
\textit{Blubervirales}, have evolved from distinct families of
transposons by recruiting structurally unrelated major virion proteins
\citep{63,64,65}{\break} (Figure~\ref{fig2}D). Accordingly, following
the principle of viral realm monophyly, the current realm
\textit{Riboviria} should be split into at least three separate realms,
one including all orthornaviraens and two more for the two
independently evolved branches of \mbox{pararnaviraens}. 

Initial phylogenetic analysis of orthornaviral RdRPs produced a tree
with 5 major clades that were designated phyla \citep{51,66}. Three of
the 5 phyla consist of positive-sense RNA viruses (although one phylum,
\textit{Pisuviricota}, also includes some groups of dsRNA viruses), one
of negative-sense RNA viruses, and the last one of dsRNA viruses. The
monophyly of each phylum is strongly statistically supported but the
relationships among them remain largely uncertain \citep{57,58,59}.
However, the basal position of phylum \textit{Lenarviricota}, which
consists of riboviruses infecting bacteria and their direct descendants
infecting eukaryotes, is robust in RdRP trees rooted by RT, implying
that all the diverse orthornaviraens of eukaryotes evolved from
bacterial ancestors.

Metatranscriptome mining led to a manyfold increase in the known
diversity of orthornaviraens, including several branches without a
close relationship to any of the 5 large phyla that are candidates for
new phyla \citep{57,58,59,60}. Two of these that have already been approved
by the ICTV are of particular note. The new phylum
\textit{Artimaviricota} includes viruses with genomes consisting of two
segments of dsRNA, one of which encodes a distinct polymerase that, as
suggested by structural comparison, could be an evolutionary
intermediate between RdRP and RT \citep{67}. Artimaviricots (ending
``viricots'' refers to all members of a phylum \citep{55}) were
discovered in metatranscriptomes from hot springs dominated by
hyperthermophilic bacteria, which are the likely hosts of these
viruses. Prokaryotic hosts are suggested also by the genome
organization of artimaviricots, with genomic segments encompassing
multiple protein-coding genes preceded by ribosome-binding sites (RBS
or Shine-Dalgarno sequences). 

More generally, perhaps the greatest insight from metatranscriptome
mining for ribovirians is the dramatic expansion of the number and
diversity of orthornaviraens associated with bacterial hosts, even if
the host assignment is tentative in many cases. Apart from the discovery
of an unexpected plenitude of previously unknown lenarviricots,
multicistronic genome organization, with the telltale RBS preceding
each protein-coding region, was found not only in artimaviricots but
also in picobirnaviruses, paraxenoviruses and several branches of
partitiviruses, all viruses with dsRNA genomes in phylum
\textit{Pisuviricota} \citep{57,68,69,70}. Furthermore, it has been shown
that many picobirnaviruses, initially thought to infect eukaryotes,
encode functional bacterial lysins, strongly supporting the bacterial
host for these viruses \citep{71}. In addition, several deep branches,
representatives of putative new phyla, showed the same features
suggestive of prokaryotic hosts. Thus, metatranscriptomics shattered
the traditional view that riboviruses were primarily characteristic of
eukaryotes, showing that RNA bacteriophages are a substantial, diverse
component of prokaryotic viromes that has been largely overlooked
before large metatranscriptome sequence datasets became available.
Moreover, the mixing of viruses with known eukaryotic hosts with those
predicted to infect bacteria among the partitiviruses suggests that, in
some groups of orthornaviraens, particularly those that, like
partitiviruses, harbor two or more genomic segments, each carrying a
single gene, the switch from \mbox{prokaryotic} to eukaryotic hosts or
vice versa is comparatively easy and occurred on multiple occasions
\citep{57,67}. The remarkable ability of partitiviruses to replicate in
highly diverse hosts has been demonstrated experimentally in
eukaryotes, where the same virus has been shown to replicate in hosts
from three eukaryotic kingdoms, including fungi, plants and animals
\citep{72}. How such simple viruses can overcome the physical barriers
of the host envelope and defense systems in highly distinct cell types
is an{\break} intriguing question.

Perhaps an even more startling discovery is the seventh recognized
phylum of orthornaviraens, \textit{Ambiviricota}, which includes fungal
parasites with covalently closed circular (ccc) RNA genomes encoding an
uncharacterized protein and a distinct RdRP without clear affinity to
those of any other viruses \citep{73}. Ambivirus genomes resemble those of
viroids in that both are circular and contain ribozymes, RNA segments
with distinct secondary structures endowed with catalytic activity
\citep{74,75}. The mechanistic interplay between the ribozymes and
RdRP during the ambiviral replication cycle remains to be elucidated.
Ambiviricots combine features of two virus realms, \textit{Riboviria}
and \textit{Ribozyviria} (see below), but given the presence of the
RdRP, the hallmark of \textit{Orthornavirae} within \textit{Riboviria},
the realm and kingdom are not in doubt. 

Mapping the gene content and genome organization of orthornaviraens to
the RdRP tree allowed an approximate reconstruction of evolutionary
trajectories, revealing the major trend of genome growth and
complexification via accretion of genes captured from hosts and other
viruses. The ancestral orthornaviraen is inferred to have had a small
genome of perhaps about 3~kb, with two genes only, encoding the RdRP
and a capsid protein, most likely, with the SJR fold \citep{51}.
Subsequent evolution involved a substantial increase in genome size
occurring independently in different lineages, up to 64~kb in some
nidoviruses \citep{76,77,78}. Acquisitions that occurred in parallel in
different lines of descent included RNA helicases that apparently
enabled efficient replication of larger RNA genomes, proteases that
cleave viral polyproteins into mature, functional proteins and a
variety of genes involved in virus--host interaction, particularly
inhibition of host defenses and movement proteins in plant viruses. 

Pararnaviraens are far less diverse than orthornaviraens, with only one
phylum and class, which splits into two orders, \textit{Ortervirales}
and \textit{Blubervirales} \citep{63} (Figure~\ref{fig2}A).
Ortervirals (ending ``virals'' refers to all members of an order
\citep{55}) include the numerous animal retroviruses with RNA genomes
replicating via a DNA intermediate along with plant caulimovirids
(ending ``virids'' refers to all members of a family \citepalias{55}) with
dsDNA genomes replicating via an RNA intermediate and three families of
viruses that are often considered as ``long terminal repeat (LTR)
retrotransposons'', namely, \textit{Metaviridae},
\textit{Pseudoviridae} and \textit{Belpaoviridae}. Members of the three
families share with the animal retrovirids homologous RTs, aspartic
proteases and a set of structural proteins, namely, capsid and
nucleocapsid proteins, that form virus particles \citep{79}
(Figure~\ref{fig2}D). Thus, according to the virus definition given
above, ``LTR retrotransposons'' are bona fide viruses.
Recently, a fourth potential family of ortervirals, described as Troyka
``retrotransposons'', has been reported \citep{80}, and additional
divergent lineages of RT viruses are likely to be discovered in the
near future.

\textit{Blubervirales} includes human hepatitis B virus and its
relatives infecting other vertebrate animals as well as a diverse group
of related non-enveloped viruses informally known as nackednaviruses
(Figure~\ref{fig2}D). More recently, the reach of blubervirals was
extended to rotifers, a group of microscopic and near-microscopic
pseudocoelomate animals, by a family-level lineage, dubbed
proto-nackednaviruses, which in the RT phylogeny form a basal group to
hepadnavirids and nackednaviruses \citep{65}. Blubervirals possess DNA
genomes that replicate via an RNA intermediate and their structural
proteins are unrelated to those of ortervirals \citep{81}
(Figure~\ref{fig2}D). 

Pararnaviraens are limited in their spread to eukaryotic hosts and
apparently evolved from non-viral retroelements, such as group~II
self-splicing introns and retrotransposons, by acquiring host proteins
and exapting them for structural roles in the virions \citep{26}.
Furthermore, the two orders within this kingdom apparently evolved
independently from different retrotransposon
families \citep{64,69,82}. Whereas the exact ancestor of ortervirals
remains obscure, blubervirals have apparently evolved from a distinct
group of retroelements known as \textit{HEART} \citep{64}.

\subsection{Four realms of viruses with ssDNA genomes}\label{ssec43}

Until recently, the vast majority of viruses with ssDNA genomes and two
groups of viruses with small dsDNA genomes (papillomavirids and
polyomavirids) were classified in a single realm,
\textit{Monodnaviria}. The realm was held together by a single VHG that
encodes a distinct endonuclease of the HUH superfamily (or its
inactivated derivative) involved in the initiation of the genome
replication via the rolling circle (or rolling hairpin) mechanism
\citep{83,84}. However, recently, upon considerable expansion of the
ssDNA virome, largely through metagenomics, and improved understanding
of evolutionary relationships among these viruses, the realm was split
into four monophyletic realms, corresponding to the four initially
defined kingdoms (Figure~\ref{fig3}A). 

\begin{figure*}
\includegraphics{fig03}
\bcaption{\label{fig3}}{2}{Megataxonomy of viruses: the four realms of
viruses with small ssDNA or dsDNA genomes. (A) Taxonomic structure,
from realms down to orders, and structural models of the realm-specific
hallmark proteins: major capsid proteins (\textit{Efunaviria} and
\textit{Volvereviria}), rolling circle replication initiator containing
the HUH superfamily endonuclease and superfamily 3 helicase domains
(\textit{Floreoviria}), and membrane fusion protein VP5
(\textit{Pleomoviria}). The structures are denoted with the
corresponding PDB accession numbers. AF3 denotes structures modeled
with AlphaFold3. The taxonomic tree was retrieved from the ICTV website
\citep{331} and modified. (B) The three distinct replication mechanisms
catalyzed by unrelated proteins and yielding ssDNA genomes, which are
encapsidated into virus particles. Note that some members of the realm
\textit{Pleomoviria} encapsidate either dsDNA or ssDNA replicative
intermediates. Also shown is the theta mode of genome replication
initiated by the inactivated HUH endonuclease---superfamily 3 helicase
of polyomavirids and papillomavirids. Abbreviations: BmBDV, Bombyx mori
bidensovirus; TP, terminal protein; pPolB, protein-primed family B DNA
polymerase; TAg, large T antigen; SV40, simian virus 40; ori, origin of
replication; TIR, terminal inverted repeats; Tnp, transposase; PCV2,
porcine circovirus 2; MCP, major capsid protein; HRPV-6, Halorubrum
pleomorphic virus 6. Virion diagrams were obtained from ViralZone
\citep{332}.\looseness=-1\vspace*{-.15pc}}
\end{figure*}

Realm \textit{Efunaviria}, with the kingdom \textit{Loebvirae},
consists of filamentous prokaryotic viruses with circular ssDNA
genomes, including some well-characterized phages of family
\textit{Inoviridae} such as M13, f1 and fd \citep{85}. The
morphogenetic module and the overall virion assembly mechanism of
efunavirians are conserved. All efunavirians encode a highly
hydrophobic capsid protein consisting of a single ${\upalpha}$-helix,
which coats the circular ssDNA genome upon its extrusion through the
cytoplasmic membrane with the aid of the conserved virus-encoded
FtsK-family ATPase \citep{86} (Figure~\ref{fig3}A). By contrast, the
genome replication enzymes vary. Although some members of the realm
employ HUH endonucleases, the majority relies on a variety of rolling
circle replication initiation endonucleases of the Rep\_trans
superfamily that is not homologous to the HUH superfamily \citep{85},
whereas others use transposases that generate ssDNA molecules as a
transposition intermediate \citep{87} (Figure~\ref{fig3}B).
Efunavirians infect widely diverse bacterial hosts, suggesting
long-term association. However, putative efunavirians were also
discovered in some methanogenic archaea \citep{85}, suggesting that the
emergence of this virus realm could be more ancient, potentially dating
back to the LUCA \citep{88}. 

Realm \textit{Volvereviria}, with the kingdom \textit{Sangervirae},
represents a relatively uniform group of prokaryotic viruses,
exemplified by the iconic bacteriophage phiX174 and other phages of the
recently created class \textit{Microviricetes} (formerly, family
\textit{Microviridae}). Volverevirians form small icosahedral capsids
built of a distinct SJR-fold capsid protein \citep{89,90,91,92}
(Figure~\ref{fig3}A), which is commonly used as a marker for the
discovery and classification of volverevirians \citep{93,94,95,96}.
These viruses have circular ssDNA genomes replicated by the rolling
circle mechanism and uniformly (thus far) encode the HUH endonuclease
\citep{97,98} (Figure~\ref{fig3}B). Volverevirians can be either
strictly lytic, typically lysing the host by inhibiting peptidoglycan
synthesis \citep{99}, or temperate, integrating into the bacterial
genome using the host recombination machinery \citep{100,101}. All
experimentally characterized members of this realm are phages, but for
the majority of volverevirians identified via metagenomics, the hosts
remain unknown, so it cannot be excluded that some of them replicate in
archaea. 

\looseness=-1
Realm \textit{Floreoviria}, with the kingdom \textit{Shotokuvirae},
encompasses a highly diverse group of eukaryotic viruses with small
ssDNA or dsDNA genomes, which in most members are circular, but can
also be linear (e.g., families \textit{Parvoviridae},
\textit{Bidnaviridae}, \textit{Oomyviridae}). Floreovirians form
icosahedral capsids from SJR capsid proteins \citep{102} which,
however, appear to have a distinct origin from the SJR capsid proteins
of volverevirians \citep{26}. Furthermore, in some lineages (e.g.,
\textit{Bacilladnaviridae}, \textit{Naryaviridae},
\textit{Nenyaviridae}, and cruciviruses), the ancestral SJR capsid
protein has been replaced by distinct SJR capsid protein variants from
RNA viruses, allowing for the formation of larger ($T = 3$) capsids
\citep{103,104,105,106,107,108}. A characteristic feature of viruses in
this realm is the two-domain replication protein, consisting of the
N-terminal HUH endonuclease and C-terminal superfamily 3 helicase
domains \citep{109} (Figure~\ref{fig3}A). The ancestor of
floreovirians, in all likelihood, encoded the two-domain HUH
superfamily endonuclease and replicated via the rolling circle
mechanism, which is employed by the majority of the members of this
realm (Figure~\ref{fig3}B). However, in some descendants, this gene has
been either inactivated (polyomavirids and papillomavirids),
concomitant with a switch from the rolling circle to the theta-like
replication mechanism \citep{110}, or replaced by a protein-primed
family B DNA polymerase (bidnavirids) \citep{111} (Figure~\ref{fig3}B).
An even more dramatic departure from the canonical gene combination is
observed in members of the family \textit{Anelloviridae} (phylum
\textit{Commensaviricota}, order \textit{Sanitavirales}). Anellovirids
retain the SJR capsid protein, albeit with major modifications not
observed in other floreovirians \citep{112,113,114}, but the ancestral
replication initiation gene has been lost, with the genome replication
apparently fully depending on the host replication machinery
\citep{115}. Although the exact mechanism remains to be{\break}
\mbox{elucidated}, it has been suggested that anellovirids employ
recombination-dependent replication by recruiting the host DNA
polymerase alpha and BTR (Bloom's syndrome helicase (BLM),
topoisomerase III${\upalpha}$, RMI1, and RMI2) complexes \citepalias{115},
with the circular ssDNA genomes being produced by a process that
\mbox{resembles} the formation of extratelomeric C-circles
\citep{116,117}.

Finally, the realm \textit{Pleomoviria}, with the kingdom
\textit{Trapavirae}, includes several families of archaeal viruses with
enveloped pleomorphic virions (Figure~\ref{fig3}A). The virions
resemble membrane vesicles with two major structural proteins, a spike
protein with a unique structural fold responsible for host recognition
and membrane fusion, and a matrix protein embedded in the viral
envelope \citep{118,119,120}. The membrane fusion protein is a
signature of pleomovirians (Figure~\ref{fig3}A). Initially,
\textit{Trapavirae} included a single family, \textit{Pleolipoviridae},
of viruses infecting extremely halophilic archaea \citep{119,121}.
However, more recently, related viruses were also identified in
hyperthermophilic, methanogenic and nano-sized hyperhalophilic archaea
\citep{122,123,124}, expanding the genetic diversity and host range of
this virus group. The genomes of pleomovirians can be ssDNA or dsDNA,
circular or linear. Accordingly, the replication mechanisms and the
corresponding enzymes are highly variable, including several
non-orthologous and even non-homologous rolling circle replication
endonucleases \citep{109,123,125,126}, primases \citep{121} or
protein-primed family B DNA polymerases \citep{119}
(Figure~\ref{fig3}B). Thus, it is the shared morphogenetic rather than
genome replication module that holds this realm together.

Viruses of the four realms, \textit{Efunaviria}, \textit{Volvereviria},
\textit{Floreoviria} and \textit{Pleomoviria}, formerly joined under
\textit{Monodnaviria}, typically have small genomes (2--10~kb;
Figure~\ref{fig1}C). Their evolution appears to be tightly
intertwined with that of small plasmids that largely replicate via the
rolling circle mechanism, and the ancestors of the four realms
apparently evolved from different families of such plasmids \citep{109}. In all
cases of independent origins of viruses with small ssDNA genomes, the
capsid protein genes were captured by the emerging viruses
independently from other viruses, such as orthornaviraens, or from host
genes. 

\subsection{Realms Varidnaviria and Singelaviria: the
epitome of viral diversity}\label{ssec44}

Realm \textit{Varidnaviria} consists of an enormous diversity of dsDNA
viruses infecting bacteria, archaea, and eukaryotes (Figure~\ref{fig4}A)
that typically have an icosahedral capsid built of double jelly-roll
(DJR) major capsid protein (MCP), the principal VHG that holds the
realm together \citep{127}, and SJR minor capsid protein (penton),
which form hexagonal and pentagonal capsomers, respectively \citep{128}
(Figure~\ref{fig4}B,C). Most varidnavirians also share another VHG
encoding a genome packaging ATPase of the FtsK-HerA superfamily
\citep{129}. The varidnavirians display a remarkable variation in
virion sizes, morphologies and complexity. Most icosahedral virions in
this realm contain an internal lipid membrane, sandwiched between the
protein capsid and the genome, which likely represents the ancestral
trait of \textit{Varidnaviria} (Figure~\ref{fig4}D). However, in some
lineages (e.g., adenovirids), the internal membrane was lost, whereas
in others, an additional external membrane was added (e.g.,
iridovirids), and/or a second internal icosahedral shell built from
unrelated capsid proteins has been added (e.g., asfarvirids and
faustoviruses) (Figure~\ref{fig4}D). Even more dramatic deviations from
the canonical structural layout occurred in several groups of
varidnavirians, e.g., pandoraviruses and pithoviruses, where the DJR
MCP was replaced by unrelated proteins, resulting in odd capsid shapes
\citep{130,131,132,133,134} (Figure~\ref{fig4}D). Other groups of
viruses within \textit{Varidnaviria}, such as poxvirids, encode a
homolog of the DJR MCP but incorporate it only into intermediates of
the virion morphogenesis, whereas the odd-shaped mature capsids consist
of unrelated viral proteins \citep{135}. 

\begin{figure*}
\includegraphics{fig04}
{\vspace*{.25pc}}
\caption{\label{fig4}Megataxonomy of viruses: realm
\textit{Varidnaviria}. (A) Taxonomic structure, from realm down to
orders. The taxonomic tree was retrieved from the ICTV website
\citep{331}. (B) Structural models of the double jelly-roll major
capsid protein (DJR-MCP; top) and single jelly-roll penton protein
(bottom) of bacteriophage PRD1 (\textit{Tectiviridae}). The structures
are denoted with the corresponding PDB accession numbers. (C)
Structural models of the MCP trimer and penton pentamer, which form
hexagonal and pentagonal capsomers. In tectivirids and adenovirids, the
two types of capsomers are arrayed to form icosahedral capsids with a
pseudo $T = 25$. (D) Diversity of virion organizations and complexity
among varidnavirians and proposed evolutionary scenarios. Virion
diagrams were obtained from ViralZone \citep{332}.}
\end{figure*}

\textit{Varidnaviria} consists of two kingdoms, \textit{Bamfordvirae}
and \textit{Abadenavirae} \citep{136}. The kingdom
\textit{Abadenavirae} includes all bacterial and archaeal viruses with
DJR MCP and relatively small dsDNA or ssDNA genomes (e.g.,
finnlakevirids), except for one family of bacterial viruses,
\textit{Tectiviridae}. The kingdom \textit{Bamfordvirae} includes
tectivirids and a broad diversity of eukaryotic viruses, characterized
by highly diverse genome and virion sizes, including megagenomoviruses
with genomes in excess of 4~Mb \citep{137} (Figure~\ref{fig4}A). 

The evolutionary scenario for \textit{Varidnaviria} derives the entire,
striking diversity of eukaryotic varidnavirians from a tectivirid
ancestor (Figure~\ref{fig4}D), possibly passed to the emerging
eukaryote by the mitochondrial endosymbiont \citep{138,139}. Notably,
members of a diverse assemblage of bamfordviraens with relatively small
genomes (15--35 kb) were originally recognized as DNA transposons known
as polintons (after Polymerase and Integrase, two proteins encoded in
their genomes; alternatively, called mavericks) that are integrated in
genomes of a broad variety of eukaryotes, in some cases, in numerous
copies \citep{140,141,142,143}. Subsequent analyses have shown, however, that
polintons encode a complete morphogenetic module comprising typical
DJR-MCP, penton protein with the SJR fold, capsid maturation protease
and the packaging ATPase, strongly suggesting that polintons are bona
fide varidnavirians \citep{144}. Currently, polintons and numerous
polinton-like viruses, of which many but not all encode integrases and
are known or predicted to integrate into the host genomes, are
classified into subphylum \textit{Polisuviricotina} of phylum
\textit{Preplasmiviricota} that also includes the tectivirids
\citep{136}. Many polisuviricotins are virophages, that is, symbionts
(parasites or commensals) of large viruses of phylum
\textit{Nucleocytoviricota} \citep{145,146,147,Roitmanetal2023}. 

\begin{figure*}
\includegraphics{fig05}
\caption{\label{fig5}Megataxonomy of viruses: realm
\textit{Singelaviria}. (A) Taxonomic structure, from realm down to
families. Family \textit{Portogloboviridae} is connected to the realm
by a dashed line because it is currently not formally included in
\textit{Singelaviria}, but is likely to represent the ancestral state
of the realm. (B) Structural models of the single jelly-roll fold major
capsid proteins (MCP) and their oligomers. Some members of the realm,
e.g., Sulfolobus polyhedral virus 1 (SPV1; \textit{Portogloboviridae}),
encode a single MCP VP4 (red) which forms homohexameric capsomers,
whereas others, e.g., Haloarcula hispanica icosahedral virus 2 (HHIV-2;
\textit{Sphaerolipoviridae}), encode two paralogous MCPs, VP7 (red) and
VP4 (blue), which form two types of heterohexameric capsomers, hexamers
1 and 2, respectively. The structures are denoted with the
corresponding PDB accession numbers.}
{\vspace*{-.15pc}}
\end{figure*}

Similar to viruses with smaller DNA genomes, varidnavirians employ a
variety of genome replication strategies and, accordingly, encode
diverse non-homologous replication proteins, ranging from Rep\_trans
and HUH endonucleases for rolling circle replication (e.g.,
finnlakevirids and corticovirids, respectively \citep{148,149}) to
protein-primed family B DNA \mbox{polymerases} and their various
derivatives (e.g., tectivirids and adenovirids \citep{139}) to family A
DNA polymerases (e.g., sputnivirovirids \citep{150}) to the complete or
near-complete replisome in most \mbox{nucleocytoviricots}
\citep{22,151}. Notably, major variations are observed within
\textit{Nucleocytoviricota}: members of the class
\textit{Mriyaviricetes}, the nucleocytoviricots with the smallest
genomes that apparently comprise the basal branch of this phylum,
encode an HUH endonuclease but no DNA polymerase and are predicted to
replicate via the rolling circle mechanism \citep{152}. Even relatively
closely related viruses (e.g., those from the same order) may encode
unrelated replication modules. Furthermore, the presence of the
conserved HUH endonuclease of \textit{Mriyaviricetes} formally connects
this viral class to two of the realms of viruses with ssDNA genomes, for
which this protein is a hallmark. However, the overall phylogenomic
analysis leaves no doubt about the assignment of mriyaviricetes to
\textit{Nucleocytoviricota}. 

Realm \textit{Singelaviria} includes bacterial and archaeal viruses
with linear or circular dsDNA genomes and icosahedral capsids built
from SJR MCPs (Figure~\ref{fig5}A,B). All structurally characterized,
formally classified singelavirians encode two paralogous SJR MCPs,
which form two types of heterohexameric capsomers (Figure~\ref{fig5}B),
resembling the homotrimeric capsomers of varidnavirians
(Figure~\ref{fig4}C). Furthermore, similar to most varidnavirians, the
icosahedral virions of singelavirians contain an internal membrane
\citep{153}. Thus, it was initially assumed that fusion of the
two singelavirian SJR MCP genes yielded the DJR MCP
\citep{154,155,156}. This scenario was further supported by the fact
that, like varidnavirians, singelavirians encode an FtsK-HerA
superfamily genome packaging ATPase \citep{157}. Accordingly,
singelavirians were included in the realm \textit{Varidnaviria} as
kingdom \textit{Helvetiavirae}. However, more recently, a detailed
comparison of protein structures showed that MCPs of singelavirians and
varidnavirians evolved from distinct cellular proteins, an SJR
carbohydrate-binding protein and DJR glycoside hydrolase, respectively
\citep{158}. Accordingly, kingdom \textit{Helvetiavirae} has been
reclassified as a separate realm.\looseness=-1

Further exploration of virus diversity led to the identification of
relatively close singelavirian relatives that encode the FtsK-HerA
genome packaging ATPase and a single SJR MCP (e.g., halicovirids),
implying homohexameric capsomers \citep{124,159}. Finally, archaeal
viruses of the \textit{Portogloboviridae} family \citep{160,161},
currently not formally assigned to \textit{Singelaviria}
(Figure~\ref{fig5}A), might even more closely resemble the ancestral
state. Portoglobovirids lack the genome packaging ATPase and encode a
single SJR MCP, which closely resembles MCPs of \mbox{singelavirians}
and forms homohexamers (Figure~\ref{fig5}B), as predicted for
halicovirids \citep{162}. 

\begin{figure*}
\includegraphics{fig06}
{\vspace*{.2pc}}
\caption{\label{fig6}Megataxonomy of viruses: realm
\textit{Duplodnaviria}. (A) Taxonomic structure: from realm down to
orders. The taxonomic tree was retrieved from the ICTV website
\citep{331} and modified to include the proposed phylum
``\textit{Mirusviricota}''. (B)~Structural models of the hallmark
proteins comprising the morphogenetic module conserved in
duplodnavirians, exemplified by the major capsid protein (MCP), large
subunit of the terminase (TerL), portal protein, and capsid maturation
protease of bacteriophage HK97.  (C) Virion morphogenesis in the three
phyla of duplodnavirians. The assembly pathway was depicted using the
modified virion diagrams obtained from ViralZone \citep{332}.}
\end{figure*}

\subsection{Realm Duplodnaviria, the champions of the
viral world}\label{ssec45}

Realm \textit{Duplodnaviria} includes the most abundant viruses on
Earth, namely, the tailed dsDNA \mbox{bacterial} and archaeal viruses (class
\textit{Caudoviricetes}) \citep{163,164,165}, the recently discovered
``mirusviruses'' associated with unicellular eukaryotic hosts (putative
new phylum ``\textit{Mirusviricota}'') \citep{166}, and animal viruses
of the order \textit{Herpesvirales} \citep{167} (Figure~\ref{fig6}A).
All these viruses share a distinct morphogenetic module that consists
of four VHGs encoding an HK97-fold MCP, a genome packaging
ATPase-nuclease (known as large terminase subunit) that is distinct
from the functional counterpart in varidnavirians, a portal protein,
and a distinct capsid maturation protease \citep{49}
(Figure~\ref{fig6}B). The four VHGs play key roles in capsid assembly,
maturation and genome packaging (Figure~\ref{fig6}C) and are highly
conserved throughout \textit{Duplodnaviria}. The assembly pathways
following the genome packaging differ in a host-dependent manner. In
prokaryote-infecting \textit{Caudoviricetes}, the packaged capsid is
combined with a preassembled tail \citep{163,168}, whereas in
eukaryote-infecting herpesvirals and mirusviruses, the capsid is
enveloped with a lipid membrane \citep{169,170} (Figure~\ref{fig6}C). 

Realm \textit{Duplodnaviria} has a simple taxonomic structure including
a single kingdom and three phyla (Figure~\ref{fig6}A),
\textit{Uroviricota} (caudoviricetes), \textit{Peploviricota}
(herpesvirals) and the yet to be formally recognized
``\textit{Mirusviricota}'' (mirusviricots). \mbox{Notwithstanding} this
uniformity at the top ranks of the taxonomy, the diversity of
uroviricots is enormous, resulting in a constant flux of taxonomy
reorganization \citep{165,171,172}. 

The range of genome size and complexity of duplodnavirians nearly
matches that of varidnavirians, spanning from about 10~kb to about
750~kb, all~within \textit{Uroviricota} \citep{173,174}.
Until the recent discovery of ``\textit{Mirusviricota}'', the
evolutionary history of this realm appeared enigmatic, with a broad
gulf separating tailed viruses of prokaryotes (bacteria and archaea)
from herpesviruses that have only been identified in animals. However,
``\textit{Mirusviricota}'', a vast phylum of viruses apparently
associated with a broad variety of unicellular eukaryotic hosts, fills
this gap, in all likelihood representing the original diversity of
duplodnavirians in eukaryotes and the ancestors of herpesvirals
\citep{166,175}. 

As in the case of other virus groups discussed above, duplodnavirians
display a remarkable diversity of genome replication strategies,
especially among bacterial and archaeal caudoviricetes (ending
``viricetes'' refers to all members of a class \citep{55}), which
employ all conceivable DNA replication strategies and encode suites of
replication factors ranging from single-protein initiators to complete
replisomes \citep{176,177,178,179}. Notable variations also exist in
the replication modules of eukaryotic duplodnavirians. For instance, in
the order \textit{Herpesvirales}, family B DNA polymerases of
orthoherpesvirids are not orthologous to those of malacoherpesvirids
and alloherpesvirids \citep{180}, whereas many mirusviricots lack DNA
polymerases altogether \citep{175}. 

\subsection{Small realms Adnaviria and Ribozyviria}\label{ssec46}

Realm \textit{Adnaviria} consists of archaeal viruses with rigid or
flexible filamentous virions that can be either non-enveloped or
contain an external lipid membrane \citep{181,182}
(Figure~\ref{fig7}A). The \mbox{remarkable} feature of these viruses is that
their \mbox{linear} dsDNA genome is stored in the virions in the A
conformation \citep{168,183}. Adnavirians lack conserved VHGs although
members of some families do encode them, e.g., HUH endonuclease that is
conserved in rudivirids \citep{184}. However, among themselves,
adnavirians share several conserved genes, most notably, the MCP with a
unique ${\upalpha}$-helical fold \citep{185,186,187,188}, first described in the
rudivirid SIRV2 \citep{183}. Although some adnavirians encode a single
MCP, many encode two paralogous MCPs; these bind dsDNA as homodimers
and heterodimers, respectively (Figure~\ref{fig7}B). Adnavirians were
initially detected in hyperthermophilic archaea of the phylum
\textit{Thermoproteota}, but subsequent metagenomics surveys showed
that these viruses are also associated with hosts from the other
archaeal phyla \citep{189,190}.

\begin{figure*}
\includegraphics{fig07}
{\vspace*{.15pc}}
\caption{\label{fig7}Megataxonomy of viruses: three small realms. (A)
\textit{Adnaviria}, taxonomic structure from realm down to families.
(B)~Virion architecture and structural model of the homodimeric and
heterodimeric major capsid proteins of adnavirians, Saccharolobus
islandicus rod-shaped virus 2 (SIRV2) and Acidianus filamentous virus 1
(AFV1), respectively. The structures are denoted with the corresponding
PDB accession numbers. (C) \textit{Ribozyviria}, taxonomic structure
from realm down to families. (D)~Virion architecture and structural
model of the nucleocapsid protein (Delta antigen) octamer of the
prototype ribozyvirian, hepatitis delta virus. The structural model was
generated using AlphaFold3 (AF3). S-DAg and L-DAg, small and large
delta antigens. S, M, L: hepatitis B virus glycoproteins. (E)
``\textit{Telodnaviria}'', taxonomic structure from realm down to
families. ``\textit{Telodnaviria}'' has not yet been formally
recognized by the ICTV.  (F) Virion architecture and structural model
of the major capsid protein dimer of a well-characterized
telodnavirian, Autographa californica multiple nucleopolyhedrovirus
(AcMNPV). The PDB accession number is indicated. AmFV, Apis mellifera
filamentous virus.}
{\vspace*{.15pc}}
\end{figure*}

The small realm \textit{Ribozyviria} \citep{191,192} currently includes
a single family, \textit{Kolmioviridae} \citep{193}, which encompasses
human hepatitis D viruses 1--8 (genus \textit{Deltavirus}) and their
relatives discovered in other animals \citep{194,195,196,197,198}
(Figure~\ref{fig7}C). Ribozyvirians are viroid-like covalently closed
circular (ccc) RNA replicons encoding a nucleocapsid protein (delta
antigen, DAg). Similar to viroids, these viruses hijack the cellular
transcription machinery for their genome replication and depend on
other viruses (\mbox{bluberviral} hepatitis B virus in the case of
deltaviruses, arenavirids in the case of daletviruses, unknown for most of
the others) for the formation of infectious enveloped virions
\citep{199,200}. DAg adopts a unique ${\upalpha}$-helical fold and forms
octamers (Figure~\ref{fig7}D) \citep{201}. The viral pseudo-dsRNA is
thought to wrap around the DAg octamers that mimic nucleosomes and
facilitate the recruitment of the host RNA polymerase~II and the
associated chromatin remodeling complexes for viral RNA replication
\citep{202,203}. DAg also induces packaging of the kolmiovirid
ribonucleoprotein complex into progeny virions. Recent evidence
suggests that this complex is directly incorporated into the enveloped
virions of their helper viruses \citep{204}. No autonomous
ribozyvirians have been described to date, suggesting that all members
of this realm are satellites of other viruses. Metagenome mining
led to the discovery of a substantial variety of ribozy-like cccRNAs
encoding distant homologs of the deltavirus nucleocapsid protein and
most likely replicating in unicellular eukaryotes \citep{58,74}. Thus,
although \textit{Ribozyviria} will probably remain a small viral realm,
its true diversity appears to be much greater than reflected in the
current taxonomy. 

\subsection{Are there more viral realms to be established?}
\label{ssec47}

The 10 realms described above encompass ${\sim}$95\% of the currently
recognized virus families \citep{17}. What about the remaining 5\% that
are currently not assigned to any realm? It is highly likely that some
of the unassigned families will be unified into additional realms. For
instance, viruses with large dsDNA genomes currently classified into
the class \textit{Naldaviricetes} \citep{205,206} were suggested to
represent a separate realm, ``\textit{Telodnaviria}'' \citep{207}
(Figure~\ref{fig7}E). Viruses of this group, of which the best-known
ones are baculoviruses due to their wide use as biocontrol agents in
agriculture and as tools in molecular biology \citep{208}, infect
diverse arthropods, including insects and crustaceans, and form complex
helical virions. The baculovirus MCP has a unique fold, unrelated to
those of viruses from the other realms \citep{207,209}
(Figure~\ref{fig7}F). Furthermore, members of the
\textit{Naldaviricetes} share a core gene set, including genes involved
in genome replication and transcription as well as {per os}
infectivity factors (PIF) \citep{205}, which are believed to form a
distinct attachment and membrane fusion complex \citep{207}. 

Another realm is likely to be created for archaeal viruses with dsDNA
genomes and spindle-shaped or lemon-shaped virions. Spindle-shaped
viruses are ubiquitous in archaea, associated with phylogenetically
diverse hosts \citep{124,210}. Due to their broad distribution in
archaea, spindle-shaped viruses are thought to have been coevolving and
diversifying with archaea ever since the emergence of the last archaeal
common ancestor from the LUCA \citep{88}. Spindle-shaped viruses encode
a distinct MCP which adopts a simple ${\upalpha}$-helical hairpin fold
\citep{211,212}. The highly hydrophobic MCP subunits polymerize into a
helical assembly, which has a larger diameter in the central region
where the genome is located \citep{211}. Spindle-shaped viruses are
highly diverse in terms of gene content and genome replication
strategies. Some have linear genomes replicated by virus-encoded
protein-primed family B DNA polymerase \citep{190,213,214,215}, whereas
others have circular genomes and encode either HUH superfamily
endonucleases \citep{124,215} or homologs of various host replisome
components, such as RNA-primed family B DNA polymerase \citep{189} or
MCM helicase \citep{216}, or lack recognizable replication proteins.
Due to their unique architecture and the fact that spindle-shaped
viruses do not share conserved VHGs with other virus realms, they are
likely to have an independent origin and thus qualify as a separate
realm.

Several additional groups of archaeal viruses with enigmatic gene
contents and unusual virion morphologies, such as bottle-shaped,
droplet-shaped or globular viruses \citep{168}, and bacterial viruses
of the family \textit{Plasmaviridae} \citep{217}, could represent
additional realms. A variety of protein-coding cccRNAs have been
discovered in metagenomes, and some of these could be independently
originated viruses that would qualify as separate realms \citep{74}.
However, the available information on the virion organization,
diversity and evolution of all these (putative) viruses remains scarce
and thus their higher-level taxonomic assignment appears premature.

Can we expect to discover viruses representing additional new realms?
It is likely that all major, widespread virus realms have been already
sampled, a monumental achievement from a century of culture-dependent
and more recent, culture-independent virus discovery efforts. However,
it is hard to put an upper bound on the number of independently
evolving smaller virus groups restricted to particular hosts or
environments. We predict that sampling of the less explored host
groups, such as archaea or protists, as well as extreme environments,
will continue to expose previously unseen viruses, some of which will
represent new realms. Classical, culture-dependent virus isolation will
play a key role in this endeavor, whereas metagenomics and
metatranscriptomics will be instrumental both in the preceding
discovery phase, to identify candidates for new realms, and in the
subsequent stage, for determining the extent of diversity and
distribution of these new virus groups.

\section{The key trends and processes of virus{\hfill\break} evolution}
\label{sec5}

\subsection{Pronounced modularity of viruses}\label{ssec51}

Phylogenomic analysis of the rapidly growing collection of viral
genomes allows us to infer some general trends and patterns of virus
evolution \citep{52}. Any viral genome, with the exception of some
satellite viruses and some derived viruses that have lost structural
proteins, combines at least two functional modules, those encoding
proteins involved in virion morphogenesis and those involved in genome
replication and expression. Each of these modules can be represented by
only a single gene (or even a ribozyme as in ribozyvirians) or can be
highly complex, including numerous genes. For instance, many viruses of
the realm \textit{Floreoviria}, with small ssDNA genomes, e.g.,
circovirids, encode one capsid protein and one replication-associated
protein (Rep) \citep{218}, whereas large dsDNA viruses of both
\textit{Varidnaviria} and \textit{Duplodnaviria} often encode a
(nearly) complete DNA replisome \citep{178}, a variety of proteins
involved in transcription and translation \citep{219,220,221}, and scores
of structural proteins and virion assembly factors \citep{222}. 
\looseness=-1

Nearly two decades ago, Dennis Bamford proposed the concept of ``viral
self'', that is, an evolutionary stable core of vertically inherited
genes that define viral lineages \citep{127,223}. In the context of
the assemblage of viruses now known as \textit{Varidnaviria}, the self
was defined as the DJR-MCP and the packaging ATPase. The concept has
stood the test of time: today, we can indeed trace vertical evolution
of viral realms through the phylogenies of VHGs that comprise the viral
self. Notably, however, the types of VHGs comprising the self differ
across the realms. In viruses of the realm \textit{Riboviria}, with
their small genomes, the self consists of the VHGs involved in
replication, namely, RdRP and RT. By contrast, in the realms of viruses
with DNA genomes, it is the morphogenesis module that qualifies as the
self. There is a simple functional and evolutionary logic to this
distinction. Viruses with RNA genomes cannot fully rely on the host
replication machinery because RNA replication is not a normal cellular
process. Therefore, these viruses must employ dedicated replication
enzymes that cannot be replaced by cellular counterparts and thus are
consistently inherited throughout the evolution of the viral realm (as
always in biology, there are exceptions: ribozyvirians and possibly
other viroid-like viruses do not encode any replication proteins,
instead hijacking the host transcription apparatus; these unusual
viruses, however, encompass ribozymes required for replication).
Conversely, RNA viruses often exchange and occasionally reinvent genes
for structural proteins (e.g., capsid protein of togavirids or
nucleocapsid protein of ortervirals \citep{52}) so that these genes do
not qualify as the viral self. In principle, the same logic could be
applied to ssDNA viruses because cellular DNA replication mechanisms do
not produce ssDNA intermediates and thus the virus must provide its own
solution. However, unlike in the case of RNA and RT viruses, there are
multiple unrelated enzymes that can yield ssDNA intermediates during
replication, including several families of HUH endonucleases,
Rep\_trans \mbox{endonucleases}, \mbox{protein-primed} family B DNA
polymerases and a few others, and all of these are interchangeably used
by ssDNA viruses from different realms (Figure~\ref{fig3}B), as
discussed above. Furthermore, in viruses of the realm
\textit{Pleomoviria}, there is certain flexibility with regard to which
replicative intermediate, ssDNA or dsDNA, is packaged into the virion
\citep{125}, which licenses exploration of a broader range of
replicative strategies. Of the four realms formerly included in the
\textit{Monodnaviria}, only in \textit{Floreoviria} does the replication
module comprise the self and serve for tracing the evolutionary
relationships between the member viruses, although some of these viruses
(e.g., annelovirids and bidnavirids) lack the signature two-domain HUH
endonuclease  \citep{109,224}. By contrast, viruses with dsDNA genomes
can use the cellular replication machinery, and accordingly, these
viruses widely differ in their repertoires of replication genes, from
retaining the entire replication machinery to shedding it all.
Furthermore, replication genes of these viruses, for example, DNA
polymerases, are occasionally exchanged with cellular or
non-orthologous or even non-homologous viral counterparts \citep{225}.
Therefore, the self of the respective realms can be identified only
with genes encoding structural and morphogenetic proteins. A
transitional state between viruses that require a specialized
replication machinery and those that can exploit the host replication
apparatus is represented by polyomavirids and papillomavirids. After
switching from ssDNA to dsDNA genomes, these viruses lost the ancestral
capacity to replicate via RCR due to mutational inactivation of the HUH
endonuclease and switched to the theta replication mechanism typical of
host DNA (Figure~\ref{fig3}B), while retaining the inactivated
endonuclease as a DNA-binding \mbox{domain} involved in the replisome function
\citep{226,227,228}.

\subsection{Complexification and functional{\hfill\break} convergence:
major trends in viral evolution}\label{ssec52}

Reconstruction of virus genome evolution by mapping gene repertoires on
phylogenetic trees of VHGs revealed the predominant trend of
complexification by gene accretion although reductive evolution also
appears to have occurred in some lineages \citep{22,51,151}.
Remarkably, this evolutionary trend is typical of both viruses with
small genomes, such as ribovirians, and those with the largest known
genomes, such as members of phylum \textit{Nucleocytoviricota} in realm
\textit{Varidnaviria}, as well as caudoviricetes. Many virus lineages,
however, appear to maintain relative stasis with respect to genome size
and complexity across long evolutionary spans, exchanging and replacing
many genes while maintaining nearly constant genome size. This
evolutionary regime is clearly observed, for example, in phylum
\textit{Preplasmiviricota} within \textit{Varidnaviria}, and could be
linked to the physical constraints, namely, internal volume and packing
capacity of the virions. 

The great majority of the genes acquired by viruses come from the
hosts, although notable cases of viral gene duplication followed by
diversification are known as well \citep{229}, whereas some viral genes
might have evolved {de novo} from non-coding sequences
\citep{230,231}. The genes acquired by viruses encode four broad
functional categories of proteins: (i) components of replication and
expression systems enabling the relative autonomy of viruses from the
respective cellular processes, (ii) structural proteins that initially
become minor components of the virions but eventually can replace
hallmark MCPs, for example, in pandoraviruses \citep{232}, (iii)
metabolic enzymes that supplement and/or modulate the respective
metabolic pathways of the host cells, (iv)~proteins involved in various
types of virus--host interactions, in particular, inhibitors of host
immune systems. Because the benefits conferred by these functions are
common to many if not all viruses, there is extensive convergence among
the suites of genes captured by viruses of distant lineages, including
different realms, especially in the first of the above functional
categories. A notable case in point is the independent acquisition of
helicases of three different superfamilies by orthornaviraens of
different phyla \citep{51}. Apparently, helicases enable replication of
(relatively) large RNA genomes, as indicated by the fact that viruses
with positive-sense RNA genomes larger than 6~kb encode a helicase of
one or another superfamily \citep{233}, and thus open up the path of
evolution toward genome complexification (again, there is a notable
exception, a flavi-like ribovirus with a nearly 12~kb genome lacking
any helicase \citep{234}). A further increase of the RNA
genome size, beyond the ${\sim}$25~kb threshold, apparently required the
acquisition of an enzymatic complex involved in RNA proofreading, which
occurred in coronavirids and other nidovirals, the giants of the RNA
domain of the virosphere \citep{76,77,235}. Similarly, many lineages of
orthornaviraens independently acquired proteases of different families
involved in virus polyprotein processing, yielding mature viral
proteins, which is one of the main strategies for expression in viruses
of eukaryotes where initiation of translation typically occurs at the
5$^{\prime}$-end of an mRNA \citep{52,236}. 

A case that has been widely discussed in scientific and even popular
literature, giving rise to provocative ideas, is the presence of genes
encoding multiple components of the translation system, both proteins
and tRNAs, in some viruses of phylum \textit{Nucleocytoviricota}, in
particular, mimivirids \citep{221}. One of the mimivirids, tupanvirus,
encodes the entire suite of translation system components, except for
those of the ribosome \citep{237, 238}. Given that translation is often
considered a quintessential cellular functionality of which viruses are
incapable, these findings triggered ideas on the evolution of the
``giant'' viruses from cells, possibly, even from a hypothetical fourth
domain of life \citep{21, 239,240,241,242,243,244} (see also discussion
below). Detailed phylogenetic analyses, however, unequivocally indicate
that translation system components have been independently acquired by
nucleocytoviricots from eukaryotic hosts at different stages of
evolution \citep{245,246,247,248}. Indeed, some of these genes encoding
orthologous proteins with the same function apparently have been
captured independently by distantly related viruses \citep{247}.
Scattered genes for translation system components are present also in
some bacterial and archaeal caudoviricetes, including multiple
\mbox{tRNAs} (up to 62), tRNA synthetases and even \mbox{ribosomal}
\mbox{proteins}, clearly, convergent acquisitions \citep{171, 174,
249,250,251}. The roles of virus-encoded translation system components
in virus replication are not well understood, but it appears likely
that viruses reprogram the host translation system to different degrees
depending on the virus--host combinations, including the localization
of the translation of viral mRNAs to a distinct compartment within the
infected cell \citep{252}. In addition, the role of certain viral tRNAs
in countering diverse tRNA-targeting defense systems has also been
demonstrated \citep{253}.

In the third functional category, metabolic enzymes acquired by
viruses, a notable case of convergence is the independent capture by
viruses with large dsDNA genomes within both \textit{Varidnaviria} and
\textit{Duplodnaviria} of enzymes and entire metabolic pathways for the
biosynthesis of nucleotides, for example, thymidine kinase, thymidylate
synthases of two unrelated families, and ribonucleotide reductases
\citep{52}. Other metabolic pathways are limited to specific viral
lineages, for example, the glycosphingolipid biosynthesis pathway in
\textit{Coccolithoviridae}, a family within \textit{Nucleocytoviricota}
\citep{254, 255}. 

Which genes are retained by viruses upon accidental acquisition,
presumably, depends on the host biology and response to virus
infection, particularly in the case of metabolic enzymes. In some
well-studied cases, the selection factors driving the fixation of the
acquired genes in viruses are clear as, for example, for phages
infecting cyanobacteria \citep{256}. Many of these cyanophages encode
the entire photosystem I or II, and some even both photosystems
\citep{257, 258}. Upon infection, the viral photosystems supplement the
host ones, augmenting their activity so that the host cells remain
energy-rich and conducive to massive replication of the cyanophages. In
most cases, however, the connection with the host biology remains
elusive and awaits elucidation through direct experiments with the
respective virus--host systems, which are typically technically
challenging if feasible at all.

The viral genes directly involved in virus--host interactions are even
more specifically linked to the host biology. A striking example is the
movement proteins (MP) encoded by a broad variety of plant viruses
including diverse orthornaviraens, pararnaviraens and cressdnaviricots
(realm \textit{Floreoviria}), and enabling intercellular movement of
viruses in plants through the plasmodesmata \citep{259}. The MP
originally evolved via a duplication of the SJR-MCP in an
orthornaviraen and then spread extremely widely among plant viruses
via HGT \citep{260}. Analogously, in animals, one of the principal
mechanisms of virus entry into the host cells is membrane fusion, and
accordingly, fusion proteins have spread across a broad variety of
animal RNA and DNA viruses \citep{261, 262}. 

The great majority of (most likely, all) viruses with large genomes, that
is, those of realms \textit{Varidnaviria}, \textit{Duplodnaviria}, and
\textit{Adnaviria}, and even some viruses with small genomes in
\textit{Riboviria} and some groups of ssDNA viruses encode proteins and
often entire systems involved in the inhibition of host immunity
\citep{263, 264}. In large viruses, these genes can comprise a major
fraction of the gene repertoire, and this part of the genome is highly
variable, rarely being conserved beyond a viral family. The
counter-defense genes can be conceptually divided into two large
classes: (i)~repurposed components of immune systems that function as
dominant-negative inhibitors according to the ``guns for hire''
principle \citep{265}, that is, the use of homologous components for
both defense and counter-defense, and (ii)~dedicated inhibitors, often
without detectable homologs, some of which also employ the
dominant-negative mechanism. The typical case in the first category is that of
poxviruses, large animal viruses within \textit{Nucleocytoviricota}
which encode a broad variety of homologs of host proteins involved in
immunity and related signal transduction pathways. Examples include
homologs of tumor necrosis factor, interleukin~18, MHC Class I and
apoptotic factor Bcl2, all of these represented by families of paralogs
in poxvirus genomes \citep{229}. Although not all the mechanisms have
been studied in detail, these proteins appear to act as
dominant-negative inhibitors of the cognate host immune pathways. In addition,
poxviruses encode a family of proteins of the second class of
counter-defense factors containing a chemokine-binding \mbox{domain} that
appears to have no cellular counterpart and thus seems to have evolved
within poxviruses themselves, possibly, via a radical rearrangement of
a pre-existing ancestral domain \citep{266}. In other viruses of
eukaryotes, including nucleocytoviricots, mirusviricots and
herpesvirals, the counter-defense genes are not well characterized, in
particular, because few if any homologs of host immune proteins that
could act as decoys have been identified. However, numerous genes of
these viruses (with the partial exception of orthoherpesvirids) remain
orphans without detectable homologs, and many of these are likely to be
involved in counter-defense. The difficulty with the identification of
those is, in large part, that the hosts of many of these viruses remain
unknown, and even for known unicellular eukaryotic hosts, the immune
systems remain largely uncharacterized. 

The counter-defense landscape of caudoviricetes as well as adnavirians
is better known. It appears that these viruses employ primarily
distinct counter-defense proteins, most without detectable homologs
outside narrow groups of viruses, rather than mimics of host immune
proteins. Most of these counter-defense proteins, in particular,
anti-CRISPR proteins that have been investigated in detail in the last
few years as part of the CRISPR boom \citep{267, 268} are quite small
(about 100 amino acids or fewer) and often are either not predicted to
adopt a globular conformation or are predicted to adopt a unique fold
\citep{269}, suggesting the possibility of their emergence {de
novo} from non-coding sequences. Most of these small counter-defense
proteins are highly specific towards particular bacterial or archaeal
defense systems, binding to unique sites of immune proteins, such as,
for example, CRISPR effector nucleases, including Cas9 and others. A
notable exception to this high specificity are DNA-mimicking
counter-defense proteins, small, negatively charged proteins that can
inhibit host immune systems promiscuously by blocking their DNA-binding
domains, that is, employing a distinct variant of the decoy strategy
\citep{270, 271}. Although unique counter-defense proteins prevail in
large viruses of prokaryotes, the guns for hire principle is employed
as well. Thus, some phages, in particular jumbo phages with their large
genomes, have coopted complete, functional CRISPR systems
\mbox{targeting} host \mbox{defense} systems or other MGE \citep{272,
273}. Even more common in the genomes of phages and archaeal viruses
are CRISPR microarrays exploiting the host Cas proteins primarily to
target other viruses \citep{274, 275} and small anti-CRISPR RNAs that
are copies of a single CRISPR repeat and form non-productive complexes
with Cas proteins, acting therefore as CRISPR-RNA decoys \citep{274,
276, 277}.

Among viruses with small genomes, only some, apparently a minority,
encode dedicated counter-defense proteins, again in connection with the
host biology. Thus, diverse plant viruses as well as certain animal
viruses, some with very small genomes, encode silencing suppressors,
small proteins, possibly, emerging {de novo}, that inhibit the
powerful RNA interference systems of the hosts \citep{278, 279}.

To summarize this discussion of the general trends in the evolution of
the virosphere, all viruses have a small core (self) that consists of a
few VHGs, and in many cases only one, essential for viral replication
and/or morphogenesis. These viral selves evolve vertically through
eons, for millions or even billions of years, providing for the
identification of viral realms, kingdoms and phyla. The rest of the
viral genome is highly dynamic and malleable, evolving through
numerous gene gains, losses and exchanges, the exact history of which
is often hard to reconstruct. There is, however, a lot of convergence
and parallelism in the evolution of genes contributing to viral genome
replication and expression due to shared requirements of diverse
viruses. There is also some, albeit lesser, convergence among viral
metabolic genes, whereas genes encoding virion components as well as
those involved in counter-defense are an epitome of diversity and
dynamism. 

A consequence of the convergence in the acquisition of functionally
analogous and often homologous host genes by diverse viruses is that,
although viral realms, by definition, represent assemblages of viruses
of independent origins, different realms are loosely connected by
patchy sharing of homologous proteins. Prominent examples include the
aforementioned RNA and DNA helicases (in particular, superfamily~3
helicases), DNA polymerases and DNA-dependent RNA polymerases,
chymotrypsin-like and papain-like proteases, movement proteins of plant
viruses, and more \citep{15}. 

\section{Viruses as symbionts of cellular life forms}\label{sec6}

The pre-eminent immunologist Sir Peter Medawar famously said in his
1960 Nobel lecture that ``No virus is known to do good. It has been
well said that a virus is a piece of bad news wrapped up in protein''
\citep{280}. However, the more deeply the virosphere is explored, the
clearer it becomes that the news conveyed by viral genomes is not
necessarily bad, but perhaps, more often, is neutral or even good. Put
another way, viruses are ubiquitous symbionts of cellular life forms,
and their relationships with the hosts span the entire symbiotic
spectrum, from parasitism to commensalism to mutualism. For obvious
reasons, parasitic viruses that lyse host cells (thus, by definition,
killing a unicellular host, and often, a multicellular one as well)
were discovered and studied first. However, even the early
discovery of temperate (lysogenic) phages that can be vertically
transmitted across many generations of bacteria as prophages showed
that viruses are not invariably deleterious to the host \citep{281}.
Indeed, prophages often encode defense systems and can have beneficial
effects on the host fitness by protecting bacteria from infection by
other, virulent phages (superinfection exclusion) \citep{282,283,284}.
There can be other benefits as well, notably transduction, that is, HGT
for which phages serve as vehicles \citep{285}. The genes transferred
by phages include defense systems, as per the ``guns for hire''
concept, antibiotic resistance genes as well as genes encoding enzymes
of various metabolic pathways \citep{286,287,288,289}. Moreover, some
bacteria and archaea have domesticated phages, turning them into Gene
Transfer Agents (GTA), which are defective phages that incorporate into
virions random host genes rather than the phage genome and function as
dedicated, stress-induced HGT vehicles \citep{290, 291}. Notably, GTAs
can rescue recipient cells from DNA damage by providing templates for
homologous \mbox{recombination} {\citep{292}.} Conceptually
similar exaptation of large DNA viruses for the transfer of
immunosuppression genes took place in parasitoid wasps, apparently on
several independent occasions, leading to the emergence of viriforms,
virus derivatives carrying host DNA \citep{293,294,295}. Recently, it
has been shown that a variety of phages considered virulent can
accumulate in bacterial cells without either lysing them or integrating
\citep{296, 297}, suggesting that viral commensalism is far more
widespread than previously\break suspected.

A notable case of different types of symbiosis is presented by
virophages, the viral symbionts of nucleocytoviricots \citep{136}. Some
of the virophages efficiently inhibit the propagation of the cognate
large viruses, in effect functioning as adaptive immunity systems---and
hence mutualists---for the unicellular eukaryotic host \citep{298,
299}. Other virophages have little if any effect on the supporting
large virus and appear to be commensals. More generally, intervirus
symbiosis and coevolution in complex virus--host systems are an
important, currently poorly understood aspect of the evolution of the
virosphere \citep{300}. 

Every organism apparently supports a ``healthy'' virome, that is,
commensal and mutualist viruses \citep{301}. For example, exploration
of plant and fungal transcriptomes reveals an increasing number of
cryptic viruses that cause no symptoms and have been largely missed by
traditional virus identification approaches \citep{302, 303}.
Similarly, animals, including humans, harbor a variety of viral
commensals, for example, anellovirids with tiny ssDNA genomes that are
ubiquitous and abundant in humans but have not been associated with any
pathology \citep{112, 304, 305}. Both mammals and insects also host
true viral mutualists, such as endogenous retroviruses (often denoted
LTR retrotransposons) integrated in the genome, some of which form
particles transferring RNA between host cells and apparently
contributing to human physiology, in particular, that of the nervous
system \citep{306,307,308,309}.

In general, the scale of viral commensalism and mutualism in the
biosphere remains to be elucidated. Infected cell lysis and host
killing are not per se targets of selection in the evolution of 
viruses---rather, the maximum level of virus propagation is. Some viruses
achieve this by killing the host, but for many, perhaps the majority of
viruses, this is not{\break} the case.

\section{The origins of viruses: genetic parasites as an intrinsic
feature of life, the primordial replicator pool and exaptation of
cellular genes}\label{sec7}

As emphasized above, viruses are literally ubiquitous in the biosphere.
Apparently, all organisms, with the possible exception of some
intracellular symbiotic bacteria with highly reduced genomes, are
infected by multiple viruses, often highly diverse ones, representing
different realms. Humans, for example, are hosts to numerous viruses of
both kingdoms of \textit{Riboviria}, \textit{Floreoviria} (parvovirids,
anellovirids, papillomavirids and others), \textit{Varidnaviria}
(poxvirids) and \textit{Duplodnaviria} (orthoherpesvirids). The
empirical data on the ubiquity of viruses are complemented by a
theoretical argument on the inevitability of the emergence of genetic
parasites in any replicator system \citep{7}. This argument has been
developed in thermodynamic terms, based on the entropy increase
associated with the emergence of genetic parasites. In qualitative
terms coming from game theory, the inevitability of the emergence of
genetic parasites can be explained quite simply: as soon as, in a
replicator system, there is an alienable resource that can be
appropriated, such as a replicase, cheaters will emerge that will use
that resource without making it. These theoretical considerations are,
in turn, compatible with the reconstruction of the virome of the LUCA
that is estimated to have existed about 4~billion years ago \citep{310}
and is the earliest point in the history of life for which such a
reconstruction is attainable. This reconstruction indicates that the
virome of the LUCA was already highly complex, comparable with the
extant bacterial viromes, that is, probably included representatives of
all the major viral realms \citep{88}. 
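
As a purely illustrative sketch of the game-theoretic argument above (a
toy model with hypothetical symbols, not a result taken from the cited
work), consider a well-mixed pool that contains replicase-producing
replicators at abundance $x$ and cheaters at abundance $y$. Assume that
both types are copied by the shared replicase at the same per capita
rate $r$ (which may itself depend on $x$ and $y$), and that producing
the replicase costs the producers a rate $c>0$:
\[
\frac{dx}{dt} = (r-c)\,x, \qquad \frac{dy}{dt} = r\,y .
\]
The fraction of cheaters, $p = y/(x+y)$, then obeys
\[
\frac{dp}{dt} = \frac{x\,\frac{dy}{dt} - y\,\frac{dx}{dt}}{(x+y)^{2}}
= c\,p\,(1-p) > 0 \quad \mbox{for } 0<p<1,
\]
so that, under these simplifying assumptions, any cheater that arises
increases in frequency irrespective of the value of $r$, as long as the
resource is shared and costly to produce.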

The LUCA was certainly not the first life form to appear on Earth but
rather a product of extensive evolution from primordial protocells that
likely existed within the framework of the RNA world, where RNA
molecules performed both template and catalytic functions \citep{311,
312}. The inevitability of genetic parasites implies that they emerged
concomitantly with the very first replicators. Furthermore, whereas the
primordial replicator pool is thought to initially have consisted of
RNA molecules only, small and then larger DNA replicators must have
evolved at subsequent stages of evolution, long before the time of the
LUCA. Thus, replicators corresponding to all Baltimore classes of
extant viruses were most likely already parts of the primordial pool
\citep{8}. Comparative analysis of the essential proteins involved in
viral genome replication is compatible with the very early origin of
the viral replication machineries \citep{313}. Indeed, behind the
diversity of viral replication enzymes, there is remarkable, deep
unity. The replication enzymes of most viruses from different realms,
including RdRP, RT, HUH endonuclease, archaeo-eukaryotic primases and
family B and A DNAPs, all contain homologous core RNA Recognition Motif
(RRM) fold domains. The RRM is one of the most common nucleic acid
binding domains in nature, found in an enormous variety of proteins,
most of them devoid of enzymatic activity \citep{314}. The most parsimonious
explanation of the presence of catalytically active RRMs in viral
replication enzymes is that they all radiated from a non-enzymatic
ancestral RRM that was likely one of the earliest protein domains to
evolve and could have served as a cofactor to ribozyme polymerases at
the exit from the RNA world stage \citep{313} (Figure~\ref{fig8}). The
second widespread structural scaffold found in diverse RNA and DNA
polymerases is the double psi-beta barrel domain that has been
suggested to represent the ancestral replicase of cellular organisms
\citep{315,Kooninetal2020}. 

\begin{figure*}
\includegraphics{fig08}
\caption{\label{fig8}Origin of viruses. The depicted scenario combines
origin of viral genomes and replication machinery from the primordial
(pre-cellular) pool of replicators with subsequent exaptation of
cellular proteins, such as carbohydrate-binding jelly-roll proteins, as
capsid proteins, resulting in the emergence of {bona fide}
viruses. Replicative genes are rendered in green, and structural genes,
in red and blue. The two major types of primordial replicases based on the RNA
recognition motif (RRM) and double psi-beta barrel (DPBB) domain are
hypothesized to have evolved in pre-viral and pre-cellular replicators,
respectively.} 
{\vspace*{-.05pc}}
\end{figure*}

Thus, all types of replicators, including parasitic ones, in all
likelihood, evolved very early in the evolution of life, prior to the
advent of modern-type cells (Figure~\ref{fig8}). However, parasitic
replicators are not (yet) viruses, and bona fide viruses could hardly
have existed before full-fledged cells emerged. Indeed, for major
virion proteins, cellular ancestors have been identified, the most
prominent cases in point being the origin of SJR-MCP and DJR-MCP from
\mbox{distinct} families of sugar-binding proteins and glycoside
hydrolases, respectively \citep{26, 158}, suggesting that viruses appeared
on the scene at a relatively advanced stage of evolution, after
multiple protein families had already diversified, even if long before
the LUCA (Figure~\ref{fig8}). 

For decades, the origin of viruses has been discussed in terms of three
distinct scenarios: (i) viruses early, preceding the origin of cells,
(ii) viruses late, via reductive evolution of cellular parasites, (iii)
escaped genes, that is, another version of viruses late, deriving
viruses from ``regular'' cellular genes attaining replicative autonomy
\citep{8, 10, 316, 317}. The evolutionary scenario outlined above is a
hybrid of ``viruses early'' and ``escaped genes'': here, the viral
replication machinery derives from the primordial pool of replicators
(viruses early) whereas the virions evolve later from cellular proteins
(escaped genes) (Figure~\ref{fig8}). Importantly, the origin of viruses
from non-viral symbiotic replicators, in particular, plasmids, is not
limited to the hypothetical scenario of viral origin. Rather, this path
of evolution was recapitulated on multiple occasions, in particular, at
the origins of several groups of viruses with ssDNA genomes \citep{109}
as well as the origin of the expansive orthornaviraen family
\textit{Botourmiaviridae} from capsid-less narnavirids \citep{51}. 

\begin{figure*}
\includegraphics{fig09}
{\vspace*{-.3pc}}
\caption{\label{fig9}Eukaryogenesis and the origin of the eukaryotic
virome. Two alternative scenarios of eukaryogenesis are shown. 
(A)~Symbiogenetic scenario whereby an Asgard archaeon with a complex
cellular organization engulfed an alphaproteobacterium, the
proto-mitochondrial endosymbiont, with subsequent replacement of the
archaeal membrane (red outline) by the bacterial one (black outline).
(B) Syntrophy scenario including two endosymbioses whereby a bacterium
(possibly, a deltaproteobacterium) engulfed an Asgard archaeon,
resulting in the dissolution of the archaeal membrane and followed by a
secondary engulfment of an alphaproteobacterium. In this scenario, the
continuity of the bacterial membrane is preserved throughout.  Under
both scenarios, archaeal viruses are excluded from the emerging
eukaryote, and as a result, the eukaryotic virome is seeded by
bacterial viruses. FECA, SECA and LECA: first, second and last
eukaryotic common ancestor, respectively.}
{\vspace*{-.35pc}}
\end{figure*}

\section{Origins of the eukaryotic virome}\label{sec8}

The origin of eukaryotes (eukaryogenesis) is the second pivotal major
transition in the evolution of life, after the origin of cells
\citep{318, 319}. According to the latest, best supported scenario, the
eukaryotic cell evolved as a result of the engulfment of an
alphaproteobacterium, which became the protomitochondrial endosymbiont,
by an Asgard archaeon related to the order Hodarchaeales \citep{320, 321}
(Figure~\ref{fig9}A). The reconstruction of the virome of the Last
Eukaryotic Common Ancestor (LECA) showed, not unexpectedly, that this
ancestral virome was highly complex, including representatives of all
the larger realms, and likely, most of the phyla of the known viruses
of eukaryotes \citep{322}. The unexpected outcome of this
reconstruction, however, was the apparent origin of all major groups of
viruses of eukaryotes from bacterial rather than archaeal ancestors
(Figure~\ref{fig9}A,B). In particular, all viruses of Asgard archaea
identified to date are typical archaeal viruses without traceable links
to viruses of eukaryotes \citep{190, 215, 323}. Given that viruses are
genetic parasites and the eukaryotic systems for information storage
and transmission are of Asgard archaeal descent almost in their
entirety \citep{321}, this finding appeared puzzling and even
paradoxical.\ The best possible explanation seems to be the major
difference between the structures of archaeal and bacterial membranes
\citep{324} and, likely, of the membrane-embedded viral receptors, a
difference that could
have led to the exclusion of archaeal viruses from the evolving
eukaryotic cells upon the replacement of the archaeal membranes (and
cell walls) with bacterial ones \citep{322}. Under more complex
scenarios of eukaryogenesis---such as the syntrophy model, which
postulates the initial engulfment of an archaeon by a bacterium,
preserving the continuity of bacterial membranes throughout
eukaryogenesis \citep{325, 326}---exclusion of archaeal viruses could
have occurred at the initial symbiosis stage
(\mbox{Figure}~\ref{fig9}B).\looseness=-1

\section{Reproducers and replicators, fundamental{\hfill\break}
virus-cell divide and the place of the{\hfill\break} virosphere within
the replicator space}\label{sec9}

Evolving biological entities belong to two fundamentally distinct
types: reproducers and replicators \citep{327}. Reproducers are,
essentially, analog devices that retain physical continuity throughout
the course of evolution. All cellular life forms possess the properties
of reproducers as reflected in the famous formula of Rudolf Virchow:
\textit{Omnis cellula e cellula}. Replicators, in contrast, are digital
devices, so that physical continuity is not necessary for their
propagation and evolution; information contained in the nucleotide
sequence of the genome is sufficient. Clearly, all MGEs, viruses in
particular, are replicators (the propagation of many viruses requires
not only the genomic nucleic acid but also some virion protein(s) or
even macromolecular structures to enter the cell; however, such
structures are not transmitted to the next generation). Genomes of
cellular life forms in themselves are replicators as well, so that
organisms actually represent a union of reproducers and replicators.
The establishment of mutualistic symbiosis between primordial
reproducers and replicators might have been a pivotal point in the
origin of life \citep{328}. 

The split between reproducers and replicators appears to be the most
fundamental divide in biology. There is no evidence that the barrier
between these two types of propagating, evolving entities has ever been
crossed in the more than 4~billion years of the evolution of life, and
given this perpetual but not blending symbiosis, this barrier appears
to be fundamentally impenetrable. Attempts to define viruses as
biological entities distinct from cells on the basis of simple criteria
such as size have been hopelessly compromised by the latest findings,
in particular the discovery of giant viruses \citep{5}. The definition
of viruses as capsid-encoding organisms, in contrast to cellular life
forms, which are ribosome-encoding organisms \citep{329}, fares better
(regardless of whether or not it is appropriate to call viruses
organisms). However, the discovery of viruses that encode almost all
components of the translation system apart from the ribosome or
multiple ribosomal proteins (see above) puts even this definition into
doubt. What if a virus is discovered that actually encodes its own
ribosome? Will this eliminate or blur the distinction between viruses
and cells, as giant viruses have been repeatedly suggested to do? In our
strong opinion, no such blurring will occur because even a
ribosome-encoding virus will remain a replicator, not a reproducer.
Thus, the best definition of a virus may be ``a replicator encoding at
least one protein encasing the viral genome or an MGE demonstrably
derived from such a replicator'' \citep{5}. 

The definition of viruses as a distinct type of replicators prompts the
question of the position of the virosphere within the space of
replicators \citepalias{5}. Apart from typical viruses that meet the above
definition and comprise the ``orthovirosphere'', a broad variety of
virus-like MGEs that can be derived from viruses or be ancestral to them
inhabit the ``perivirosphere'' (Figure~\ref{fig10}). The denizens of the
perivirosphere include capsid-less derivatives of regular viruses (for
example, narnaviruses, mitoviruses, umbraviruses, endornaviruses and
others within \textit{Riboviria}), various satellite nucleic acids,
viroids and viroid-like cccRNA replicators, and other MGEs, such as
viriforms. The orthovirosphere and perivirosphere are embedded in the
greater replicator space that encompasses ``non-virus-like''
replicators such as typical transposons, integrating conjugative
elements, plasmids as well as genomes of cellular life forms
(Figure~\ref{fig10}). Crucially, the boundaries between the different
domains of the replicator space are porous, and evolutionary
interconversions between different types of elements abound. In sharp
contrast, as emphasized above, the wall separating the replicator space
from reproducers appears to be impenetrable. 

\begin{figure*}
\includegraphics{fig10}
\caption{\label{fig10}The replicator space and its three domains. The
replicator space is represented as consisting of three domains:
orthovirosphere including the 11 recognized realms of viruses;
perivirosphere including virus-like replicators; and the outer
replicator space including all other replicators, in particular, the
genomes of cellular life forms. The boundaries between the domains are
shown by dashed lines to emphasize multiple evolutionary
interconversions between different types of replicators. Obelisks are
cccRNA replicators encoding an uncharacterized protein and replicating
in bacterial cells \citep{333, 334, Urayamaetal2026}.  Transpovirons
are small dsDNA plasmids that are commensals of viruses in the phylum
\textit{Nucleocytoviricota} \citep{232,335}.} 
\end{figure*}

\section{Concluding remarks}\label{sec10}

Obviously, in this rather long review article, we could only
superficially cover most aspects of the organization and evolution of
the vast world of viruses. The diversity of known viruses is now
growing more rapidly than ever, primarily, through metagenome and
metatranscriptome mining, often revealing previously unsuspected
features, for example, viruses with genomes in excess of 4~Mb or
diverse (putative) viruses with cccRNA genomes. Nevertheless, however
astonishing this might be, it appears that we already know the global
organization of the virosphere and the main structure of the
megataxonomy of viruses. At the same time, it should be emphasized that
this structure is an inherently moving target, as demonstrated, in
particular, by the ongoing splitting of some of the viral realms. From
the structuring of the virosphere and extensive phylogenomic
\mbox{analyses}, general principles of virus evolution are emerging and
new research frontiers are opening up. Some of the important directions
of future research include the study of different facets of virus--host
symbioses as well as coevolution of viruses in complex systems such as
those including satellite viruses. There is no doubt that new,
fascinating features of the virosphere and its evolution will be
uncovered.

%\section*{CRediT authorship contribution statement}
%\textbf{Mart Krupovic} and \textbf{Eugene V. Koonin} contributed
%equally to all aspects of the article.

\section*{Acknowledgements} 

EVK is supported by the Intramural Research Program of the National
Institutes of Health of the USA.

%\section*{Declaration of interests}
%The authors do not work for, advise, own shares in, or receive funds
%from any organization that could benefit from this article, and have
%declared no affiliations other than their research organizations.
\printCOI

\back{}

\printbibliography
\refinput{crbiol20260085-reference.tex}

\end{document}
