1 Introduction
The field of evolutionary biology benefits from the contributions of various areas of life sciences as well as from physics and mathematics, history and philosophy. Fed by an ever-growing body of biological data, this is a complex and fast-moving field that forms an intellectual melting pot in which theoretical views are stated, analyzed and confronted [1–4]. New biological knowledge constantly reveals fresh evolutionary questions related to evolvability, phenotypic plasticity, epigenetic regulation and systems biology. The vast majority of these evolutionary concepts reference Darwinian theory and many have the potential to add to the Darwinian frame, or to distort it to some extent, as does the consideration of epigenetic phenomena, for example [5]. The issue of complex adaptation receives more and more attention [6], as does the question of non-adaptive processes in human evolution [7,8]. As a result of our increasing knowledge, there is now support for a movement towards an Extended Synthesis for Evolutionary Biology [9], or at least for an extended Darwinian theory which would incorporate new features such as inclusive inheritance [3] or complex system dynamics [10].
Here, I rely on recent advances in our understanding of cellular networks to show that in complex multi-cellular organisms such as man, there may be internal selective processes that operate on germ line cells during their development, up until the point of fertilization. This developmental period, which is particularly extended in the male, provides a window for multiple mutations, a number of which must inevitably have functional consequences that are subjected to quality control mechanisms acting within cellular networks. Such networks can be termed “selfish” in the sense that every individual cell at any level of gamete development (as well as in the soma) is endowed with a robust functional network that is capable of sensing both the internal and external environment. The robustness of individual cellular networks can buffer the phenotypic effect of mutations, but it is possible that at certain checkpoints, key indicators such as energy sensing may trigger apoptosis of malfunctioning cells. I will argue that this not only provides grounds for strong purifying selection, but also opens a field for evolvability, perhaps allowing the emergence of novel features, especially those related to essential housekeeping cellular functions. Once pre-filtered at the parents’ germ line level, new traits would then be subjected to classical natural selection involving the descendant organisms and the environment. Finally, I will discuss the extent to which such a mechanism could apply to organisms less complex than man, and explore several possible consequences of this hypothesis.
2 Mutations in human germ line cells
In humans, female germ cells undergo 24 divisions before the oocyte is formed, all of which are completed before birth. In contrast, spermatogenesis continues during the entire male reproductive life, such that the number of cellular divisions separating the primordial male germ cell from spermatozoa is estimated to be 216 at the age of 20 and in the range of 600 by the time a man reaches 40 [11]. This difference likely explains the well-documented observation that many new mutations causing serious disease are of paternal origin [12,13], a suggestion made by J.B.S. Haldane as early as 1947 [14]. However, this gender bias is limited to single base substitution mutations, rather than deletions and duplications, and is complicated by other factors including age-dependent maternal effects that can give rise to conditions such as Down's Syndrome. In addition, all somatic cells are subject to time-dependent mutations that are not linked to the replication process, for example at CpG sites, which can contribute to cancer and aging [13]. The importance of oxidative DNA damage has also been underlined [15]. The extent to which these non-replicative mutations occur in female and male germ line cells and gametes is not known. Further complicating the issue, cells also harbor hundreds of mitochondria, each with multiple copies of mitochondrial (mt)DNA which replicate independently of nuclear DNA and mutate at a high rate. This has functional consequences as mitochondrial defects play a role in both male and female infertility [16]. Interestingly, in both mice and humans, a strong self-purifying selection process has been shown to operate in pro-oocytes [17–19]. This is important as it is thought that the accumulation of deleterious mtDNA mutations might otherwise drive the species to extinction [20]. The following discussion will focus on male germ line cells because they divide many times and offer the greatest scope to accumulate mutations [21].
2.1 Rates of de novo mutations
Considerable effort has been devoted to estimating the rates of new mutations per individual and per generation which go on to cause human diseases. The rate of emergence of such deleterious mutations is in the order of two to three per zygote, per generation [11,22], of which 40 to 60% may be eliminated by natural selection. These estimates are high enough to feed pessimistic views about the future evolution of mankind. Thus, for Lynch [13], “it is difficult to escape the conclusion that the per-generation reduction in fitness due to recurrent mutation is at least 1% in humans and quite possibly as high as 5%”. Crow [11], at least, finds grounds for some optimism in the hope that “the brave new world of molecular genetics will provide ways of detecting and eliminating important mutant genes with little human or social cost”.
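To give a feel for what a per-generation fitness reduction of 1% to 5% would mean if it accumulated, the following minimal sketch compounds the loss over successive generations. It assumes, purely for illustration, that the loss is multiplicative and that nothing counteracts it; the function name and the choice of 10 generations are hypothetical and are not taken from the cited work.

```python
def relative_fitness(per_generation_loss, generations):
    """Relative fitness after a number of generations, assuming the same
    fractional loss applies multiplicatively at every generation
    (an illustrative assumption, with no selection acting against it)."""
    return (1 - per_generation_loss) ** generations

# The 1% and 5% bounds are the figures quoted from Lynch [13];
# the horizon of 10 generations is an arbitrary choice for illustration.
for loss in (0.01, 0.05):
    remaining = relative_fitness(loss, 10)
    print(f"loss {loss:.0%} per generation -> {remaining:.2f} of initial fitness after 10 generations")
```

Under these assumptions, the lower bound leaves about nine tenths of the initial fitness after 10 generations, and the upper bound about six tenths.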
The quest for more accurate estimations of the number of de novo mutations in human gametes has been further advanced through whole genome sequencing, combined with careful elimination of sequence errors. From the analysis of a family quartet, Roach et al. [23] derived a human inter-generation mutation rate of 1.1 × 10⁻⁸ per position per haploid genome, or an average of 35 de novo mutations. Another study [24] identified 49 and 35 germ line de novo mutations in the offspring of two trios (they observed considerable variation in the male/female distribution of mutations, but the parental ages at conception were not available in this study). These figures are lower than those previously derived from studies of human-chimpanzee sequence and time of divergence. In addition, as noted by Lynch [13], the rate of mutation per cell division in the male germ line falls to a remarkably low level because of the large number of cell divisions involved. This suggests that mutations are either avoided by particularly efficient genetic mechanisms, or eliminated by a selective process, or both. In the following discussion, I shall focus on the male germ line and retain the estimate of 35 de novo mutations (of all types) per gamete. It should be kept in mind that this figure is an average with a statistical spread from about 24 to 46 (95% confidence according to a Poisson distribution) and that this may vary across a population as a function of the genetic background and/or the environment. There are also hints that the mutation rate is increased in certain disease conditions [25].
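The figures quoted above can be cross-checked with a short calculation. The sketch below multiplies the per-position rate by an assumed haploid genome size of about 3.2 billion positions (a round figure that is not given in the text) and approximates the 95% Poisson spread around 35 with a normal approximation; the exact Poisson quantiles would differ slightly.

```python
import math

def poisson_normal_interval(mean, z=1.96):
    """Approximate central interval for a Poisson count.
    Uses the normal approximation, with variance equal to the mean."""
    half_width = z * math.sqrt(mean)
    return mean - half_width, mean + half_width

RATE_PER_SITE = 1.1e-8        # per position per haploid genome, from Roach et al. [23]
HAPLOID_GENOME_SIZE = 3.2e9   # assumption: round figure, not stated in the text

expected = RATE_PER_SITE * HAPLOID_GENOME_SIZE
low, high = poisson_normal_interval(35)

print(f"expected de novo mutations per haploid genome: {expected:.1f}")
print(f"approximate 95% spread around 35: {low:.1f} to {high:.1f}")
```

The approximate bounds fall close to the range of about 24 to 46 given in the text.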
2.2 Deleterious versus weakly deleterious and non-deleterious mutations
How many of the average 35 de novo mutations per gamete are deleterious, weakly deleterious or non-deleterious? The term “deleterious” is usually taken to mean that the mutation severely impairs the fitness of the organism, while single “weakly deleterious” mutations impair fitness somewhat less, but may combine to become more deleterious. It must be kept in mind that weakly deleterious mutations occur on a genetic background with a long mutational history. Thus, a weakly deleterious mutation may appear as such in one genetic context, but turn out to be fully deleterious in another.
The distribution of fitness effects of single mutations has been analyzed on a theoretical basis in the frame of Fisher's model of adaptation [26], while Eyre-Walker and Keightley [27] have reviewed the outcomes of experimental data. The distribution of mutation strengths in yeast has also been thoroughly examined [28], and in humans, much work has been carried out on the detail of mutations leading to amino acid substitutions in relation to disease [29–31]. At the population scale, comprehensive data from the 1000 Genomes Project Consortium [32] paint an intriguing picture: on average, each person carries 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in genetic disorders. These figures must be compared with the observation that individuals generally differ from the human reference sequence at 10,000 to 11,000 non-synonymous sites, which represent many mutations with the potential to have a functional impact, albeit modest. European populations proportionally display more deleterious variations than African populations, a likely consequence of the genetic bottleneck that ancestral Europeans experienced when migrating out of Africa [33]. In general, however, it is now clear that coding sequences provide a significant reservoir of “weak”, non-deleterious mutations.
For the purposes of this discussion, let us now assume that “weak” de novo mutations, which have even a marginal functional effect, occur at an arbitrary rate five times that of deleterious ones. Since it is estimated that two to three severe mutations occur at every generation, we can form the hypothesis that 10 mild mutations take place on top of two deleterious changes, making a total of 12 “significant” mutations out of the 35 which any spermatozoon carries. But does the above assumption make sense considering the large amount of “junk” DNA in the human genome? The 20,000 genes embedded in the human genome occupy only about 1 to 2% of total genetic content, but the exact dimensions of their regulatory regions are unknown. The discovery of previously unknown, non-protein-coding RNAs (up to 3000) has led to re-evaluation of the importance of the role of so-called “junk” DNA [34], although it had been known for some time to play a part in the evolution of genes [35]. Indeed, man and his closest relative, the chimpanzee, differ little in their protein-coding genes, leading to the belief that many significant evolutionary changes have involved regulatory mutations, possibly in non-protein-coding RNAs. A further level of complication is added by the observation that the mutation rate also varies across the human genome [36], with the larger genes paying a higher than average mutational cost, especially in their splice regions [13].
On top of this, copy number variants may have been underestimated [37]. Non-mutagenic homologous recombination and gene conversion could also increase these numbers by shuffling genetic differences, especially in the frame of cis-trans epistasis [38]. Even in the absence of de novo mutations, the shuffling of pre-existing ones would create new combinations of mutations which may have functional significance. Arguably, the latter are functionally neutral, unless there are cis effects that are not complemented in trans. Indeed, when transmitted in haploid genomes, they may confer new characteristics on the progeny. They should probably be considered as de novo mutations in their own right, and added to the “classical” de novo mutations.
In a recent study [39], Eory et al. estimated that 5.4% of nucleotide sites in the genome are subject to effective negative selection and that there are three times as many constrained sites within non-coding sequences as within protein-coding sequences. This is however a conservative estimate; it is quite possible that “weak” mutations are not subject to effective negative selection. If this is the case, as discussed by Harris [7], the proportion of non-silent mutations with a functional impact must be higher. The above assumption that roughly one third of all mutations (12 out of 35) have some impact therefore seems acceptable. Note that even if the number of weak mutations were only four (instead of 10), with a total of 4 + 2 = 6 mutations affecting on average one sixth of the genome, the predictions presented below would still hold. Indeed, the roles and possible functions of non-coding DNA within genomes of various species are currently largely unknown, and are the subject of active research.
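The proportions used in this section follow from simple arithmetic on the working assumptions stated above (35 de novo mutations per gamete, two deleterious mutations, and either 10 or four weak ones). The sketch below only restates that arithmetic; the function and variable names are hypothetical.

```python
TOTAL_DE_NOVO = 35  # average de novo mutations per gamete retained in the text

def significant_fraction(deleterious, weak, total=TOTAL_DE_NOVO):
    """Fraction of de novo mutations assumed to have some functional impact."""
    return (deleterious + weak) / total

base_case = significant_fraction(deleterious=2, weak=10)         # 12 out of 35
conservative_case = significant_fraction(deleterious=2, weak=4)  # 6 out of 35

print(f"base case: {base_case:.1%} of mutations are 'significant'")
print(f"conservative case: {conservative_case:.1%} (one sixth is {1/6:.1%})")
```

The base case gives a little over one third, and the conservative case lands close to the one sixth mentioned in the text.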
3 Individual cellular networks
3.1 What cellular networks are
Recent discoveries have profoundly modified our understanding of the way in which cells, particularly mammalian cells, work. The one gene–one protein principle remains largely correct for prokaryotic cells, but not eukaryotes. Through a series of mechanisms including alternative splicing, processing and post-translational modification, individual genes can give rise to a multiplicity of products. The 20,000 human genes may encode as many as 100,000 to 200,000 products. The notion of “promoters” has also become more complex, especially since the revelation that regulatory elements may be located at a distance from the transcription start site, outside the gene or within introns. The discovery of regulatory RNAs, such as small interfering RNAs (siRNA), micro-RNAs (miRNAs), and Piwi-interacting RNAs (piRNA), indicates previously unsuspected layers of regulation at the gene and messenger RNA stability and expression levels [40].
Protein interactions are also more complex than previously appreciated. Thus, many cellular functions involve high-order dynamic molecular aggregates where proteins assemble and disassemble through weak interactions with a plasticity that permits functional regulation. For example, RNA-polymerase II has a stable core decorated with a cloud of additional elements [41]. Topological rearrangements are essential for the functioning of the molecular synapses at work in the nervous and immune systems [42] or in the recently described “myddosome”, a common molecular platform for Toll-Like Receptors [43]. Cascades which transduce signals from the cell surface to the nucleus work through a series of protein-protein interactions often regulated by phosphorylation and de-phosphorylation by specific kinases and phosphatases. Interactions are weak and transient, but there are reasons to suspect that they work best within a certain topological frame, rather than operating freely in solution. Finally, the epigenetic (re-)modeling of chromatin and the intracellular formation and traffic of vesicles represent yet higher orders of organization.
Overall, cells harbor an extremely complex and dynamic network of interacting molecular elements that mediate cellular functions. Our appreciation of this complexity is increasing in line with the accumulation of data generated by ever more powerful technologies. Modeling individual cellular networks has become a major challenge for contemporary biology, but there remains no way to integrate all the relevant information in a single model. Much work is currently being dedicated to constructing accurate network models of modules and subsets likely to have relevant functional outcomes, such as networks of transcription factors, various regulatory RNAs and their targets, or the 500 kinases and their 200,000 phosphorylation sites. Attempts to understand the interactions between these various modules and sub-networks are also underway [44] and should prove highly illuminating.
Biological networks involve numerous dynamic interactions between a multiplicity of cellular components of variable abundance and structure. Systems biology [45] approaches to understanding biological networks therefore call upon and converge with the theories of complex systems which were initially conceived in connection with engineering and man-made artefacts [46]. Networks have nodes and edges (an edge being an interaction between two nodes; nodes with many connections or edges are called “hubs”), and cellular networks are complex with, at the very least, several thousand nodes in a single yeast cell. However, regardless of the type of complex network, the same mathematics and modeling approaches may be applied. A common starting point involves certain unifying assumptions which allow us to reduce the dimension of the mathematical problem and to make networks more easily comparable. One important concept in this regard is that of robustness. A complex system is robust if it is able to resist environmental variations and internal failures. Thus, an aircraft is robust as it is designed in such a way that it can face an unexpected storm, and there are at least two of every control device to ensure functionality if any one should fail. Robustness is key, and often uses a large amount of space in engineered complex systems [46]. The notion of robustness is also intimately linked to that of quality control, as elegantly illustrated by the DNA replication machinery which relies on several mechanisms to ensure fidelity.
Defining complex networks leads to the next logical question – how are these networks governed? Recently, Liu et al. [47] have studied the controllability of complex networks according to the idea that a dynamic system may be defined as controllable if, with a suitable choice of inputs, it can be driven from any initial state to any desired final state within a finite time. Liu et al. developed a way to analyze and estimate the number of “driver nodes” which have to be acted upon in order to control the system. Somewhat counter-intuitively, they found that driver nodes tend to avoid hubs (i.e. that the most connected nodes are not the most important ones for control) and that the control of biological systems requires more driver nodes than non-biological ones. Their calculations show that to fully control a gene regulatory network, 80% of the nodes should be driver nodes, as compared with 20% for social networks, and even less for engineered networks. It is not clear whether the calculated figure of 80% has a precise biological meaning. However, these data strongly suggest that the nodes of biological networks globally are highly inter-dependent. Importantly, this observation fits all the experimental evidence that has been gathered in micro-organisms (Escherichia coli and yeast), invertebrates (Caenorhabditis elegans and Drosophila), and mammals (mouse and man) [44]. Encouragingly, the Drosophila protein network has been shown to be very highly connected, lending experimental support to the above theoretically defined figure of 80% [48].
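To make the notion of “driver nodes” more concrete, the following minimal sketch counts them on two toy directed networks. It relies on my understanding that the analysis of Liu et al. [47] rests on a maximum matching of the directed graph, the minimum number of driver nodes being the number of nodes left unmatched (and at least one). The toy networks, the function name and the simple augmenting-path matching used here are illustrative assumptions, not the authors' implementation or data.

```python
def count_driver_nodes(nodes, edges):
    """Minimum number of driver nodes of a directed network, computed as
    max(N - size of a maximum matching, 1). The matching pairs each source
    node with at most one target and each target with at most one source."""
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    matched_source = {}  # target node -> source node currently matched to it

    def try_match(source, visited):
        # Augmenting-path search (Kuhn's algorithm) starting from one source.
        for target in successors[source]:
            if target in visited:
                continue
            visited.add(target)
            if target not in matched_source or try_match(matched_source[target], visited):
                matched_source[target] = source
                return True
        return False

    matching_size = sum(try_match(node, set()) for node in nodes)
    return max(len(nodes) - matching_size, 1)

# Toy example 1: a linear cascade a -> b -> c -> d.
chain = count_driver_nodes("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
# Toy example 2: one hub h regulating three targets x, y, z.
star = count_driver_nodes("hxyz", [("h", "x"), ("h", "y"), ("h", "z")])

print(f"chain: {chain} driver node(s) out of 4")
print(f"star:  {star} driver node(s) out of 4")
```

In this sketch the linear cascade can be controlled from a single node, whereas the hub-centred network needs three of its four nodes to be driven, which illustrates how topology alone changes the fraction of driver nodes.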
It is important to draw a distinction between the individual cellular networks in question and those networks engaging a multiplicity of cells (such as those involved in developmental body plans, or in the immune and nervous systems), or a multiplicity of organisms (such as ecological networks), all of which exhibit extremely high complexity. Following the terminology of Doyle and Csete [49], individual cellular networks are one layer in the multi-layered architecture of the much more complex system which constitutes the entire organism. It must also be noted that the idea of individual cellular networks provides a conceptual frame of unified cell functioning which is gradually enriched by experimental data. Another layer of complexity is added by the fact that each cell type will have its own individual network; currently a few hundred different human cell types have been identified, and this number may well grow with the advent of new technologies [50]. However, all cell types share a common core of basic and housekeeping functions dealing with DNA replication, transcription, apoptosis, etc. This is observed experimentally, for example in the comparative analysis of immune memory cells and stem cells [51]. Thus, the notion of “an individual cellular network” is a conceptual aggregate which may need to be interpreted according to the particular cell type. Nevertheless, essential housekeeping functions constitute a common core, thought to involve about half of the genes expressed in any cell type.
3.2 Sensitivity of cellular networks to mutations
Robustness is a major attribute of cellular networks, reflecting the fact that they work when faced with a number of internal and/or environmental fluctuations, perturbations, or failures, including genetic and epigenetic variations. With respect to mutations, robustness implies that if a mutation alters or destroys a node (e.g. a certain gene product) or an edge (e.g. the interaction between two gene products) the network will usually remain operational. This is not only due to gene diploidy and/or gene duplication, but is also related to distributed properties of the network [52]. However, Doyle and Csete [49] emphasize the point that to be evolvable, a robust network architecture must include a number of fixed points which, upon attack, can cause the system to fail catastrophically [53]. Such fixed points that are essential for the robustness of the organism must therefore be the targets of deleterious mutations.
While these deleterious mutations are important, the role of “weak” or “hidden” mutations should not be underestimated. Mutations may be “weak” in the sense that they result in modulation rather than loss of function. Thus, certain amino acid changes do not destroy the activity of a protein, but instead weaken or increase protein interactions with other molecules. Similarly, mutations in the promoter and DNA regulatory elements of a gene most often do not abolish expression, but instead can positively or negatively modulate the product. These more subtle effects mean that weak mutations are more difficult to study experimentally. Genetic science initially developed through analyzing the effects of strong mutations that are associated with clear-cut and easily observable phenotypes. In a network, weak mutations may be hardly noticeable because their effects are buffered by the network itself. From a cellular network perspective, in order to move forward, genetic thinking must take into account the “weak” mutations instead of focusing only upon the most deleterious mutations which identify the so-called “essential” genes.
Whether considering deleterious or so-called “weak” mutations, even a robust cellular network offers a large number of possible targets for mutation, including the expressed genes, the numerous DNA regulatory sequences, and the sequences coding for regulatory RNAs. Because genes often give rise to more than one product, and therefore to more than one node in the network, let us assume that the number of nodes is around twice the number of genes expressed in any given cell type, i.e. in the order of 25,000. Let us further postulate that 50% of the nodes are critical (a conservative figure when compared with the 80% predicted for the controllability of the network). It would then follow that strong or weak mutations hitting any one of these 12,500 “controller” nodes (through the corresponding genes and regulatory elements) have the potential to unbalance the network. A typical example would be an alteration in the promoter region of a transcription factor which has a downstream effect on the transcription factor and therefore the network involving that factor. Being robust, however, the network will buffer many genetic perturbations, and weak mutations in particular are likely to induce little phenotypically noticeable change. Nevertheless, as we will explore further below, even weak mutations may shift the network away from its optimal functional state and thereby decrease its robustness.
So how does all this influence the picture within our model male gamete? Considering the above assumptions, any developing human male gamete will be subject to an average of two deleterious mutations + 50% of 10 additional weak mutations, i.e. a total of seven separate mutations. Multiple mutations are thus the rule, not the exception, and genetic epistasis (where the effects of one gene modify those of another) must be a general phenomenon. Mutational attacks involve random mutations in random combinations. An early mutation in the germ line may thus be followed by a second or third, which will possibly aggravate the unbalance but could also be compensatory. Therefore, from a cellular network perspective, the development of germ line cells, especially in the male, provides the grounds for mutational experimentation.
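The numbers behind this estimate can be laid out explicitly. The sketch below restates the working assumptions of the text (about 25,000 nodes, half of them critical, two deleterious and 10 weak mutations per gamete) and then adds one assumption of my own for illustration: that the number of functionally significant mutations per gamete follows a Poisson distribution around the mean of seven. Variable and function names are hypothetical.

```python
import math

NODES = 25_000           # assumed number of network nodes (from the text)
CRITICAL_FRACTION = 0.5  # postulated share of critical "controller" nodes
DELETERIOUS = 2          # deleterious mutations per gamete (working figure)
WEAK = 10                # weak mutations per gamete (working figure)

critical_nodes = int(NODES * CRITICAL_FRACTION)
# Deleterious mutations are all counted; weak ones only when they hit a critical node.
significant_mutations = DELETERIOUS + CRITICAL_FRACTION * WEAK

def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson law (illustrative assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

p_none = poisson_pmf(0, significant_mutations)
p_two_or_more = 1 - p_none - poisson_pmf(1, significant_mutations)

print(f"critical nodes: {critical_nodes}")
print(f"mean significant mutations per gamete: {significant_mutations:.0f}")
print(f"P(no significant mutation)   = {p_none:.4f}")
print(f"P(two or more, i.e. scope for epistasis) = {p_two_or_more:.4f}")
```

Under the Poisson assumption, fewer than one gamete in a thousand would carry no significant mutation, and more than 99% would carry at least two, which is consistent with the statement that multiple mutations are the rule.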
4 The case for an internal selection process
I have previously hinted at the importance of the link between robustness and quality control in cellular networks. If, on average, every spermatozoon bears seven functionally significant mutations, the inescapable question is whether quality control mechanisms evolved in order to eliminate defective ones. In female gametes, a similar mechanism does operate at the pro-oocyte level; the mtDNA bottleneck that takes place during oocyte development is correlated with strong purifying selection of mtDNA [18,54], preserving the integrity of the respiratory chain. However, the mechanisms by which defective mitochondria and/or their host cells are eliminated have yet to be fully deciphered.
Apoptosis provides a simple mechanistic option for the purifying selection of germ line cells with altered cellular networks. In somatic cells, apoptosis is essential for the development of the organism and also functions as the primary defense against mutations which would otherwise give rise to cancers. The apoptotic function is interconnected with many modules of cellular functions, and can be triggered by several pathways. Assuming that cellular networks of germ line cells have sensors capable of evaluating how well the network is functioning, apoptosis may have a major role to play in gamete selection. Deleterious mutations and deleterious accumulations of weak mutations probably trigger apoptosis directly, but what about weaker mutations, or combinations thereof, that alter the functioning of the network without impairing cell survival or division? I postulate that sensors exist to detect that the cell is not working optimally, and that these sensors deliver alert signals to be amplified at key checkpoints to the extent that they trigger intrinsic or extrinsic apoptosis where they previously did not. But what evidence is there to support this postulation?
4.1 Energy and nutrient sensing
Energy and nutrient supplies are essential for cell survival. They fluctuate, may be limiting, and so cellular networks are likely to be optimized with respect to their utilization and consumption. The effects of weak mutations may well be buffered by the network, but importantly, at the expense of additional energy and nutrient consumption. This then will result in an erosion of functionality and/or robustness which can be internally monitored by the cell itself, yielding the postulated signals to be amplified at defined checkpoints, and triggering apoptosis when the network operates below an optimal level.
Energy flow is key to both the organism and to any one individual cell, including germ cells. There are hundreds or more bioenergetic genes encoded in the nucleus of every cell and a few dozen in the mitochondrial genome. Interestingly, evolutionary adaptation to specific environments (e.g. climatic zones) has been demonstrated for several nuclear and mitochondrial bioenergetic genes [20]. Because bioenergetic genes are both abundant and dispersed in the genome, their regulation involves epigenetic phenomena operating at the chromatin level. Chromatin remodeling is in part regulated by histone phosphorylations and acetylations which decrease their affinity for DNA, making specific sequences available for transcription. Histone methylation also plays a role. The histone modifying enzymes respond directly to Adenosine-5′-Triphosphate (ATP), Acetyl-coenzyme A (Acetyl-CoA) and S-Adenosyl-Methionine, which are the products of the bioenergetic systems of the cell, thus intimately linking energy flow and gene transcription. The bioenergetic response to environmental fluctuations is mediated by signal transduction cascades and transcription factors via high-energy intermediates. Thus most signal transduction pathways are directly or indirectly regulated by ATP-dependent phosphorylation. Considering the mechanisms that direct cell apoptosis and mitochondrial destruction, it can be concluded, following Wallace [20], that “energy flux through the animal cell regulates virtually every aspect of cellular growth, differentiation, quiescence and death”.
Certain molecules hold central positions within the sensory and integration pathways. In the pathways regulating metabolism, for example, one such molecule is mTOR. The mammalian target of rapamycin, or mTOR, is a well-conserved serine-threonine protein kinase that integrates environmental cues from nutrients (in particular available sugar levels), as well as energy and growth factors [55,56]. Being itself modulated by the AMP-activated protein kinase (AMPK), mTOR is thought to serve as an ATP sensor, which allows cells to decode changes in energy status. The list of downstream targets of mTOR is expanding [57], and this key kinase has also been found to play a role in immune regulation, especially through regulatory T-cells [58,59]. A second important molecule is the class III histone deacetylase SIRT-1, which detects redox fluxes reflecting mitochondrial activity, thereby linking energy metabolism and chromatin remodeling. SIRT-1 requires NAD+ as a cofactor for histone deacetylation to induce the formation of facultative heterochromatin, resulting in the silencing of many genes. This is one of the mechanisms used by the endogenous cellular circadian clock, which is strongly correlated with cell metabolism [60]. Other nutrients, such as essential amino acids, may also play a role, as tryptophan and arginine are known to do in immune cells [61].
4.2 Buffering by Hsp90
Buffering is a property of the networks which means that the possible phenotypic effects of mutations are effectively “absorbed” and do not become evident. Mechanistically, one way that this is achieved is via the action of the heat shock family of proteins (Hsps). The functional importance of Hsp90 in Drosophila and Arabidopsis [62] is illustrated by the fact that loss-of-function mutations or pharmacological blockade of this protein give rise to diverse phenotypic variations, revealing multiple underlying genetic variations. Hsp90 both assists client proteins (often inherently metastable) to maintain correct conformation and helps mutated or stress-denatured proteins to refold and regain functionality. Hsp90 may therefore regulate phenotypic changes. Hsp90's excess chaperone capacity buffers the effects of variants, storing them in a phenotypically silent form. When the Hsp90 reservoir is compromised, the buffering of the network is reduced and the effects of mutated variants become apparent. The importance of cryptic polymorphisms in Hsp90-buffered gene networks [63] and that of modularity and intrinsic evolvability of Hsp90-buffered changes [64] have been extensively discussed. Buffering can be either direct or indirect, depending upon whether Hsp90 acts on the damaged protein or on other members of the unbalanced network. Although highly abundant, Hsp90 chaperones may still be limiting. A competitive demand on these chaperones could trigger non-linear downstream responses, translated into thresholds and biological switches. An extensive study in yeast has concluded that Hsp90 as well as environmental stresses do indeed transform the adaptive value of natural genetic variation [65]. Since orthologous Hsp90 genes exist in mice and humans, it is believed that Hsp90 may have the same function in mammals [66]. However, the conditions for experimental verification of this statement are not met in humans and hardly in the mouse because of the large numbers of individuals needed.
Hsp90's functions are even more diverse. Recently, it was discovered that Hsp90 prevents phenotypic variation by suppressing the mutagenic activity of transposons [67]. Suppression involves the Piwi pathway [68], which plays a major role in protecting the genome, especially in the germ line, against the disruptive activation of internal transposons [69]. In the mouse, piRNA biogenesis has been linked to mitochondrial activity [70]. Hsp90 function is indeed highly pleiotropic. While it is clear that Hsp90 fulfils an important buffering role in a number of ways, the relative importance of the various pathways involved remains to be evaluated, especially in higher organisms.
Do Hsp90 chaperones work in this way in the human germ line? This is a plausible but unproven hypothesis. The Hsp90 gene family encodes two and three cytoplasmic chaperones (plus organelle-specific ones) in mice and humans respectively. In the mouse, the alpha isoform is temperature inducible, while the beta isoform is considered constitutive, though super inducible by growth factors. Whether the temperature sensitivity of these heat shock proteins has anything to do with the storage of spermatozoa at a temperature of 35 °C, below core body temperature, is mere speculation at this stage but interesting nonetheless. Other heat sensitive factors, such as HSPA4 in the mouse [71], or the HSF1 transcription factors in maternal oocytes [72], do play a role in gamete formation. In summary, the buffering role of Hsp90 and the mechanisms by which Hsp90 chaperones might operate in humans are not established. Nevertheless, individual cellular networks are buffered because they are robust. In this respect, Hsp90 chaperones should not be considered as the primary buffer, but rather as regulators of buffering that are capable of shifting the balance of the network through their pleiotropic functions. Other agents such as ATP might also play comparable roles.
4.3 Checkpoints
The notion of a competitive demand on a few key products, such as ATP, cellular nutrients and molecular chaperones is important in conceiving how a checkpoint might work. A modest variation in the availability of any critical product might produce non-linear modifications that could be sensed in other parts of the network and go on to trigger apoptosis. For example, at the end of spermatozoa development, the environmental conditions might be such that ATP and/or some other high energy compound or nutrient would become locally and temporarily limiting. This would override the internal buffering ability of the network and result in a cascade of events that would end in intrinsic or extrinsic apoptosis. The involvement of the extrinsic pathway of apoptosis requires that perturbed cellular networks are sensed by neighboring cells which deliver the apoptotic signal to the defective cell. The intrinsic pathway would respond to defects such as those related to energy consumption.
There is abundant circumstantial evidence to support the existence of checkpoints during gamete development in animal models, and to some extent also in humans. For example, during Drosophila oogenesis, a metabolic checkpoint exists at vitellogenesis such that, under nutrient stress, egg chambers degenerate by apoptosis [73]. In mammals, there is a considerable body of data describing oogenesis and spermatogenesis [74,75] which suggests the existence of checkpoints during gamete development, for example during the purifying selection of mtDNA mutations in pre-oocytes mentioned above. Regarding spermatogenesis, it is noteworthy that the large number of cell divisions that lead to gamete production is associated with considerable apoptosis. Mouse mutant studies show that male germ cell development involves both the intrinsic (mitochondrial) pathway [76] and the extrinsic (Fas-mediated) pathway of apoptosis. Either exposure to or deprivation of hormones, particularly follicle-stimulating hormone and testosterone, but also oestrogens, can additionally lead to apoptosis. Elevated temperature can also result in apoptosis. There is limited evidence to suggest that some of these data also hold true in humans [75]. It is commonly thought that apoptotic processes are important not only for balancing cell proliferation and death during physiological development, but also to eliminate genetic defects that may arise during mitosis and meiosis. Here, I have proposed that they also eliminate other mutations, and combinations of mutations, that alter the functioning of cellular networks and/or erode their robustness.
4.4 Cellular specificity and selective surveillance of domestic functions
The quality control mechanism postulated here, as applied to male germ line cells, would provide a double evolutionary benefit by checking:
• on the quality of spermatozoa to enhance fertility;
• on the core metabolic functions that are shared by all other cellular networks, regardless of cell type.
Since these core housekeeping functions are vital for every cell in the body, the evolutionary reward might potentially be very high.
But could it be still higher? I offer two speculations which could potentially increase the evolutionary rewards of this model; both rely on the notion that additional mechanisms may enlarge the scope of surveillance by random sampling. The first speculation deals with the monitoring of shuffled mutations or polymorphisms generated by homologous recombination or gene conversion. On functional grounds, most should be silent unless they synergize in cis in a way which is not balanced by the other homologous chromosome. However, it is noteworthy that epigenetic mechanisms have the potential to haploidize chromosomal regions, allowing further monitoring of partial haploid states possibly generated at random.
The second speculation is that a number of genes not normally expressed in the germ line might be monitored as well. This situation would not be unprecedented; Autoimmune Regulator (AIRE) is known to trigger the expression of tissue-specific antigens that are not normally expressed in the thymus. This process is essential in shaping the T-cell repertoire because it allows the negative selection of T-cells that might otherwise cause autoimmunity in peripheral tissues [77]. The AIRE protein induces the expression of a large number of genes by partnering with a separate set of proteins [78], some of which probably trigger the opening of chromatin. It is interesting to note that AIRE expression in the mouse is not restricted to the thymus and can be detected during embryogenesis [79]. It could play an unexpected role during development. Whether or not AIRE is involved, the random deregulation of differentiated genes might allow the testing of essential housekeeping functions in the presence of their differentiated partners. For example, this could apply to RNA-polymerase which would be exposed to a variety of transcription factors.
These suggestions, as unconventional as they may appear, could pave the way for new experiments on the biology of human male germ line cells. In this field, much of the research in man and mouse has so far been driven by the issue of infertility. Such studies might now take advantage of stem cells, since the mouse germ cell specification pathway has been reconstituted in culture using such methods [80].
5 The selfish cellular network hypothesis
5.1 Summary and main features of the hypothesis
Almost 40 years ago, Dawkins coined the term “selfish gene” to describe the autonomous role of genes in adaptation and natural selection [81]. Recently, Goriely and Wilkie [82] have discussed the paternal age effect of mutations in terms of “selfish spermatogonia”. They emphasize the finding that a number of mutations, especially those associated with the pathway of growth factor receptor-RAS signaling, provide mutant spermatogonia with a selective advantage [83]. Here, I use the term “selfish cellular network” to underline the autonomous behavior of cellular networks, inasmuch as they are endowed with the intrinsic ability to monitor their own performance. This capability opens the way to the influence of internal and external selective forces, which are translated into intrinsic or extrinsic apoptosis or instead into cell survival, depending upon the performance of the cellular network.
The hypothesis involves a few basic assumptions, which are summarized in Box 1, and has a number of distinctive characteristics. Firstly, the inclusion of 10 “weak” mutations within the mutational landscape of spermatozoa precursors must lead us to consider them as multiple mutants, not single mutants. Secondly, the functional behavior of these multiple mutants must be appreciated in the context of cellular networks. This provides a conceptual framework which is “multi-epistatic” in nature, and requires that the classical notions of mutation strength and dominance are re-visited to some extent. It is then important to emphasize that a selective process operating on a cell which bears multiple mutations in a network does not only provide purifying selection; it also allows innovative evolution by potential improvements and/or additions to function. In general terms, there is no contradiction between the robustness and the evolvability of a network [84], and at the protein level, structural modularity and robustness are positively associated with evolvability [85,86]. Furthermore, Doyle and Csete [49] convincingly make the argument that complexity and evolvability are driven more by robustness than by minimal functionality.
The human male germ line could therefore be seen as an experimental playground for mutations. This is of particular interest for the evolution of gene regulation through transcription factors [87] or other regulatory elements. In fact, there is evidence that human genes are currently not operating at their maximum capacity in terms of gene regulation [88]. As an example, it is to be expected that single nucleotide polymorphisms (SNPs) that modulate the expression of a single gene may turn out to be paired with other SNPs modulating the expression of counter-balancing genes. A number of SNPs might then be organized in sub-networks reflecting the accumulation of such compensatory mutations. There is recent evidence to support this view [89]. Interestingly, Scheinfeldt et al. [90] have detected clusters of adaptive evolution in the human genome, with evidence that some might respond to the same positive selective pressure. This may be important to better trace and understand human disease-genome associations, which would somehow escape filtering at the postulated checkpoints, as other deleterious mutations do.
The hypothesis does not exclude other mechanisms of internal selection of spermatozoa, such as those demonstrated by Goriely and Wilkie [82] for certain human mutant spermatogonia. In the mouse, there is evidence for competition between spermatozoa either during development or at the fertilization stage [91]. Although pertaining to the same global idea of selective gametogenesis, these processes are distinct from the selfish cellular network hypothesis, which involves no direct competition but instead an internal (selfish) critical check. After fertilization, other major internal purifying selection processes take place during embryonic growth, and result in fetal abortion. The latter mechanisms serve to test essential features of the previously unconfronted maternal and paternal genomes, including the ploidy status. Part of their efficiency relies on the utilization, for developmental purposes, of genes that have a different essential function later in life.
5.2 Generality, limits and consequences of the hypothesis
So far, I have mostly discussed human male germ line cells. Could the hypothesis apply to human female germ cells as well? The aforementioned mechanism of mtDNA purifying selection might perhaps regulate the status of pre-oocytes’ cellular networks, especially if operating through energy sensing. The much smaller number of cell divisions (24) leading to the oocyte makes it less likely that pre-oocytes routinely bear multiple mutations. However, not all mutations are related to DNA replication and cell division. Whole genome sequencing data will need to be extended to assess the average number of mutations that affect the female germ line [24].
What about other less complex multi-cellular organisms such as Drosophila? The number of germ line cell divisions in the fly is 36 [92]. Contrary to humans (and mice to a lesser extent), there is no major difference in the number of cell divisions involved in the manufacturing of male and female gametes since no sex bias has been reported. The 165 million base pair genome of Drosophila contains some 14,000 genes with relatively little presumptive “junk” DNA (less than 50%) [93]. The ‘per generation’ base substitution mutation rate has been measured in the range of 5.8 × 10⁻⁹ [94] to 3.5 × 10⁻⁹ [95], implying that each gamete of the fly carries on average 0.5 to 1 de novo mutation. About half of these are thought to be deleterious. Functional screens have also been performed [96] and considerable progress has been made in terms of systems biology of the fly (modENCODE consortium 2010) [48,97].
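The per-gamete figure quoted above can be recovered by multiplying each measured rate by the genome size. The following is a minimal sketch of that arithmetic, assuming that the cited rates are expressed per base pair per generation and apply uniformly across the 165 million base pair genome; the function and variable names are illustrative only.

```python
# Drosophila genome size quoted in the text (base pairs).
GENOME_SIZE_BP = 165e6

# Per-generation base substitution rates quoted in the text,
# assumed here to be per base pair.
RATES = {"ref. 94": 5.8e-9, "ref. 95": 3.5e-9}


def expected_mutations_per_gamete(rate_per_bp, genome_size_bp=GENOME_SIZE_BP):
    """Expected number of de novo base substitutions in one gamete."""
    return rate_per_bp * genome_size_bp


# Hypothetical helper output: one estimate per cited measurement.
estimates = {ref: expected_mutations_per_gamete(r) for ref, r in RATES.items()}

for ref, value in estimates.items():
    print(f"{ref}: {value:.2f} expected de novo mutations per gamete")
```

Both estimates (about 0.96 and 0.58) fall within the range of roughly 0.5 to 1 mutation per gamete.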
Drosophila gametes thus mostly bear zero or only one de novo mutation, and rarely two. However, there may be rare explosions of mutations such as occur when transposons become highly mutagenic (cf. the above discussion on Hsp90 and the Piwi pathway), or when the organism accidentally acquires a mutator genotype (also suggested in mammals [86]), or in response to stress (as documented in E. coli [98]). As outlined above, the mutation rate would also have been underestimated if the shuffling of polymorphisms by homologous recombination and/or gene conversion in the germ line gave rise to new combinations of polymorphisms, which would somehow be detected as mutants and subjected to selection. Nevertheless, it seems probable that Drosophila gametes are usually not multiple mutants. This does not preclude the possibility that Drosophila has an internal quality control mechanism that checks on (single) mutants in germ line cells.
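How often a gamete carries several de novo mutations depends on how the mutations are distributed among gametes. As a rough sketch only, the code below assumes that mutations arise independently, so that the count per gamete follows a Poisson distribution; this is an assumption made here for illustration and not a result of the cited studies. The means of 0.5 and 1 are the bounds of the per-gamete estimate given above, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mutation_classes(mean):
    """Share of gametes with zero, one, or at least two de novo mutations."""
    p0 = poisson_pmf(0, mean)
    p1 = poisson_pmf(1, mean)
    return {"zero": p0, "one": p1, "two_or_more": 1.0 - p0 - p1}


# Bounds of the per-gamete mean quoted in the text.
for mean in (0.5, 1.0):
    shares = mutation_classes(mean)
    summary = ", ".join(f"{name}: {p:.3f}" for name, p in shares.items())
    print(f"mean {mean}: {summary}")
```

Under this assumption, gametes with two or more de novo mutations make up roughly 9% of the total at a mean of 0.5 and roughly 26% at a mean of 1, so the frequency of multiple mutants depends on which end of the measured range applies. Since only about half of the mutations are thought to be deleterious, the share of gametes carrying several functionally relevant mutations would be lower still.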
The overall picture which emerges is the following: leaving aside the purifying selection processes involved in fetal growth and abortion, the natural selection of relatively complex multi-cellular organisms would take place at two levels, in two distinct environments. The first level is that of the germ line, which may display a sexual bias (in man and mouse), or perhaps not, depending on the species (Drosophila). A dominant mechanism of this selective gametogenesis lies in the selfish character of cellular networks. These networks are able to sense and trigger selection against the mutational erosion of their functionality and/or robustness, based on relatively gross criteria, such as energy or nutrient utilization and consumption. The second level is the classical environmental exposure resulting in natural selection, which operates on a mutational background that has been internally pre-filtered at the first level. The two levels of selection are specialized to some extent. Selective gametogenesis is focused on the basic essential functions of the cell, while environmental selection checks the fitness of the organism as a whole.
The existence of different selection processes operating on distinct parts of the multi-layered architecture of complex organisms makes sense. The importance and relative weight of the first layer (internal selection at the level of selfish cellular networks) are expected to vary with the complexity, body size and population size of the organism. For large and complex organisms like man, the number of germ line cell divisions is high, at least in the male, thus balancing the relatively modest population size. The outcome is that multiple de novo mutations are confronted by selective pressure in individual cells of the germ line, before being confronted with the mutations contributed by the other parent and exposed to environment-driven natural selection. For less complex organisms like Drosophila, the number of germ line cell divisions and the genome size are smaller, which is balanced by larger population size. Nevertheless, the occurrence of multiple mutations in germ line cells is rarer, and the major confrontation of mutations takes place after mating, in the selective environment.
In the human species, if the sex bias is as important as believed, the mother would tend to be the repository of previous selection events, while the father would be the major source of genetic defects and innovations focused on essential domestic functions. This asymmetry might make evolutionary sense. However, the major point is not that isolated mutations take place, but that cells bearing multiple mutations are likely to emerge, vastly broadening the genetic experimentation in individual cells. The inclusion of selfish cellular networks into the usual evolutionary picture leads us to think that evolution of the most complex organisms, especially that of man, might be faster than previously appreciated, since considerable time is saved by selective gametogenesis, which is very rapid when compared to the average generation time of 25 years. In this respect, classical notions such as that of effective population size [7] may have to be re-visited. The “selfish cellular network” hypothesis focuses attention upon cellular function more than the “selfish gene” vision, but these theories are not mutually exclusive. Nor does the concept of the “selfish cellular network” contradict any of the basic principles of current theories of evolution. It just adds a piece to them.
Acknowledgments
I am extremely grateful to Jean-Louis Mandel, Norman Pavelka, Alain Prochiantz, and Luis Quintana Murci for productive discussions and very constructive suggestions during the elaboration of this manuscript. I also wish to thank Paola Castagnoli and Olaf Rotzschke for criticizing it and Gisele Le Cabellec for her skilful help in editing it. I would also like to acknowledge Lucy Robinson and Neil McCarthy of Insight Editing London for their assistance in manuscript editing.