Living systems require cooperation between nucleic acids and proteins, as shown by RNA-directed protein synthesis and by polymerase-catalysed nucleic acid amplification [1,2]. Reducing the effects of biochemical errors in the corresponding primordial systems has often been suggested as the major force shaping the genetic code [3–7].
Further progress in understanding why the genetic code is the way it is may come from considering empirical rules for which explanatory models remain to be found. Rumer identified, for example, transversions that, when applied to all three bases of codons, exchange sets of four codons whose third base need not be specified to define an amino acid with sets of four codons whose third base is required to identify the codons' assignments unambiguously [8,9]. More precisely, the set of 64 codons can be divided into a set of 32 codons, noted group (IV), whose third base does not have to be specified to define an amino acid, and a set of 32 codons, noted group (II), for which all three bases have to be specified to define unambiguously an amino acid or a stop signal. Rumer observed that there exists a unique symmetry exchanging group (IV) and group (II): it replaces G by T, T by G, C by A and A by C, and it is applied to all three codon bases (Figs. 1–3). These transversions are represented by the symbol o and by its corresponding central symmetry in the standard representation of the genetic code (Fig. 2). The same degeneracy-altering transversions are represented by a horizontal line in Rumer's representation of the genetic code (Fig. 3) [8].
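The exchange of group (IV) and group (II) can be checked by direct enumeration. The following Python sketch is an illustration only and is not part of Rumer's analysis: the amino-acid string is assumed to follow the NCBI listing of the vertebrate mitochondrial code (translation table 2, bases ordered T, C, A, G), and the names `CODE`, `group` and `rumer` are hypothetical helpers introduced here.

```python
from itertools import product

BASES = "TCAG"
# Assumption: vertebrate mitochondrial code as listed in NCBI translation
# table 2; codons run TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop signal).
MITO_AAS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), MITO_AAS)}


def group(codon):
    """Return 'IV' if the first two bases alone fix the assignment, else 'II'."""
    meanings = {CODE[codon[:2] + third] for third in BASES}
    return "IV" if len(meanings) == 1 else "II"


# Rumer's transversions: G <-> T and C <-> A.
RUMER = str.maketrans("GTCA", "TGAC")


def rumer(codon):
    """Apply Rumer's transversions to all three bases of a codon."""
    return codon.translate(RUMER)


group_iv = [c for c in CODE if group(c) == "IV"]
group_ii = [c for c in CODE if group(c) == "II"]
# True if every codon is sent to a codon of the other group.
swaps = all(group(rumer(c)) != group(c) for c in CODE)

print(len(group_iv), len(group_ii), swaps)  # 32 32 True
```

For instance, `rumer("CTA")` gives `AGC`: a leucine codon of group (IV) is mapped onto a serine codon of group (II).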
Another unique symmetry is described here: it maps each group onto itself by replacing G by C, C by G, A by T and T by A, and it is applied to the first codon base. In the standard representation of the genetic code, it is represented as an axial symmetry: the lines noted exchanges sets of four codons of lines (A) into sets of four codons of lines (C); the lines noted exchanges sets of four codons of lines (B) into sets of four codons of lines (D) (Fig. 2). This symmetry is represented by oblique lines in Rumer's representation of the genetic code (Fig. 3). These transversions, together with the transversions noted by Rumer, represent all possible transversions.
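A short enumeration illustrates why this symmetry is singled out among base substitutions applied to the first codon base. The sketch below is an illustration under stated assumptions: the eight doublets of group (IV) are written out by hand from the code table, and `SUBSTITUTIONS`, `group` and `preserves_groups_on_first_base` are hypothetical names.

```python
# Assumption: first two bases of the group (IV) codons, copied by hand
# from the code table (the remaining eight doublets form group (II)).
GROUP_IV_DOUBLETS = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}

# The three types of base substitution, each written as an exchange of bases.
SUBSTITUTIONS = {
    "transition (T<->C, A<->G)": str.maketrans("TCAG", "CTGA"),
    "Rumer transversion (G<->T, C<->A)": str.maketrans("GTCA", "TGAC"),
    "complementary transversion (G<->C, A<->T)": str.maketrans("GCAT", "CGTA"),
}


def group(doublet):
    """Group of any codon starting with this doublet."""
    return "IV" if doublet in GROUP_IV_DOUBLETS else "II"


def preserves_groups_on_first_base(table):
    """True if substituting the first base never moves a doublet to the other group."""
    doublets = [first + second for first in "TCAG" for second in "TCAG"]
    return all(group(d[0].translate(table) + d[1]) == group(d) for d in doublets)


result = {name: preserves_groups_on_first_base(t) for name, t in SUBSTITUTIONS.items()}
for name, ok in result.items():
    print(f"{name}: {ok}")
```

Under these assumptions only the complementary transversion returns `True`; a first-base transition, for example, turns the doublet TT of group (II) into CT of group (IV).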
Transitions are known to be synonymous substitutions when applied to the third codon base, for example in the vertebrate mitochondrial genetic code (Fig. 2). The two types of transversions together with the transitions represent the complete set of possible base substitutions (Fig. 4). Each type of substitution either leaves degeneracy invariant or alters it, depending on whether it is applied to the first base of codons, to the third base of codons or to all three bases of codons (Fig. 4).
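Both statements can be illustrated by enumeration: the twelve directed base substitutions fall into three classes of four, and third-base transitions never change the assignment in the vertebrate mitochondrial code. The sketch below is an illustration, not the construction behind Fig. 4; the amino-acid string is assumed to follow NCBI translation table 2, and `substitution_type` is a hypothetical helper.

```python
from itertools import permutations, product

BASES = "TCAG"
PURINES = {"A", "G"}
# Assumption: vertebrate mitochondrial code, NCBI translation table 2 ordering.
MITO_AAS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), MITO_AAS)}


def substitution_type(x, y):
    """Classify the substitution of base x by base y."""
    if (x in PURINES) == (y in PURINES):
        return "transition"  # purine <-> purine or pyrimidine <-> pyrimidine
    if {x, y} in ({"G", "T"}, {"C", "A"}):
        return "Rumer transversion"
    return "complementary transversion"  # G <-> C or A <-> T


# Count the twelve ordered pairs of distinct bases by type.
counts = {}
for x, y in permutations(BASES, 2):
    kind = substitution_type(x, y)
    counts[kind] = counts.get(kind, 0) + 1

# Check that a transition at the third base never changes the assignment.
TRANSITION = str.maketrans("TCAG", "CTGA")
synonymous = all(
    CODE[c] == CODE[c[:2] + c[2].translate(TRANSITION)] for c in CODE
)

print(counts)      # four substitutions of each type
print(synonymous)  # True
```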
These symmetries are clearly independent of the representations of the genetic code (Figs. 2 and 3) and can therefore be considered as intrinsic properties of the genetic code. Given a codon assigned to a group ((IV) or (II)), the three types of base substitutions (i.e. the transitions applied to the third base of codons and the two types of transversions applied to the first base of codons or to the three codon bases as indicated above) define three other codons whose group assignments are then predicted by the symmetries.
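The prediction can be written out explicitly. In the following sketch, which is an illustration under stated assumptions rather than a definitive procedure, the group (IV) doublets are copied by hand from the code table and `related_codons` is a hypothetical helper returning, for a given codon, the three codons reached by the three substitutions together with the group predicted for each.

```python
from itertools import product

# Assumption: first two bases of the group (IV) codons, from the code table.
GROUP_IV_DOUBLETS = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}

TRANSITION = str.maketrans("TCAG", "CTGA")   # T<->C, A<->G
COMPLEMENT = str.maketrans("GCAT", "CGTA")   # G<->C, A<->T
RUMER = str.maketrans("GTCA", "TGAC")        # G<->T, C<->A


def group(codon):
    """Group read directly from the code table (via its first two bases)."""
    return "IV" if codon[:2] in GROUP_IV_DOUBLETS else "II"


def related_codons(codon):
    """Map each substitution to (resulting codon, group predicted by the symmetry)."""
    own = group(codon)
    other = "II" if own == "IV" else "IV"
    return {
        "transition, third base": (codon[:2] + codon[2].translate(TRANSITION), own),
        "complementary transversion, first base": (codon[0].translate(COMPLEMENT) + codon[1:], own),
        "Rumer transversion, all three bases": (codon.translate(RUMER), other),
    }


predictions = related_codons("CTA")
for rule, (codon, predicted) in predictions.items():
    print(f"{rule}: {codon} -> group ({predicted})")

# Compare every prediction with the group read from the table.
all_codons = ["".join(p) for p in product("TCAG", repeat=3)]
consistent = all(
    group(c2) == predicted
    for c in all_codons
    for c2, predicted in related_codons(c).values()
)
print(consistent)  # True
```

Starting from CTA in group (IV), the sketch yields CTG and GTA, both predicted to belong to group (IV), and AGC, predicted to belong to group (II).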
In the standard genetic code, three amino acids (Leu, Arg and Ser) are each encoded by six codons, which can be considered as two sets of synonymous codons, one belonging to group (IV) and one belonging to group (II). The vertebrate mitochondrial genetic code has been chosen here as an example (Figs. 2–4) because, in this code, neither amino acids nor stop signals are encoded by one codon or by three codons. This is not the case for the standard genetic code, in which tryptophan and methionine are each encoded by a single codon and stop signals are encoded by three codons (Fig. 1). These differences in degeneracy for a few amino acids and for the stop codons between a mitochondrial code and the standard genetic code do not change the symmetries, as the symmetries depend only on whether the third codon base has to be defined for unambiguous assignment of an amino acid or of the stop signal.
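The comparison between the two codes can be made concrete by counting codons per assignment. The sketch below is an illustration only: the two amino-acid strings are assumed to follow NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial), and `build`, `degeneracies` and `group_iv_doublets` are hypothetical helpers.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Assumption: NCBI translation tables 1 and 2, codons ordered TTT, TTC, TTA, ...
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
MITO_AAS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)


def build(aas):
    """Map each codon to its amino acid ('*' for a stop signal)."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), aas)}


def degeneracies(code):
    """Distinct numbers of codons per amino acid or stop signal."""
    return set(Counter(code.values()).values())


def group_iv_doublets(code):
    """Doublets whose four codons all share one assignment."""
    doublets = ("".join(d) for d in product(BASES, repeat=2))
    return {d for d in doublets if len({code[d + b] for b in BASES}) == 1}


standard, mito = build(STANDARD_AAS), build(MITO_AAS)
print(sorted(degeneracies(standard)))  # [1, 2, 3, 4, 6]
print(sorted(degeneracies(mito)))      # [2, 4, 6]
print(group_iv_doublets(standard) == group_iv_doublets(mito))  # True
```

Under these assumptions the two codes differ in the degeneracies they contain but share the same partition of doublets into group (IV) and group (II), which is the only property the symmetries rely on.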
The number of amino acids coded within the genetic code is quasi-universal, although it varies among the known genetic codes: their diversity has been described by evolutionary models, which also account for the introduction of further amino acids such as selenocysteine within the genetic code [10,11]. Whether mitochondrial codes should be considered as ancestral codes or as codes that evolved during symbiosis remains unclear. This diversity of codes is often associated with start signals as well as with stop signals. The codon assignment of stop signals was shown to optimise the tolerance of polymerase-induced frameshift mutations, i.e. to minimise the deleterious effects of single-base deletions catalysed by nucleic acid polymerases [12]. Accordingly, it would be of interest to verify experimentally whether the two different doublets coding for the stop signal in the vertebrate mitochondrial genetic code are sequence ‘hot-spots’ for single-base deletions catalysed by DNA polymerases in vertebrate mitochondria. A diversity of genetic codes was also created experimentally [13,14]. In this case, the diversity of genetic codes is not necessarily associated with start or stop codons [13]. If we consider only the natural diversity of genetic codes, then the symmetries of degeneracy by base substitutions are almost universal. Whether the symmetries can be related to universal biochemical biases such as those found in genes, in genomes or in proteins [15,16] remains to be established. The absence of obvious links between biochemical properties and the transversions altering or leaving invariant degeneracy in the genetic code suggests that these symmetries may derive from physical constraints linked to intrinsic properties of the code: a model accounting for the symmetries of degeneracy by base substitutions in the genetic code is under investigation [17].
Acknowledgements
The author thanks Vladimir Shcherbak for a discussion, Pierre-Étienne Bost and Tam Huynh-Dinh for critical comments and Anastassia Komarova for translations.