## 1 Introduction

Amino acids and the backbone of DNA/RNA in living things on Earth are found in only one of the two possible mirror-image states available to them. Respectively, the l-forms of amino acids primarily serve as the building blocks of proteins, and d-sugars form the DNA/RNA backbone [1]. Attempts to replicate early conditions on Earth – Miller/Urey experiments – always produce ‘racemic’ mixtures having equal amounts of both possible amino acid symmetry forms. This conundrum was recently addressed by Gleiser et al. [2], in a computationally intensive study adapting Sandars’ ‘toy model’ of polymerization [3]. They conclude that other planetary platforms in this solar system and elsewhere could have developed an opposite chiral bias to that of Earth. As a consequence, they assert, a statistically large sampling of extraterrestrial stereochemistry would be necessarily racemic on average.

This is not, of course, a new idea, although earlier statements lacked the grace of a mathematical model (e.g., Wald [4], as quoted in [5]):

If the choice of optical isomers is as arbitrary as proposed, one should expect that a survey of life throughout the universe would reveal approximately equal numbers of planetary populations in which the choice of metabolically connected series of dissymmetric molecules came out l- or d-: roughly equal numbers in which life is based on l- and on d-amino acids, and similarly, for the other molecules.

Here we will attempt a direct treatment based on the homology between free energy density and information source uncertainty that was the basis of the analysis by Wallace and Wallace in [6]. We argue, via a statistical thermodynamic construction, that available metabolic energy could well have been the principal determining environmental influence, and that, as a consequence of groupoid symmetries associated with stereochemical structure, a statistically large sampling of extraterrestrial stereochemistries could well be far more complex than Gleiser et al. and Wald proposed, i.e., not necessarily racemic on average.

The development is straightforward and involves several basic ideas:

- (1) Reproducing molecular codes, in the largest sense, constitute information sources that are themselves Darwinian individuals, subject to variation, selection, and chance extinction in the sense of [7];
- (2) Enantiomeric forms of molecules constitute equivalence classes that can be represented by groupoid, rather than group, symmetries, leading to a groupoid version of Landau's classic phenomenological model for phase transition and its extension via Pettini's ‘topological hypothesis’ [8]. The necessity of using groupoid methods in stereochemistry has long been recognized [9–16], and will not be reviewed here. For a tutorial on groupoid methods see [17,18]. The Mathematical Appendix here provides a short summary;
- (3) Groupoid symmetries and available metabolic free energy are, as a consequence of the Darwinian individuality of coding schemes, contexts for, rather than determinants of, the resulting evolutionary processes, including punctuated equilibrium. That is, they define the banks between which the evolutionary glacier flows -- sometimes slowly, sometimes in a sudden avalanche.

This will suggest the possibility of very complicated symmetry schemes within astrobiology.

## 2 Information and reproduction

One current in contemporary theoretical biology [19–21] argues that, for modern organisms, genomic complexity fits within standard information theory as the information the genome of an organism contains about its environment, so that evolution on the molecular level is a collection of information transmission channels, subject to certain constraints defined by the asymptotic limit theorems of information theory. The organism's genes code for the information, a message, to be transmitted from progenitor to offspring, and are subject to noise from an imperfect reproduction process. Thus the information content, or complexity, of a genomic string by itself, without referring to the embedding environment, is a meaningless concept, and a change in environment leads to a change in complexity. The transmission of reproductive information is thus a contextual matter involving the operation of an information source that must interact with embedding ecosystem structures. Here we will focus on the role of available metabolic free energy density as the main driving environmental factor.

Reproduction -- biotic or prebiotic -- is thus to be characterized by an information source, whose source uncertainty has an important heuristic interpretation. Ash [22] puts it this way:

… [W]e may regard a portion of text in a particular language as being produced by an information source. The probabilities P[X_{n} = a_{n}|X_{0} = a_{0}, …, X_{n−1} = a_{n−1}] may be estimated from the available data about the language; in this way we can estimate the uncertainty associated with the language. A large uncertainty means, by the [Shannon-McMillan Theorem], a large number of ‘meaningful’ sequences. Thus given two languages with uncertainties H_{1} and H_{2} respectively, if H_{1} > H_{2}, then in the absence of noise it is easier to communicate in the first language; more can be said in the same amount of time. On the other hand, it will be easier to reconstruct a scrambled portion of text in the second language, since fewer of the possible sequences of length n are meaningful.

Thus, depending on the degree of noise, either high or low reproductive source uncertainty can have selective advantage, a kind of stochastic resonance related to the mesoscale resonance arguments of [23].

## 3 Free energy density and information source uncertainty

Information source uncertainty can be defined in several ways. Khinchin [24] describes the fundamental ‘E-property’ of a stationary, ergodic information source as the ability, in the limit of infinitely long output, to classify strings into two sets:

- (1) a very large collection of gibberish which does not conform to underlying rules of grammar and syntax, in a large sense, and which has near-zero probability; and
- (2) a relatively small ‘meaningful’ set, in conformity with underlying structural rules, having very high probability.

The essential content of the Shannon-McMillan Theorem is that, if N(n) is the number of ‘meaningful’ strings of length n, then the uncertainty of an information source X can be defined as

$$\begin{aligned}H[X]&=\lim_{n\to\infty}\frac{\log[N(n)]}{n}\\&=\lim_{n\to\infty}H(X_{n}|X_{0},\dots,X_{n-1})\\&=\lim_{n\to\infty}\frac{H(X_{0},\dots,X_{n})}{n+1}.\end{aligned}\tag{1}$$
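As a hedged numerical illustration of Eq. (1), the sketch below compares the conditional-entropy form with the block-entropy form for a two-state stationary Markov source. The transition matrix `P` and all function names are assumptions chosen for illustration; they do not come from the sources cited above.

```python
import math

# Hypothetical two-state Markov source; row i holds P[next state | current state i].
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P):
    """Stationary distribution of a two-state chain."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]

def h(probs):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def conditional_entropy(P):
    """H(X_n | X_{n-1}) under the stationary law. For a Markov source this
    equals H(X_n | X_0, ..., X_{n-1}), the second form in Eq. (1)."""
    pi = stationary(P)
    return sum(pi[i] * h(P[i]) for i in range(2))

def block_entropy_rate(P, n):
    """H(X_0, ..., X_n) / (n + 1), the third form in Eq. (1), using the
    chain rule H(X_0..X_n) = H(X_0) + n * H(X_1 | X_0) for a stationary
    Markov source."""
    pi = stationary(P)
    return (h(pi) + n * conditional_entropy(P)) / (n + 1)

H_rate = conditional_entropy(P)
for n in (1, 10, 100, 1000):
    # The block form approaches the conditional form as n grows.
    print(n, block_entropy_rate(P, n), H_rate)
```

For a source with memory, the block form starts above the limit and decreases toward it, which is why the two expressions agree only asymptotically.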

The free energy density of a physical system having volume V and partition function Z(T) derived from the system's Hamiltonian at (normalized) temperature T is [25]

$$\begin{aligned}Z(T,V)&=\sum_{j}\exp[-E_{j}/T]\\F[T]&=\lim_{V\to\infty}-T\frac{\log[Z(T,V)]}{V}\\&=\lim_{V\to\infty}\frac{\log[\hat{Z}(T,V)]}{V},\end{aligned}\tag{2}$$

where $\hat{Z}\equiv Z^{-T}$ and the sum in $Z$ runs over the energy states $E_{j}(V)$.

Feynman [26], following arguments by Bennett, concludes that the information contained in a message is simply the free energy needed to erase it, and describes a simple ideal machine that can turn the information in a message into useful work. Thus, according to this argument, source uncertainty is homologous to free energy density as defined above.

Ash's comment [22], quoted above, then has a corollary: If, for a biological system, H_{1} > H_{2}, then source 1 will require more metabolic free energy for ongoing maintenance than source 2.

## 4 The basic model

We begin by classifying the available molecules in our prebiotic soup by their underlying stereochemistries, and allow the reproductive systems to, for purposes of initial classification, reflect those stereochemical equivalence classes. Interactions between stereochemical equivalence classes can be used to classify higher order structures.

Equivalence classes define groupoids, by standard mechanisms [17,18,27,28], as described in the Appendix. The basic equivalence classes will define transitive groupoids, and higher order systems can be constructed by the union of transitive groupoids, having larger chemical alphabets that allow more complicated statements in the sense of Ash above.

Given an appropriately scaled, dimensionless, fixed, available metabolic energy density K, we propose that the metabolic-energy-constrained probability of a reproductive information source representing stereochemical equivalence class D_{i}, i.e., ${H}_{{D}_{i}}$, will be given by the classic relation [25]:

$$P[H_{D_{i}}]=\frac{\exp[-H_{D_{i}}/K]}{\sum_{j}\exp[-H_{D_{j}}/K]}.\tag{3}$$

If we make a standard simplified approximation and, for the moment, replace sums by integrals, then an elementary calculation shows

$$\langle H\rangle=\sum_{i}H_{D_{i}}P[H_{D_{i}}]\approx\frac{\int_{H}H\exp[-H/K]\,\mathrm{d}H}{\int_{H}\exp[-H/K]\,\mathrm{d}H}=K.\tag{4}$$
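Equations (3) and (4) can be checked numerically. The following is a minimal sketch, assuming that H ranges over [0, ∞) and is sampled on a fine, evenly spaced grid; the function names, the value K = 2 and the three-source example are illustrative assumptions rather than part of the model.

```python
import math

def source_probabilities(H, K):
    """Eq. (3): P[H_i] = exp(-H_i/K) / sum_j exp(-H_j/K)."""
    m = min(H)  # shift by the minimum for numerical stability; it cancels in the ratio
    weights = [math.exp(-(h - m) / K) for h in H]
    Z = sum(weights)
    return [w / Z for w in weights]

def mean_uncertainty(H, K):
    """<H> = sum_i H_i P[H_i], the left-hand side of Eq. (4)."""
    return sum(h * p for h, p in zip(H, source_probabilities(H, K)))

# Assumed grid: uncertainties evenly spaced on [0, 50K], standing in for [0, infinity).
K = 2.0
dH = 0.001
H_grid = [i * dH for i in range(int(50 * K / dH) + 1)]
print(mean_uncertainty(H_grid, K))  # close to K, as Eq. (4) suggests

# Three hypothetical sources: low K concentrates probability on the
# lowest-uncertainty source, high K spreads it almost evenly.
H_small = [1.0, 2.0, 3.0]
low_K = source_probabilities(H_small, 0.1)
high_K = source_probabilities(H_small, 100.0)
print(low_K, high_K)
```

The second part of the sketch shows K acting as an effective temperature: as K rises, higher-uncertainty sources become appreciably probable.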

More exactly, let

$$Z_{D}[K]\equiv\sum_{j}\exp[-H_{D_{j}}/K].\tag{5}$$

We now define the groupoid free energy of the system, F_{D}, at normalized metabolic energy density K, as

$$\begin{aligned}\exp[-F_{D}/K]&\equiv\sum_{j}\exp[-H_{D_{j}}/K],\quad\text{i.e.,}\\F_{D}[K]&=-K\log[Z_{D}[K]].\end{aligned}\tag{6}$$
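Equations (5) and (6) translate directly into a short computation. The sketch below is one possible way to evaluate F_{D}[K], using a log-sum-exp shift for numerical stability at small K; the function name and the sample uncertainties are assumptions made for illustration.

```python
import math

def groupoid_free_energy(H, K):
    """Eqs. (5)-(6): F_D[K] = -K * log(sum_j exp(-H_j / K)).

    The sum is factored as exp(-m/K) * sum_j exp(-(H_j - m)/K) with
    m = min(H), so that no term underflows to zero when K is small."""
    m = min(H)
    log_Z = -m / K + math.log(sum(math.exp(-(h - m) / K) for h in H))
    return -K * log_Z

H = [1.0, 2.0, 3.0]  # hypothetical source uncertainties H_{D_j}
for K in (0.05, 1.0, 50.0):
    print(K, groupoid_free_energy(H, K))
```

As K tends to zero, F_{D} approaches the smallest H_{D_{j}}; it decreases as K grows and more sources contribute to the sum.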

We have expressed the probability of a reproductive information source in terms of its relation to a fixed, scaled, available metabolic free energy density seen as a kind of equivalent system temperature. This gives a statistical thermodynamic path leading to definition of a ‘higher’ free energy construct – F_{D}[K] – to which we now apply Landau's fundamental heuristic phase transition argument [8,25,29]. See, in particular, Pettini [8] for details.

The essence of Landau's insight was that second order phase transitions were usually in the context of a significant symmetry change in the physical states of a system, with one phase being far more symmetric than the other. A symmetry is lost in the transition, a phenomenon called spontaneous symmetry breaking. The greatest possible set of symmetries in a physical system is that of the Hamiltonian describing its energy states. Usually states accessible at lower temperatures will lack the symmetries available at higher temperatures, so that the lower temperature phase is less symmetric: The randomization of higher temperatures -- in this case higher available metabolic free energy densities -- ensures that higher symmetry/energy states -- mixed transitive groupoid structures -- will then be accessible to the system. Absent high metabolic free energy densities, however, only the simplest transitive groupoid structures can be manifest, i.e., those associated with the simplest stereochemistries. A full treatment from this perspective requires invocation of groupoid representations, no small matter (e.g. [30,31]).

Most deeply, however, an extended version of Pettini's Morse-Theory-based topological hypothesis can now be invoked [8], i.e., that changes in underlying groupoid structure are a necessary (but not sufficient) consequence of phase changes in F_{D}[K]. Necessity, but not sufficiency, is important, as it allows mixed symmetries, e.g., l-forms of amino acids working in concert with the d-sugar DNA/RNA backbone.

It is important to note, although we do not pursue the matter here, that F_{D}[K] is only one of a broad spectrum of possible Morse functions. In particular, for interacting sets of information sources, network information theory [32,33] provides a tool for identifying splitting criteria that depend in detail on the structure of interaction. These too, as Morse functions on groupoid structures, lead to our basic results.

Given an appropriate Morse function, the biological renormalization schemes of the Appendix to [6] may now be imposed, providing a spectrum of highly punctuated transitions in the overall system of reproductive information sources: punctuated equilibrium writ large.

## 5 Evolutionary selection of stereochemistry

The essential point is that the reproductive chemical strategies represented by the ${H}_{{D}_{j}}$ are not merely passive actors. Quite the contrary, they are full-scale Darwinian individuals in the sense of [7], and thus subject to variation, selection, and chance extirpation. Thus, given sufficient initial metabolic energy density, there is no inherent reason why higher order, non-transitive, groupoid reproductive chemical systems -- of mixed chirality -- might not prevail, particularly in view of the Ash quotation above. That is, one can ‘say’ more in a shorter time using a richer reproductive language, and this might well have selective value. Thus we may, if this model is correct, expect to observe some surprising astrobiological reproductive stereochemistries, in contrast to the simple ‘racemic’ conclusion of Gleiser et al. [2].

The corollary to this argument is that initial preaerobic metabolic free energy density on Earth may just not have been sufficient to activate non-homochiral reproductive chemistries, and that the two possible amino acid systems, L,D, engaged in a Darwinian competition through which one prevailed, as argued by [5]. Subsequent path-dependent evolutionary lock-in produced the ultimate result.

Again, groupoid symmetries and available metabolic free energy are, as a consequence of the Darwinian individuality of reproductive coding schemes, contexts for, rather than determinants of, evolutionary process, including punctuated equilibrium. They are the banks between which the prebiotic evolutionary glacier flowed – sometimes slowly, and sometimes in sudden advance. Thus astrobiological outcomes at adequately high free energy may be complicated indeed, and the distribution across a statistically large sampling of extraterrestrial stereochemistries need not be racemic, on average.

## Conflict of interest statement

The author declares that there is no conflict of interest.

## Appendix A Groupoids

**Basic ideas**

Following [18], a groupoid, G, is defined by a base set A upon which some mapping – a morphism – can be defined. Note that not all possible pairs of states (a_{j},a_{k}) in the base set A can be connected by such a morphism. Those that can define the groupoid element, a morphism g = (a_{j},a_{k}) having the natural inverse g^{−1} = (a_{k},a_{j}). Given such a pairing, it is possible to define ‘natural’ end-point maps α(g) = a_{j}, β(g) = a_{k} from the set of morphisms G into A, and a formally associative product in the groupoid g_{1}g_{2} provided α(g_{1}g_{2}) = α(g_{1}), β(g_{1}g_{2}) = β(g_{2}), and β(g_{1}) = α(g_{2}). Then the product is defined, and associative, (g_{1}g_{2})g_{3} = g_{1}(g_{2}g_{3}).

In addition, there are natural left and right identity elements λ_{g}, ρ_{g} such that λ_{g}g = g = gρ_{g} [18].

An orbit of the groupoid G over A is an equivalence class for the relation a_{j} ∼_{G} a_{k} if and only if there is a groupoid element g with α(g) = a_{j} and β(g) = a_{k}. Following [27], we note that a groupoid is called transitive if it has just one orbit. The transitive groupoids are the building blocks of groupoids in that there is a natural decomposition of the base space of a general groupoid into orbits. Over each orbit there is a transitive groupoid, and the disjoint union of these transitive groupoids is the original groupoid. Conversely, the disjoint union of groupoids is itself a groupoid.
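The orbit decomposition described above can be sketched in a few lines. The example assumes the groupoid is specified by a finite base set and a generating list of morphisms given as (source, target) pairs; the labels and function names are hypothetical and serve only as illustration.

```python
def orbits(base, morphisms):
    """Partition `base` into orbits of the groupoid generated by `morphisms`.

    Each morphism is a (source, target) pair. Because every groupoid
    element has an inverse, direction does not matter for orbit membership,
    so a union-find structure is sufficient."""
    parent = {a: a for a in base}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for source, target in morphisms:
        parent[find(source)] = find(target)

    classes = {}
    for a in base:
        classes.setdefault(find(a), set()).add(a)
    return list(classes.values())

def is_transitive(base, morphisms):
    """A groupoid is transitive if it has exactly one orbit."""
    return len(orbits(base, morphisms)) == 1

# Hypothetical labels: two classes of states with no morphism between them.
base = ["L1", "L2", "L3", "D1", "D2"]
morphisms = [("L1", "L2"), ("L2", "L3"), ("D1", "D2")]
print(orbits(base, morphisms))
print(is_transitive(base, morphisms))
```

Adding a single morphism between the two classes would merge the orbits and make the resulting groupoid transitive.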

The isotropy group of a ∈ A consists of those g in G with α(g) = a = β(g). These groups prove fundamental to classifying groupoids.

If G is any groupoid over A, the map (α, β) : G → A × A is a morphism from G to the pair groupoid of A. The image of (α, β) is the orbit equivalence relation ∼_{G}, and the functional kernel is the union of the isotropy groups. If f : X → Y is a function, then the kernel of f, ker(f) = {(x_{1}, x_{2}) ∈ X × X : f(x_{1}) = f(x_{2})}, defines an equivalence relation.

Groupoids may have additional structure. As [18] explains, a groupoid G is a topological groupoid over a base space X if G and X are topological spaces and α, β and multiplication are continuous maps. A criticism sometimes applied to groupoid theory is that the classification of groupoids up to isomorphism is nothing other than the classification of equivalence relations via the orbit equivalence relation and groups via the isotropy groups. The imposition of a compatible topological structure produces a nontrivial interaction between the two structures. Below we will introduce a metric structure on manifolds of related information sources, producing such interaction.

In essence, a groupoid is a category in which all morphisms have an inverse, here defined in terms of connection to a base point by a meaningful path of an information source dual to a cognitive process.

As Weinstein [18] points out, the morphism (α, β) suggests another way of looking at groupoids. A groupoid over A identifies not only which elements of A are equivalent to one another (isomorphic), but it also parametrizes the different ways (isomorphisms) in which two elements can be equivalent, i.e., all possible information sources dual to some cognitive process. Given the information theoretic characterization of cognition presented above, this produces a full modular cognitive network in a highly natural manner.

Brown [17] describes the fundamental structure as follows:

A groupoid should be thought of as a group with many objects, or with many identities… A groupoid with one object is essentially just a group. So the notion of groupoid is an extension of that of groups. It gives an additional convenience, flexibility and range of applications…

Example 1: A disjoint union [of groups] G = ∪ _{λ}G_{λ}, λ ∈ Λ, is a groupoid: the product ab is defined if and only if a,b belong to the same G_{λ}, and ab is then just the product in the group G_{λ}. There is an identity 1_{λ} for each λ ∈ Λ. The maps α, β coincide and map G_{λ} to λ, λ ∈ Λ.

Example 2: An equivalence relation R on [a set] X becomes a groupoid with α, β : R → X the two projections, and product (x, y)(y, z) = (x, z) whenever (x, y), (y, z) ∈ R. There is an identity, namely (x,x), for each x ∈ X…
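Brown's Example 2 can be made concrete with a small sketch. It represents an equivalence relation as a set of ordered pairs and implements the partial product, inverses and identities; the function names and the three-element set are assumptions made for illustration.

```python
from itertools import product

def equivalence_groupoid(classes):
    """All pairs (x, y) with x and y in the same class, as in Example 2."""
    return {(x, y) for c in classes for x, y in product(c, repeat=2)}

def alpha(g):
    """Source projection."""
    return g[0]

def beta(g):
    """Target projection."""
    return g[1]

def compose(g1, g2):
    """Partial product (x, y)(y, z) = (x, z).

    Returns None when beta(g1) != alpha(g2), i.e. when the product is
    undefined."""
    if beta(g1) != alpha(g2):
        return None
    return (alpha(g1), beta(g2))

def inverse(g):
    """Inverse of (x, y) is (y, x)."""
    return (g[1], g[0])

# Hypothetical equivalence relation on {a, b, c} with classes {a, b} and {c}.
R = equivalence_groupoid([{"a", "b"}, {"c"}])

print(compose(("a", "b"), ("b", "a")))  # the identity at a
print(compose(("a", "b"), ("c", "c")))  # undefined product
```

The product is defined only for pairs that meet end to end, which is the feature that distinguishes a groupoid from a group.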

Weinstein [18] makes the following fundamental point:

Almost every interesting equivalence relation on a space B arises in a natural way as the orbit equivalence relation of some groupoid G over B. Instead of dealing directly with the orbit space B/G as an object in the category S_{map} of sets and mappings, one should consider instead the groupoid G itself as an object in the category G_{htp} of groupoids and homotopy classes of morphisms.

The groupoid approach has become quite popular in the study of networks of coupled dynamical systems which can be defined by differential equation models [28].

**Global and local symmetry groupoids**

Here we follow [18] fairly closely, using his example of a finite tiling.

Consider a tiling of the Euclidean plane R^{2} by identical 2 by 1 rectangles, specified by the one-dimensional set X, the grout between the tiles: X = H ∪ V, with H = R × Z and V = 2Z × R, where R is the set of real numbers and Z the integers. Call each connected component of R^{2}\X, that is, of the complement of X in the plane, a tile.

Let Γ be the group of those rigid motions of R^{2} which leave X invariant, i.e., the normal subgroup of translations by elements of the lattice Λ = H ∩ V = 2Z × Z (corresponding to corner points of the tiles), together with reflections through each of the points 1/2Λ = Z × 1/2Z, and across the horizontal and vertical lines through those points. As noted by Weinstein [18], much is lost in this coarse-graining: in particular, the same symmetry group would arise if we replaced X entirely by the lattice Λ of corner points. Γ retains no information about the local structure of the tiled plane. In the case of a real tiling, restricted to the finite set B = [0, 2m] × [0, n], the symmetry group shrinks drastically: The subgroup leaving X ∩ B invariant contains just four elements even though a repetitive pattern is clearly visible. A two-stage groupoid approach recovers the lost structure.

We define the transformation groupoid of the action of Γ on R^{2} to be the set

$$G(\Gamma,R^{2})=\{(x,\gamma,y)\,|\,x\in R^{2},\,y\in R^{2},\,\gamma\in\Gamma,\,x=\gamma y\},$$

with the partially defined binary operation

$$(x,\gamma,y)(y,\nu,z)=(x,\gamma\nu,z).$$

Here α(x, γ, y) = x, and β(x, γ, y) = y, and the inverses are natural.

We can form the restriction of G to B (or any other subset of R^{2}) by defining

$$G(\Gamma,R^{2})|_{B}=\{g\in G(\Gamma,R^{2})\,|\,\alpha(g),\beta(g)\in B\}.$$

(1) An orbit of the groupoid G over B is an equivalence class for the relation x ∼_{G} y if and only if there is a groupoid element g with α(g) = x and β(g) = y.

Two points are in the same orbit if they are similarly placed within their tiles or within the grout pattern.

(2) The isotropy group of x ∈ B consists of those g in G with α(g) = x = β(g). It is trivial for every point except those in 1/2Λ ∩ B, for which it is Z_{2} × Z_{2}, the direct product of integers modulo two with itself.
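The statement that two points share an orbit when they are similarly placed can be tested with a small sketch. It assumes, following the description of Γ above, that Γ acts on the first coordinate by x → ±x + 2k and on the second by y → ±y + m for integers k and m, so that every point can be folded into the fundamental region [0, 1] × [0, 1/2]. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction as F

def fold(t, period):
    """Reduce t modulo `period`, then reflect the result into [0, period/2]."""
    r = t % period
    return min(r, period - r)

def orbit_representative(x, y):
    """Canonical representative of (x, y) under Gamma.

    Assumption: Gamma acts as x -> +/-x + 2k and y -> +/-y + m (k, m
    integers), i.e. translations by the lattice 2Z x Z combined with
    reflections in the lines through the points of Z x (1/2)Z."""
    return (fold(F(x), 2), fold(F(y), 1))

def same_orbit(p, q):
    """True when p and q are related by some element of Gamma."""
    return orbit_representative(*p) == orbit_representative(*q)

p = (F(1, 4), F(1, 3))
q = (F(9, 4), F(5, 3))   # p translated by (2, 1), then reflected in y
r = (F(7, 4), F(-1, 3))  # p reflected in both coordinates, then translated
s = (F(1, 2), F(1, 3))   # differently placed within its tile

print(same_orbit(p, q), same_orbit(p, r), same_orbit(p, s))
```

Restricting to B simply intersects each such orbit with the finite room, so the same folding test applies to points of B.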

By contrast, embedding the tiled structure within a larger context permits definition of a much richer structure, i.e., the identification of local symmetries.

We construct a second groupoid as follows. Consider the plane R^{2} as being decomposed as the disjoint union of P_{1} = B ∩ X (the grout), P_{2} = B \ P_{1} (the complement of P_{1} in B, which is the tiles), and P_{3} = R^{2} \ B (the exterior of the tiled room). Let E be the group of all Euclidean motions of the plane, and define the local symmetry groupoid G_{loc} as the set of triples (x, γ, y) in B × E × B for which x = γy, and for which y has a neighborhood U in R^{2} such that γ(U ∩ P_{i}) ⊆ P_{i} for i = 1,2,3. The composition is given by the same formula as for G(Γ, R^{2}).

For this groupoid-in-context there are only a finite number of orbits:

- O_{1} = interior points of the tiles;
- O_{2} = interior edges of the tiles;
- O_{3} = interior crossing points of the grout;
- O_{4} = exterior boundary edge points of the tile grout;
- O_{5} = boundary ‘T’ points;
- O_{6} = boundary corner points.

The isotropy group structure is, however, now very rich indeed:

- The isotropy group of a point in O_{1} is now isomorphic to the entire rotation group O(2);
- For O_{2}, it is Z_{2} × Z_{2};
- For O_{3}, it is the eight-element dihedral group D_{4};
- For O_{4}, O_{5} and O_{6}, it is simply Z_{2}.

These are the ‘local symmetries’ of the tile-in-context.