1 Introduction
Biological cells are autocatalytic networks [1] and several of their salient characteristics follow from this. For example, they take up nutrients and perform chemical reactions so as to gain mass, and then divide to form daughter cells. The evolution of autocatalytic networks within a self-contained system of artificial chemistry has been observed [2] and, in this context, the importance of simulating division has been shown [3]. We have proposed that the regulation of the bacterial cell cycle depends on cells sensing the state of their metabolic networks [4–6]. Such networks have characteristic patterns of connectivity [7–9]. It is therefore of interest to study how these patterns may vary during the cell cycle to confer for example rapid growth (to profit from the availability of nutrients) or robustness (to allow survival during starvation). Our initial objective, reported here, is to begin to model the formation and evolution of autocatalytic networks and their relationship with cell division. We have therefore written a program that can be approximated to a simulation of a cell that is fed by monomers that are the ‘energy’ source for the system. In this simulation, the monomers are labelled from 1 to n. Different numbers of these monomers can be assembled into linear (non-branched) polymers to give different lengths. A polymer may be cleaved or added to another polymer or monomer in a reaction in which the order and total number of monomers are conserved. A reaction is catalysed by a particular polymer or ‘enzyme’ that may itself be a reactant of that reaction (autocatalysis). More than one variety of enzymes may separately catalyse the same reaction; a single variety of enzyme may catalyse more than one reaction; some polymers do not catalyse reactions. These reactions are only studied within the confines of the cell. The initial cell is created by the self-association of a random number of each monomer and a random number of a random selection of polymers formed outside the cell. The cell is then supplied with monomeric nutrients at regular or intermittent intervals. The cell is also supplied with polymers but at a rate much lower than the rate of supply of nutrients. There is a flux of material through the cell, since monomers and polymers may be lost from the cell (note that this facility is not used in the version described below). The dynamics of the system is described by representations of its state at discrete time steps. At each time step, a nutrient may or may not be incorporated into the cell depending on the availability of the nutrients outside the cell. At each time step, the cell is modified by calculating, on the basis of concentrations, whether each variety of enzyme catalyses its cognate reactions; this is done so as to give some physicochemical reality. Each variety of enzymes is examined. This results in changes in the numbers and types of monomers and polymers present in the cell. The time step is repeated until the mass of polymers in the cell reaches a threshold (corresponding to the size at which cell division would occur) and the cell is then analysed in terms of the number and nature of its polymers, reactions and their connectivity.
2 The model
(1) We consider a set of monomeric molecules (monomers) of different nature labelled from 1 to n.
(2) These monomers are ‘nutrients’ that are present outside a ‘reaction chamber’ or ‘cell’.
(3) Different numbers PD (for Polymerisation Degree) of these monomers can be assembled in linear (non-branched) polymeric molecules (polymers) to give different lengths. The symbol Pj({k}) represents a polymer containing PD=j monomeric units and {k} is an ordered set of j symbols, each symbol being an element of the set of n monomeric units defined above. For example, if the polymer Pj({k}) is the string 23112, then j=5 and {k}={2,3,1,1,2}. Polymers containing PD=j monomeric units therefore exist in a maximum of nj different permutations.
(4) A polymer containing PD=p monomeric units may be cleaved or added to other polymers (PD=q) or monomers (PD=1) such that the order and total number of monomeric units is conserved in the reaction. Reactions between molecules are of the form, Pp({k})⊕Pq({l})↔Pp+q({k}{l}) where {k} and {l} are ordered sets of the monomeric units and ({k}{l}) is the result of the addition of these sets in the order k to l. It is therefore the equivalent of reactions of addition (left to right) or cleavage (right to left). This reaction is reversible but is not commutative, i.e., Pp({k})⊕Pq({l}) is not the same as Pq({l})⊕Pp({k}) and Pq({l})⊕Pp({k})↔Pp+q({k}{l}) is not a valid reaction.
(5) Reactions are catalysed by a particular polymer Pp({k}) that may itself be a reactant (autocatalysis) and that we term ‘enzyme’ for convenience. More than one variety of enzymes, e.g., Pp({k}) and Pq({l}), may separately catalyse the same reaction. A single variety of enzyme may catalyse more than one reaction. No monomer may catalyse a reaction. Some polymers do not catalyse reactions.
(6) Reactions of the above type are only studied in the confined volume of a cell or reaction chamber that, in its initial form, we regard as created by the self-association of a random number of each monomer and a random number of a random selection of polymers made outside the cell (by ‘abiotic’ mechanisms that may be different from those in (5) and that we do not study).
(7) Nutrients are then supplied to the cell at regular or intermittent intervals. The cell may also be supplied with polymers, but at a much lower rate than that of the supply of nutrients.
(8) The dynamics of the system is described by representations of its state at discrete time steps. At each time step, a nutrient may or may not be incorporated into the cell, depending on the availability of the nutrients outside the cell. In principle, there could be an efflux of material through the cell, since the possibility also exists of losing monomers and polymers from the cell (for example, the probability of a monomer or polymer being lost could be inversely proportional to the number of reactions in which it is involved). This possibility is not implemented in the version described here.
(9) At each time step, two lists that describe the system are updated. The first list, the MoleculeList, contains a reference to each molecule present in the cell. Each molecule has a description comprising its label (name), the number of copies, whether it is a monomer or a polymer and in the latter case, whether it is an enzyme. The second list, the EnzymeList, contains a reference to each enzyme present in the cell. Each enzyme has a description comprising its label, the number of copies, its activity status (the possibility of using the parameter ‘active or inactive’ is built into the model but has not been used to generate the data shown here) and the reaction(s) it catalyses. A reaction is defined by three molecules, corresponding to two substrates, Pp({k}) and Pq({l}), and one product, Pp+q({k}{l}), and by the kf and kr for this reaction where kf and kr are the equivalent of the rate constants for the forward and reverse reactions, respectively. An enzyme can catalyse more than one reaction. Initially, the number of varieties of enzyme is chosen at random. Then the cell is fed in accord with (7).
(10) At each time step, the system is updated by calculating whether each variety of enzyme catalyses its cognate reactions. Each variety of enzymes is examined. The forward reaction can take place if the enzyme Pr({m}) is present and active and if kf×N({k})×N({l})>kr×N({k}{l})×Nt, where Nt is the total number of molecules in the cell and kf and kr are the rate constants for the particular reaction as catalysed by the enzyme. If the inequality is reversed, the reverse reaction occurs. Nothing happens at equilibrium.
(11) Iteration during the time step. A variety of enzyme is chosen from the EnzymeList either according to its concentration (as presented here) or at random (in this case, the choice is weighted by the number of copies, Nm, of the enzyme of a given variety, Pr({m}), such that the probability of choosing this variety is proportional to Nm/NEt, where NEt is the total number of enzymes).
The total increase and decrease in the number of copies of each molecule involved in a single catalysed reaction during the time step is obtained from:
(1) |
(12) At the start of each time step, the cell is tested to see whether it has grown to a threshold at which cell division could occur. This test is based on the total number of polymers in the cell but alternatives include the number of monomers in the form of polymers as well as the number of copies of a specific polymer. In the present model, all reactions cease at this critical cell size and the program ends.
3 Results
Here, we present a typical run of the program using only three types of monomer. The number of copies of these monomers present in the cell is given in C0, for concentration at origin (column 4, Table 1). The open circle symbol in column 1 (Table 1) indicates which of the polymers in ID, for identity (column 3), were present in the cell at the start of the experiment. C0 also gives the number of copies of polymers at the start – note that this cell did not initially contain polymers of types ID 233 to ID 31321131212313211121. Cf, for final concentration (column 5), gives the number of copies of the polymer at the end of the experiment – note that if a polymer has neither an entry in C0 nor Cf, it is because it was formed during the experiment but disappeared before the end. Ns for node class of substrate (column 8) indicates the number of different reactions in which a molecule was involved as a substrate – in other words, it is the node class of the molecule with respect to the reactions in which it has been consumed; Np for node class of products (column 8) indicates the number of different reactions in which a molecule was involved as a product; Nt for total number (column 7) is the sum of Ns and Np. We can therefore consider each molecule as a node belonging to a network of either substrates or products or both; a molecule that can be made into 3 different molecules (i.e. is a substrate) and that is made from two others (i.e. is a product) has a node class of Ns=3, Np=2 and Nt=5. In this analysis, the cell can be regarded as a network of molecules that are connected by catalysed reactions. Reactions (column 9) is the number of times the reaction occurred in the course of the experiment. The right arrow (column 10) is the number of times the molecule was consumed, the left arrow (column 11) is the number of times the molecule was produced and Cn is the difference between consumption and production.
Summary of the cell. o, present at start; ez, enzyme activity; number of copies at start; Id, identity of molecule; number of copies at end; total connectivity; connectivity as substrate; connectivity as product; Reactions, number of reactions involving molecules (arrows indicate direction of reactions); overall number of reactions
o | ez | Id | Food | Reactions | ||||||||
° | 1 | 53 | 107 | 54 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | |
° | 2 | 54 | 0 | 54 | 6 | 6 | 0 | 1946 | 919 | 1027 | −108 | |
° | 3 | 97 | 58 | 64 | 3 | 3 | 0 | 1307 | 602 | 705 | −103 | |
204 | 165 | 172 | 10 | 10 | 0 | 3253 | 1521 | 1732 | ||||
° | ∗ | 11 | 14 | 6 | 5 | 5 | 1960 | 976 | 984 | −8 | ||
° | ∗ | 12 | 2 | 4 | 4 | 2 | 2 | −2 | ||||
° | ∗ | 13 | 6 | 3 | 3 | 286 | 140 | 146 | −6 | |||
° | ∗ | 21 | 13 | 7 | 7 | 1481 | 734 | 747 | −13 | |||
° | ∗ | 22 | 3 | 3 | 1 | 1 | ||||||
° | ∗ | 23 | 3 | 100 | 2 | 2 | 1021 | 559 | 462 | 97 | ||
° | ∗ | 31 | 6 | 3 | 2 | 2 | 1639 | 818 | 821 | −3 | ||
° | ∗ | 32 | 15 | 1 | 3 | 3 | 14 | 14 | −14 | |||
° | ∗ | 33 | 6 | 3 | 3 | 786 | 390 | 396 | −6 | |||
∗ | 233 | 5 | 2 | 1 | 1 | 215 | 110 | 105 | 5 | |||
∗ | 312 | 3 | 2 | 1 | 1 | 443 | 223 | 220 | 3 | |||
∗ | 313 | 6 | 5 | 1 | 566 | 283 | 283 | |||||
∗ | 1121 | 7 | 5 | 2 | 1198 | 599 | 599 | |||||
∗ | 1131 | 1 | 1 | 1196 | 598 | 598 | ||||||
∗ | 1212 | 2 | 1 | 1 | ||||||||
3131 | 1 | 1 | ||||||||||
∗ | 3232 | 6 | 3 | 2 | 1 | 10 | 8 | 2 | 6 | |||
∗ | 3321 | 4 | 2 | 2 | 618 | 309 | 309 | |||||
∗ | 21121 | 4 | 3 | 1 | 248 | 124 | 124 | |||||
22233 | 1 | 1 | ||||||||||
∗ | 23321 | 3 | 2 | 1 | 48 | 24 | 24 | |||||
∗ | 31321 | 6 | 5 | 1 | 162 | 81 | 81 | |||||
∗ | 323232 | 1 | 1 | |||||||||
∗ | 1123321 | 1 | 1 | 1 | 1 | 1 | 1 | |||||
∗ | 1221121 | 1 | 1 | |||||||||
∗ | 2112112 | 2 | 3 | 2 | 1 | 24 | 13 | 11 | 2 | |||
∗ | 3131121 | 3 | 2 | 1 | 152 | 76 | 76 | |||||
∗ | 3133232 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | ||||
112131321 | 1 | 1 | ||||||||||
132112112 | 3 | 2 | 1 | |||||||||
∗ | 211211121 | 2 | 1 | 1 | 26 | 13 | 13 | |||||
∗ | 212112112 | 2 | 1 | 1 | 22 | 11 | 11 | |||||
∗ | 313211121 | 5 | 2 | 1 | 1 | 37 | 21 | 16 | 5 | |||
∗ | 332131321 | 1 | 1 | |||||||||
∗ | 11132112112 | 1 | 1 | |||||||||
∗ | 21132112112 | 1 | 1 | |||||||||
∗ | 211211121313 | 1 | 1 | |||||||||
∗ | 212112112312 | 2 | 1 | 1 | ||||||||
∗ | 313213131121 | 1 | 1 | |||||||||
∗ | 1212313211121 | 3 | 2 | 1 | ||||||||
∗ | 131212313211121 | 2 | 1 | 1 | ||||||||
∗ | 23321212112112312 | 1 | 1 | |||||||||
∗ | 12123132111213131121 | 1 | 1 | |||||||||
31321131212313211121 | 1 | 1 | ||||||||||
Total | 68 | 136 | 107 | 68 | 39 | 12158 | 6113 | 6045 | 68 |
The asterisks in ez (column 2, Table 1) indicate which of the polymers were enzymes. Polymers either started as enzymes or acquired activity later. This latter is simply a device to model the entry of a new enzyme by ascribing at random an activity to a polymer present in the original cell (column 1, Table 1); an alternative possibility not explored here is that acquisition of activity is due to a cofactor entering the cell.
The kinetics of the numbers of monomers and polymers (Fig. 1) show that an event occurred at time step 360 when enzyme 11 ‘entered’ the cell. The reaction catalysed by 11 is 2+3↔23 and this led to a rapid increase in the numbers of 23 (Fig. 2). There were no enzymes catalysing the reaction 1+1↔11, so catalytic closure did not occur [1]. Nevertheless, the system grew using its initial complement of polymer 11 until time step 688 when it had doubled the number of its polymers (from 68 to 136). 23 was also an enzyme and catalysed the reaction leading to another enzyme 33+21↔3321; 3321 catalysed 2+33↔233; 233 catalysed 11+31↔1131, a reaction that consumes or generates 11 (Table 2). Hence there is a cycle. (Note that the reactions of all enzymes are easily displayed, although we do not show them here; many other reactions are indicated in Table 1.) It might be expected that enzymes that catalyse the addition of monomers to other molecules would be the most abundant. This is not the case in this example, where 11, present in six copies at the end, is not as abundant as 23, present in a hundred copies at the end. Note that another enzyme in the cycle, 3321, also catalysed addition of the monomer, 2 (Table 2), and that this enzyme was present neither at the start nor the end of the experiment. Such fluctuations in potentially key enzymes were common in the program.
A reaction network: and are the forward and reverse rate constants (other symbols as in Table 1)
Id | Step | Reaction | |||
a | 11 | 360 | 20 | 222 | |
b | 23 | 0 | 74 | 426 | |
c | 3321 | 60 | 43 | 823 | |
d | 233 | 80 | 1 | 102 | |
e | 1131 | 100 | 15 | 168 |
The numbers of members of a particular node class (where the class is the sum of the reactions in which the molecule is either a substrate or product) reveals a distribution with a long tail (Fig. 3). The molecules with the highest connectivities were 21 and 1121 and those with the next highest were 2, 313 and 31321 (Table 1). We have not analysed connectivity in terms of catalysts in a context where an enzyme could catalyse several reactions, although this would be possible. In summary, these results show that even in this simple model, autocatalytic networks form readily. It should be noted that, reassuringly, the system does not grow if it does not use the monomeric food set in its reaction network!
4 Discussion
Artificial chemistry offers a powerful possibility for both testing existing biological concepts and for deriving new ones [2,10]. To contribute to an ambitious, integrative modelling of biological cells, I-cell [11], we have used a variant that we term artificial microbiology to study how autocatalytic networks develop when they are confined to cells that are allowed to grow to a fixed size. The results from a limited range of simulations, with a restricted set of molecules, are compatible with the idea that the most abundant enzymes are those that catalyse the addition of monomers. This appears obvious – as a network evolves, the first limiting step in the growth of the cell is the flow of monomers into the cell and the second limiting step is the addition of these monomers onto polymers or onto other monomers. The enzymes that add monomers may therefore correspond to the ancestral precursors of modern polymerases and ribosomes, which also add monomers onto polymers; such enzymes can constitute a major proportion of cell mass. That said, the proportion of cell mass in the form of ribosomes varies with the nutrient supply in bacteria such as Escherichia coli and it should be noted that the most abundant enzyme in the example given in this paper does not catalyse monomer addition.
Abstract networks or graphs consist of nodes and the connections or links between them. A particular node belongs to a node class that corresponds to the number of connections per node. By allocating a metabolite to a node class on the basis of its connections to other metabolites, we and – independently – others found that real metabolic networks have a power law distribution of node classes characterized by a long tail [7,8,12]. Similar distributions have been found for proteins [13]. Artificial microbiology permits the origin and properties of real metabolic distributions to be investigated. In many individual runs of the system presented here, the distribution of the connectivities did indeed show a long tail. Hence, the system may be useful in answering such questions as: What network structure confers robustness to fluctuations in nutrition? Where should the highest node classes be located so as to assure a maximum use of inputs and a steady output? What allows the network to be regenerated after starvation?
In the version presented here, the program ends when the cell attains a size at which division might occur. The importance of cell division in the evolution of autocatalytic networks has recently been described [3,5]. Our program offers the possibility of exploring how competing autocatalytic networks within the mother cell may be separated by cell division into individual ones. By selecting the faster growing daughter cell, the role of cell division in the evolution of networks can then be studied. This role may be facilitated by the co-localisation of many different macromolecules into a non-equilibrium hyperstructure in response to the cellular need to perform a function [14,15]. Cell division in bacteria, for example, would be performed by a hyperstructure comprising division genes, their mRNA and enzymes, together with particular lipids and ions such as calcium [5]. Future development of the model will therefore entail attributing increased probabilities of reactions to polymers that are colocalised so as to allow evaluation of the consequences of hyperstructure formation. Finally, artificial microbiology may be extended to study what happens when other types of reactions are introduced [16], when the cell shifts from a growing to a non-growing state, when inactive enzymes are preferentially inhibited, and when a coding polymer (DNA/RNA) is present.
Acknowledgements
We thank Dick D'Ari, Michel Thellier, the members of the epigenesis atelier in Genopole and an anonymous referee for helpful comments.