1 Introduction
The cell cycle is normally defined as the period from cell birth to its division into two daughter cells. The most significant aspect of cell growth is the synthesis of new cellular material, such as proteins, and the initiation of DNA synthesis is one of the first observable cell cycle landmarks. There are cells that are characterized by uncontrolled growth. It is of greatest interest to find answer to the question what causes the cell to begin initiation of DNA synthesis, since this problem is closely connected with explaining the cause of pathological growth, such as in cancer.
Many mathematical models were proposed which explain the asymptotic stability of the cell cycle. By the stability concept we mean the stability of the line: mother–daughter–granddaughter–great-grand-daughter, etc. We derive a recursion relation for the distribution of cell mass at birth in a sample of cells in the generation n given the distribution in the generation n−1. We prove the existence of a unique, asymptotically stable, limiting mass distribution. An alternative convention suggested, among others, by Harris [1] and Jagers [2] is to choose samples from all cells present in the culture at a particular instant of time, and investigate stabilization of the mass distribution in the population as time passes.
The cell cycle is one of many examples of systems that have a mixture of deterministic and probabilistic dynamics. Many other examples may be found in [3]. In this paper we develop a general modelling framework for such systems.
First, we shortly review previous cell cycle models and the mathematical tools used to investigate the asymptotic stability of the mass distribution at birth. We concentrate on the models that represent linear dynamical systems with a stochastic perturbation. In Section 3, we present the convolution model for estimating distributions of cell mass at birth. In Section 4, we describe our simulation procedure. In Section 5, we present the maximum likelihood method for estimate parameters in our convolution model and finally show how the theoretical density functions of cell mass birth fit both our simulated and experimental data.
2 Background
The cell mass at birth (x) may be considered as a random variable. If we denote by f the density function of the mass distribution at birth for a mother cell then Pf denotes the density function of the mass distribution at birth for daughter cells. In general the operator P has the form
(1) |
The particular form of the kernel K(x,y) depends on the assumptions concerning the mass growth and the probability of division. Depending on these assumptions we can distinguish, among others, the following models of the cell cycle: the mitogen model [4], the tandem model [5], the probabilistic model [6,7], and the generalized mitogen model [8]. In all these models, the authors studied asymptotic properties of the sequence of iterates {Pn} (Pn denotes the mass density function in the n-th generation).
In the paper by Lasota, Mackey and Tyrcha [9], we developed a general modelling framework for the treatment of the statistical dynamics of systems in which easily identifiable events occur at irregular times. The desired recurrence relation between successive mass levels at event occurrence is given by:
(2) |
- – the functions and are non-decreasing and absolutely continuous on each subinterval [0,c] on the half-line R+;
- – Q(0)=λ(0)=0 and .
In interpreting the cell division cycle within the context of the general model presented in [9], we associate the occurrence of an event with a triggering of the process, which ultimately leads to mitosis and cytokinesis.
The class of models proposed by Lasota and Mackey [4], Tyson and Hannsgen [5] and Tyrcha [8] satisfies all of the conditions of the general model described above and in [9]. Furthermore, the quantities of mass in consecutive generations of newly born cells satisfy the recurrence relation (2) with τn having a survival function e−x. Setting zn=Q[λ(xn)], the transition operator for the evolution of mass density (2) is given by:
(3) |
As another example, extensions proposed by Tyson and Hannsgen [7] and Hannsgen et al. [6] of the well-known cell cycle models of Smith and Martin [10] and Shields [11] also fall within the general framework of this paper. In this situation, we assume that the cell goes through phases A and B, and that the length TB of phase B is constant. The end of the B phase marks cell division. The length TA of phase A is considered to be a random variable with a density function h so as:
(4) |
(5) |
3 Convolution model
We considered before [8,12] the asymptotic stability of the cell mass at birth for both mass-dependent and age-dependent models of the cell cycle. In this paper, we describe a cell mass distribution at birth in nth generation if we assume a cell mass distribution at birth in 0-th generation and distribution of stochastic perturbations (Eqs. (3) and (5)).
Let x0, τ0, τ1, ‘K’ be a sequence of independent stochastic variables where x0 is a Gaussian stochastic variable and the τn are exponentially distributed. Our model for the first cell generation is a truncated distribution of the convolution of the Gaussian and the exponential distributions. Such a model is a result of discussion with a group of biologists lead by Prof. Anders Zetterberg (Karolinska Institute, Stockholm, Sweden). From experimental investigations, it follows the assumption about initial cell mass distribution at birth being Gaussian (see also [13]). Starting from the initial mass, a cell is growing up but, even if it grows up according to a deterministic equation, one can observe some stochastic perturbations. A stochastic mass increment is assumed to be exponentially distributed. We are interested in the truncated distribution, because we would like to eliminate cells with an extremely high mass.
The distribution of the convolution of the Gaussian and the exponential distributions depend on three parameters: the mean value (equal to the standard deviation) 1/a of the exponential, where a is the intensity parameter, and the mean μ and the standard deviation σ of the Gaussian component. Its density can be expressed as:
(6) |
When cell masses above a maximum value T are not taken into consideration, it means that distribution in Eq. (6) is truncated in T. The truncated density function fT(x) looks as follows:
(7) |
If we assume that Eq. (7) holds, then our model for the n-th cell generation is the truncated convolution of n copies of the distribution for one generation. The convolution of fT with itself cannot be explicitly given. Usually calculations for convolutions are made via Fourier transforms and Fourier inversion, since a convolution of distributions corresponds to multiplication of their Fourier transforms. However, the truncated distribution (7) for the n-th cell generation has no explicit Fourier transform.
Instead, we have calculated the distribution for the n-th cell generation by a saddle-point approximation [14–16]. This is a very accurate distribution approximation in many situations. The saddle-point approximation method uses the Laplace transform (the moment generating function), and not the Fourier transform. The Laplace transform ψ(θ) of (6) can be explicitly calculated:
(8) |
(9) |
In many applications, the saddle-point equation μ(θ)=s cannot be solved analytically, although the solution exists. This is also the case here. The saddle-point method can still be applied by solving the equation numerically. Often one uses Newton–Raphson derivative-based methods to calculate the saddle-point. These methods in general will work well, since the function K(θ)−θs to be minimized is convex. The second derivative exists and is always positive. An initial approximation θ0 is chosen and an iterative procedure is then used to calculate the saddle-point.
Let us turn then to the tail probabilities. We consider Prob(Sn⩾s), where s⩾μ(0). Then the solution of equation μ(θ)=s satisfies . For , we can calculate the tail probability (or risk) that the cell mass exceeds any specified level s according to the formula
(10) |
Observe that the Edgeworth approximation methods provide reasonably good approximations in the centre of the density, but not in the tails, where the approximating density can even be negative. The saddle-point method gives good approximations farther out in the tail and quite good approximations to small-tail probabilities.
4 Data simulation
To see how our analytical formula for compares with a birth-mass histogram from a finite cell population, we have simulated an ideal cellular population by Monte Carlo methods. In particular, 550 cells were allowed to grow linearly and divide according to the assumptions in the age-dependent model. We chose the age-dependent model because it was shown [12] that together with a linear growth, this model is a simple linear dynamical system with an additive stochastic perturbation (see formula (5)). So, we assume that at division each mother cell divides equally between two daughter cells. The length TA of phase A is a random variable exponentially distributed (formula (4)). At division, one daughter was removed from the culture, so that there were always 550 cells in each generation. From an initial arbitrary distribution of birth masses of the 550 cells, the culture was simulated for 20 generations. A histogram of 550 cell birth masses is shown in Fig. 1. The analysis presented in this paper assumes that the fundamentally discrete histogram of cell birth masses in a finite population can be approximated by a continuous probability density function .
5 Parameter estimation
5.1 Maximum likelihood method
We here discuss the estimation of model parameters from a sample of cell masses, specifically the sample illustrated in Fig. 1. We first assume that the model is given by (6), that is a three-parametric convolution of an exponential distribution with a Gaussian one. In the next section we allow occasional high masses, occurring with a probability δ, which is then an additional parameter to be estimated.
Let the sample be x1, x2, ‘Λ’, xN. The likelihood function is:
(11) |
These equations, which can be simplified, are expressed above in the form E(·|x1,…,xn)=E(·), for a representation of the minimal sufficient statistic of the ‘complete’ exponential family we would have had if we could have observed the exponential and the Gaussian separately. It is impossible to solve these equations explicitly, so some iterative method is required. One possibility is to use the EM algorithm [18,19], which will iterate between the left- and right-hand sides of (12), but it will be quite slow. Therefore, we instead eliminated μ from the first equation and obtained the following profile-likelihood function for (a,σ):
We calculated the above log-likelihood function over a grid of different values of the parameters a and σ. The maximum point was found by making the grid successively finer. Based on the complete cell mass data shown in Fig. 1, the parameters were estimated to be: a=11.6, μ=0.6656, σ=0.0079. This estimated distribution did not fit the data well (solid line in Fig. 2).
5.2 Modified maximum-likelihood method
The following modified ML iterative procedure was found to be practical and adequate for dealing with the high cell masses in the large sample of data that was available (N=550). As the first step, model (6) is fitted by ML to all of data, as described above. In the fitted model, the (high) value t is determined such that only one observation should have been expected above t, according to (6). In our case, an explicit and close approximation to t is given by:
This procedure is repeated until convergence.
With the actual data shown in Fig. 1, the parameters were estimated to be: a=15.0, μ=0.6673, σ=0.0089. Fig. 2 shows the fit of this distribution to the data truncated at 1.05.
6 Generation times
In the previous sections, we considered the cell mass at division for models of the cell cycle. Knowing the distribution of birth masses, we can also determine the distribution of generation times, i.e. the time from cell birth to division. In the particular case when is a time-independent stationary sequence, the distribution of generation time has the same property.
We tried to verify our convolution model using experimental data from mouse L. We received the data from Prof. Michael C. Mackey (McGill University, Montreal, Canada). The data contain 89 pairs of generation times for respective daughter (sister) cells. Using the maximum-likelihood technique described in Section 5.1, we estimate parameters in the convolution model to be: a=3.2, μ=0.5178, σ=0.02. The theoretical density function fits the experimental data quite well (see Fig. 3). The same experimental data (mouse L) was used before to check the sister cell model described in [12]. The fitting of the data by the density function in the convolution model is much better than it was in the sister cell model (see Fig. 4 in [12]).
7 Conclusion
We have estimated the cell cycle models that have a mixture of deterministic and probabilistic dynamics. We proposed to estimate a distribution of cell birth masses with truncated convolution of Gaussian and exponential distributions. It was shown that our convolution model fits well both the Monte Carlo simulated data and experimental data (mouse L). It is very important to be able to calculate the probability that a cell mass exceeds some level. The ability to control a pathological growth has a big significance in the biology. The saddle-point approximation method presented here is very useful in calculating such tail probabilities. This method is good not only in our convolution model, but in many other possible models, since it gives good approximations to small tail probabilities.