## 1 Introduction

We start by explaining some of the basic concepts of evolutionary game theory, introduced in the classic paper [1] (see also [2]), which will be of use in the rest of the paper. In particular we discuss matrix games between symmetric players and their equivalent between asymmetric players, bimatrix games. We also use the concept of the replicator dynamic to consider the evolution of strategies as a function of time.

### 1.1 Matrix games

The following idea is useful for modelling a population of animals which compete in pairwise conflicts for some resource, which could be food or mates, for example. It is assumed that all members of the population are indistinguishable and each individual is equally likely to face each other individual. There are a finite number of pure strategies available to the players to play in a particular game. Let **U** be the set of pure strategies so that $\mathbf{U}=\{1,\dots ,n\}$. Given the strategies played the outcome is determined; if player 1 plays i against player 2 playing j then player 1 receives reward ${a}_{ij}$ (player 2 receives ${a}_{ji}$) representing an adjustment in Darwinian fitness. The value ${a}_{ij}$ can be thought of as an element in the $n\times n$ matrix A, the payoff matrix.

An animal need not play the same pure strategy every time; it can play a mixed strategy, i.e., play i with probability ${p}_{i}$ for each of $i=1,\dots ,n$. Thus the strategy played by an animal is represented by a probability vector **p**. The expected payoff to player 1 playing **p** against player 2 playing **q**, written $E[\mathbf{p},\mathbf{q}]$, is

$$E[\mathbf{p},\mathbf{q}]=\sum _{i,j}{a}_{ij}{p}_{i}{q}_{j}={\mathbf{p}}^{\mathrm{T}}A\mathbf{q}$$
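As a quick computational companion, the expected payoff is just the bilinear form above; a minimal sketch (the matrix and strategy values are illustrative, not from the paper):

```python
import numpy as np

def expected_payoff(p, A, q):
    """E[p, q] = p^T A q: expected reward to a p-player against a q-player."""
    p, A, q = np.asarray(p, float), np.asarray(A, float), np.asarray(q, float)
    return float(p @ A @ q)

# Illustrative 2x2 payoff matrix and mixed strategies (hypothetical values).
A = np.array([[1.0, 3.0],
              [2.0, 0.0]])
p = np.array([0.5, 0.5])    # player 1 mixes equally
q = np.array([0.25, 0.75])
print(expected_payoff(p, A, q))
```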

**p** is a Nash equilibrium if ${\mathbf{q}}^{\mathrm{T}}A\mathbf{p}\leqslant {\mathbf{p}}^{\mathrm{T}}A\mathbf{p}$ for all alternative strategies **q** (so a strategy is a Nash equilibrium if it is a best reply to itself).

The support of **p** is defined as $S(\mathbf{p})=\{i:\phantom{\rule{0.25em}{0ex}}{p}_{i}>0\}$.

**p** is an internal strategy if $S(\mathbf{p})=\mathbf{U}$.

**p** is thus a Nash equilibrium if ${(A\mathbf{p})}_{i}=\lambda$ for $i\in S(\mathbf{p})$ and ${(A\mathbf{p})}_{i}\leqslant \lambda$ for $i\notin S(\mathbf{p})$, for some constant λ, and is an internal Nash equilibrium if in addition $S(\mathbf{p})=\mathbf{U}$, so that ${(A\mathbf{p})}_{i}=\lambda$ ∀i.
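These support conditions give a direct numerical check for a Nash equilibrium; a sketch, using the familiar Hawk-Dove payoffs with V=2, C=4 purely as an illustration (not one of this paper's games):

```python
import numpy as np

def is_nash(p, A, tol=1e-9):
    """Check the support conditions: (Ap)_i = lam on S(p), (Ap)_i <= lam off S(p)."""
    p, A = np.asarray(p, float), np.asarray(A, float)
    Ap = A @ p
    support = p > tol
    lam = Ap[support].max()
    on_support_equal = np.allclose(Ap[support], lam, atol=tol)
    off_support_below = np.all(Ap[~support] <= lam + tol)
    return bool(on_support_equal and off_support_below)

# Hawk-Dove with V=2, C=4 (illustrative): the mixed strategy (1/2, 1/2)
# equalises the payoffs to the two pure strategies; the pure strategy (1, 0)
# does not, since Dove earns more than Hawk against a Hawk population.
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
print(is_nash([0.5, 0.5], A), is_nash([1.0, 0.0], A))
```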

A strategy **p** is an Evolutionarily Stable Strategy (ESS) if

- (i) ${\mathbf{q}}^{\mathrm{T}}A\mathbf{p}\leqslant {\mathbf{p}}^{\mathrm{T}}A\mathbf{p}$ and
- (ii) if ${\mathbf{q}}^{\mathrm{T}}A\mathbf{p}={\mathbf{p}}^{\mathrm{T}}A\mathbf{p}$ then ${\mathbf{q}}^{\mathrm{T}}A\mathbf{q}<{\mathbf{p}}^{\mathrm{T}}A\mathbf{q}$

for all strategies $\mathbf{q}\ne \mathbf{p}$.

A matrix may possess a unique ESS, no ESSs or many ESSs. See [3,4] for a discussion of the possible complexity of the ESS structure of a matrix.

### 1.2 Bimatrix games

The assumptions underlying the bimatrix game model are the same as for the matrix game model, except that pairwise contests are fought between individuals in asymmetric positions, so that the individual designated player 1 has a different set of pure strategies ${\mathbf{U}}_{1}$ to the set available to player 2 (${\mathbf{U}}_{2}$). If player 1 plays its strategy i against player 2 playing its strategy j, then player 1 receives reward ${a}_{ij}$ and player 2 receives reward ${b}_{ji}$. The payoffs combine to form the payoff matrices A and B. In the same way individuals can play mixed strategies, so that if player 1 plays **p** and player 2 plays **q**, the rewards to the players are ${\mathbf{p}}^{\mathrm{T}}A\mathbf{q}$ and ${\mathbf{q}}^{\mathrm{T}}B\mathbf{p}$, respectively.

The strategy pair ${\mathbf{p}}_{1},{\mathbf{p}}_{2}$ is a Nash equilibrium pair if

$${\mathbf{p}}_{1}^{\mathrm{T}}A{\mathbf{p}}_{2}\geqslant {\mathbf{q}}_{1}^{\mathrm{T}}A{\mathbf{p}}_{2}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\mathbf{p}}_{2}^{\mathrm{T}}B{\mathbf{p}}_{1}\geqslant {\mathbf{q}}_{2}^{\mathrm{T}}B{\mathbf{p}}_{1}$$

for all strategies ${\mathbf{q}}_{1}$ and ${\mathbf{q}}_{2}$.

The strategy pair ${\mathbf{p}}_{1}$, ${\mathbf{p}}_{2}$ is an ESS if it is a Nash equilibrium pair,

$$\text{and whenever}\phantom{\rule{0.25em}{0ex}}{\mathbf{p}}_{1}^{\mathrm{T}}A{\mathbf{p}}_{2}={\mathbf{q}}_{1}^{\mathrm{T}}A{\mathbf{p}}_{2}\phantom{\rule{0.25em}{0ex}}\text{then}\phantom{\rule{0.25em}{0ex}}{\mathbf{p}}_{1}^{\mathrm{T}}A{\mathbf{q}}_{2}>{\mathbf{q}}_{1}^{\mathrm{T}}A{\mathbf{q}}_{2}$$

$$\text{and whenever}\phantom{\rule{0.25em}{0ex}}{\mathbf{p}}_{2}^{\mathrm{T}}B{\mathbf{p}}_{1}={\mathbf{q}}_{2}^{\mathrm{T}}B{\mathbf{p}}_{1}\phantom{\rule{0.25em}{0ex}}\text{then}\phantom{\rule{0.25em}{0ex}}{\mathbf{p}}_{2}^{\mathrm{T}}B{\mathbf{q}}_{1}>{\mathbf{q}}_{2}^{\mathrm{T}}B{\mathbf{q}}_{1}$$

### 1.3 Replicator dynamics

Let us assume that individuals can only play pure strategies, and let the proportion of players of pure strategy i at a particular time be ${p}_{i}$ $(i=1,\dots ,n)$, so that the average population strategy (the population state) is the vector $\mathbf{p}=({p}_{i})$, with the expected payoff (in terms of Darwinian fitness) of an i-player in such a mixture being ${(A\mathbf{p})}_{i}$ and the overall expected payoff in the population being $\sum {p}_{i}{(A\mathbf{p})}_{i}={\mathbf{p}}^{\mathrm{T}}A\mathbf{p}$. Then the standard replicator dynamic (continuous) for the matrix game with payoff matrix A is defined by the differential equation

$$\frac{\mathrm{d}{p}_{i}}{\mathrm{d}t}={p}_{i}[{(A\mathbf{p})}_{i}-{\mathbf{p}}^{\mathrm{T}}A\mathbf{p}]$$

A population state **p** that is an ESS is locally stable under this dynamic (this is not necessarily true for the discrete dynamic [6]). Here **p** is called an Evolutionarily Stable State rather than Strategy, since no individual actually plays **p**; it is rather the average of the strategies played. The replicator equation has been applied in very many situations (see [7,8]).

Assuming evolution under the replicator dynamic (or any other dynamic), the time average of the population state **v** (where ${\mathbf{v}}^{\mathrm{T}}=({v}_{1},\dots ,{v}_{n})$) at time T is the vector $\mathit{TA}(\mathbf{v})$, whose ith element $\mathit{TA}{(\mathbf{v})}_{i}$ is defined by

$$\mathit{TA}{(\mathbf{v})}_{i}=\frac{1}{T}\underset{0}{\overset{T}{\int}}{v}_{i}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t$$
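The dynamic and its time average can be simulated directly; a minimal Euler-scheme sketch, again using illustrative Hawk-Dove payoffs (V=2, C=4), whose unique ESS is (1/2, 1/2):

```python
import numpy as np

def replicator_time_average(A, v0, T=200.0, dt=0.01):
    """Euler-integrate dv_i/dt = v_i[(Av)_i - v^T A v]; return v(T) and TA(v)."""
    A, v = np.asarray(A, float), np.asarray(v0, float)
    total = np.zeros_like(v)
    for _ in range(int(T / dt)):
        Av = A @ v
        v = np.clip(v + dt * v * (Av - v @ Av), 0.0, None)
        v /= v.sum()                      # guard against numerical drift off the simplex
        total += v * dt
    return v, total / T

# Illustrative Hawk-Dove payoffs; both v(T) and the time average TA(v)
# approach the unique ESS (1/2, 1/2).
vT, ta = replicator_time_average([[-1.0, 2.0], [0.0, 1.0]], [0.9, 0.1])
print(vT, ta)
```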

Note that:

- (1) Adding a constant to any column of the payoff matrix makes no difference to the Nash equilibria, or to the trajectory taken by the population under the replicator dynamic (including time factors), so it makes no difference to the game at all.
- (2) If the entire matrix is multiplied by a positive constant, then the Nash equilibria remain the same and the trajectory of the replicator dynamic is unaltered, but the speed along the trajectory may be changed, and thus the time average of the population state may also be affected.

### 1.4 Nonconstant payoffs

The majority of game models deal with a fixed payoff structure, so that if the strategies adopted by all the combatants are known, then the payoffs are given. This corresponds to a payoff matrix of constants, A. In the real world, however, the time at which contests occur can be crucial. As the breeding season progresses there is natural variation in the rewards available for any given contest [9,10], and environmental changes mean that variations may also occur from year to year, and even from day to day, through unpredictable effects such as the weather [11].

We consider a matrix game where the payoffs vary with time. In particular, if player 1 plays i against player 2 playing j at time t, then player 1 receives payoff ${a}_{ij}(t)$. Clearly a lot of the concepts from the static games must now be reconsidered; for instance, **p** may be a Nash equilibrium of a particular constant payoff matrix (a snapshot in time) but what happens in the long term as the matrix changes? We consider how the population state changes under the replicator dynamic and ask what rules we can establish.

In general we assume that the entries in the payoff matrix are bounded, so that ${a}_{1}<\mathrm{max}|{a}_{ij}(t)|<{a}_{2}$ for all $i,j,t$, $0<{a}_{1}<{a}_{2}$. It is also usually assumed that A is continuous and that indeed it cannot change too quickly, so that

$$\left|\frac{\mathrm{d}}{\mathrm{d}t}{a}_{ij}(t)\right|<{a}_{3}\phantom{\rule{1em}{0ex}}\forall i,j,t$$

We also define Λ, the set of matrices whose rows are all identical (adding such a matrix to A is equivalent to adding a constant to each column):

$$L=({l}_{ij})\in \Lambda \phantom{\rule{1em}{0ex}}\text{if and only if}\phantom{\rule{1em}{0ex}}{l}_{ij}={l}_{j}\phantom{\rule{1em}{0ex}}\text{for all}\phantom{\rule{0.25em}{0ex}}i,j$$

Finally, we assume that ${\mathrm{max}}_{i,j,k}|{a}_{ij}(t)-{a}_{kj}(t)|>{a}_{4}$ for all t, for some constant ${a}_{4}>0$. This ensures that the game does not become completely degenerate, by bounding A away from the set Λ.

## 2 Results

Firstly we introduce some general results, and their consequences, before moving on to consider a particular application of the theory.

### 2.1 General results

In this section the variable payoff matrix with elements ${a}_{ij}(t)$ is simply written as A and the population state is the vector **v**.

##### Theorem 1

That A converges does not imply that $\mathit{TA}(\mathbf{v})$ converges.

This result follows simply from the fact that, even for constant A, not only does **v** not always converge, but neither does $\mathit{TA}(\mathbf{v})$ (see [8]). Complex population dynamics can thus occur even with constant payoffs. What can we say about the case where payoffs vary?

##### Theorem 2

If $\mathbf{v}\to {\mathbf{v}}^{\ast}$ , where $\mathbf{v}(0)$ has no zero elements, then $A\to g(t){A}^{\ast}+\Lambda (t)$ for some continuous function $g(t)$ and ${A}^{\ast}$ , a matrix of constants, such that ${\mathbf{v}}^{\ast}$ is a Nash equilibrium of ${A}^{\ast}$ .

So if the prevalent population state converges, then the payoffs must converge either to a constant payoff matrix, or to one which varies in time in such a way that the ratio

$$\frac{{a}_{{i}_{1},{j}_{1}}(t)-{a}_{{i}_{2},{j}_{2}}(t)}{{a}_{{i}_{3},{j}_{3}}(t)-{a}_{{i}_{4},{j}_{4}}(t)}$$

tends to a constant, for any choice of indices for which the denominator is bounded away from zero.

##### Theorem 3

If

- (a) ${a}_{ij}(t)={\alpha}_{ij}+{f}_{ij}(t)$ and $\frac{1}{T}{\int}_{0}^{T}|{f}_{ij}(t)-{f}_{kj}(t)|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0$ $\forall i,j,k$ and
- (b) any element ${v}_{i}(t)$ which approaches the boundary does so sufficiently slowly that $\frac{1}{T}\mathrm{log}({v}_{i}(T))\to 0$,

then $\mathit{TA}(\mathbf{v})$ converges.

Thus if the payoffs vary with time only to a limited extent (the environmental impact produces no regular displacement from the mean, except possibly one of vanishingly small size), and this variation is not strong enough to eliminate strategies at a sufficiently fast (exponential) rate, then the time average of the population state converges. We do not have a stronger result for a general constant payoff matrix; thus the assumption of constant payoffs can be relaxed a little without affecting the dominant long term behaviour.
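A small simulation in the spirit of Theorem 3; the payoff matrix, the decaying perturbation and all parameter values are hypothetical choices satisfying condition (a), not taken from the paper:

```python
import numpy as np

def time_average_under_decay(T=400.0, dt=0.01):
    """Replicator dynamic with a_ij(t) = alpha_ij + f_ij(t), where the
    fluctuation decays so that condition (a) of Theorem 3 is satisfied."""
    alpha = np.array([[-1.0, 2.0],
                      [ 0.0, 1.0]])      # illustrative Hawk-Dove payoffs
    v = np.array([0.8, 0.2])
    total = np.zeros_like(v)
    for n in range(int(T / dt)):
        t = n * dt
        f = np.sin(t) / (1.0 + t)        # hypothetical decaying perturbation
        A = alpha + np.array([[f, 0.0],
                              [0.0, -f]])
        Av = A @ v
        v = np.clip(v + dt * v * (Av - v @ Av), 0.0, None)
        v /= v.sum()
        total += v * dt
    return total / T

ta = time_average_under_decay()
print(ta)   # close to the constant-payoff time average (1/2, 1/2)
```

Here the perturbed game's internal equilibrium oscillates around (1/2, 1/2) with decaying amplitude, so the time average settles at the constant-matrix value.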

##### Proposition 1

If $\mathit{TA}(A)\to \aleph$ and if ${\mathbf{v}}^{\ast}$ is an internal Nash equilibrium of ℵ, then the distance of ${\mathbf{v}}^{\ast}$ from $\mathit{TA}(\mathbf{v})$, the length of the vector ${\mathbf{v}}^{\ast}-\mathit{TA}(\mathbf{v})$, can converge to a value which is arbitrarily close to 1.

This result demonstrates that considering the long-term mean payoffs is insufficient to find the long-term average state; indeed it is possible to find situations when considering the long-term mean payoffs would give you the worst possible estimate of the mean state.

##### Proposition 2

That $\mathit{TA}(\mathbf{v})$ converges does not imply that we can find a bounded function $g(t)$, with $0<{k}_{1}<g(t)<{k}_{2}$, and a matrix $\Lambda (t)$, defined as above, such that $\mathit{TA}({A}^{\ast})$ converges, where ${A}^{\ast}=(A-\Lambda (t))/g(t)$.

(Note that we need this more elaborate statement, rather than just nonconvergence of $\mathit{TA}(A)$, as this can be given by the trivial case of $A=\Lambda (t)+K$ for constant matrix K, and suitable $\Lambda (t)$).

Even if the mean population state converges this does not imply convergence of the underlying payoffs (or the relative size of payoffs against the same pure strategy), and even time average convergence is not guaranteed. This is especially surprising given that convergence of the time average of **v** is not guaranteed even for constant payoffs. Thus observation of this behaviour, without convergence of **v**, could be the result of a wide range of underlying phenomena.

### 2.2 The two stage game

One application of the idea of variable payoffs is in the context of multiple stage games, even when the real natural payoffs are constant. Suppose that a population of animals compete in pairwise contests, where an animal's strategy is decided by a pair of ‘choices’. The first choice that an animal makes then decides the particular type of contest that the two animals are involved in, after which the individuals both pick a second choice, which decides the payoffs that each receive (for other examples of such contests see [12,13]). The two players make their choices at each stage simultaneously, and both know the result of the first stage before playing the second. For instance, the first ‘choice’ could be horn size, which is apparent to both players before any contest ensues. The second stage is thus a subgame within this contest; the available choices of which may be conditional upon the horn size of the participants (thus there is a payoff matrix for each pair of horn sizes).

Note that when the first stage choices are different, then this induces an asymmetry into the contest. We assume that if player 1 picks i and player 2 picks j then the reward to player 1 is decided by the payoff matrix ${B}_{ij}$. If $i=j$ then we have the standard matrix game with payoff matrix ${B}_{ii}$, otherwise we have a bimatrix game with payoff matrices ${B}_{ij},{B}_{ji}$, respectively.

The payoff ${a}_{ij}$ is thus the expected reward for first stage strategy i against first stage strategy j, which depends in turn upon the strategies prevalent in such contests, i.e., ${a}_{ij}={\mathbf{p}}_{ij}^{\mathrm{T}}{B}_{ij}{\mathbf{p}}_{ji}$ where ${\mathbf{p}}_{ij}={({p}_{ij,k})}_{k}$ is the population state of i-players when facing j-players for the second stage of the game (and so ${\mathbf{p}}_{ji}$ is the population state of their potential opponents in this contest). When considering the strategy of i against that of j, $i\ne j$, ${\mathbf{p}}_{ij}$ takes the place of ${\mathbf{p}}_{1}$ and ${\mathbf{p}}_{ji}$ takes the place of ${\mathbf{p}}_{2}$ in the bimatrix games earlier described (for individuals of the same first stage strategy ${\mathbf{p}}_{ii}$ takes the place of **p** in a matrix game). It is assumed that the strategies involved in this subgame evolve in the same manner as the first stage choices, and so ${\mathbf{p}}_{ij}$ varies according to the replicator equation

$$\frac{\mathrm{d}}{\mathrm{d}t}{p}_{ij,k}={p}_{ij,k}({({B}_{ij}{\mathbf{p}}_{ji})}_{k}-{\mathbf{p}}_{ij}^{\mathrm{T}}{B}_{ij}{\mathbf{p}}_{ji})$$
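As an illustration, the second stage dynamic for a single asymmetric subgame can be simulated directly; the payoff values below are those used in the numerical example of Section 2.2.1 (a coordination-style pair, so the dynamic settles on a pure ESS pair):

```python
import numpy as np

# Second stage replicator for the asymmetric subgame (i, j) = (1, 2):
# p12 evolves on B12 and p21 on B21.
B12 = np.array([[4.0, 1.0], [1.0, 4.0]])
B21 = np.array([[5.0, 1.0], [1.0, 10.0]])

def second_stage(p0, q0, T=60.0, dt=0.01):
    """Integrate d/dt p_k = p_k[(B12 q)_k - p^T B12 q] and its B21 mirror."""
    p = np.array([p0, 1.0 - p0])
    q = np.array([q0, 1.0 - q0])
    for _ in range(int(T / dt)):
        p = p + dt * p * (B12 @ q - p @ B12 @ q)
        q = q + dt * q * (B21 @ p - q @ B21 @ p)
        p = np.clip(p, 0.0, None); p /= p.sum()
        q = np.clip(q, 0.0, None); q /= q.sum()
    return p, q, float(p @ B12 @ q)   # last entry is a_12 at time T

p12, p21, a12 = second_stage(0.9, 0.9)
print(a12)   # from this start the pair converges to (1,1), so a_12 settles at b12(11) = 4
```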

##### Theorem 4

If ${\mathbf{p}}_{ij}\to {\mathbf{p}}_{ij}^{\ast}$ $\forall i,j$ (where ${\mathbf{p}}_{ii}^{\ast}$ is an ESS of ${B}_{ii}$ and ${\mathbf{p}}_{ij}^{\ast},{\mathbf{p}}_{ji}^{\ast}$ is an ESS pair of ${B}_{ij},{B}_{ji}$ for $i\ne j$), then ${a}_{ij}(t)={\mathbf{p}}_{ij}^{\mathrm{T}}{B}_{ij}{\mathbf{p}}_{ji}$ satisfies condition (a) of Theorem 3 with ${\alpha}_{ij}={\mathbf{p}}_{ij}^{\ast \mathrm{T}}{B}_{ij}{\mathbf{p}}_{ji}^{\ast}$.

##### Corollary

Suppose that

- (i) ${\mathbf{p}}_{ii}^{\ast}$ is an ESS of ${B}_{ii}$ and ${\mathbf{p}}_{ij}^{\ast},{\mathbf{p}}_{ji}^{\ast}$ is an ESS pair of ${B}_{ij},{B}_{ji}$ for $i\ne j$, $\forall i,j$.
- (ii) ${\mathbf{p}}_{ij}\to {\mathbf{p}}_{ij}^{\ast}$ $\forall i,j$.
- (iii) ${a}_{ij}(t)={\mathbf{p}}_{ij}^{\mathrm{T}}{B}_{ij}{\mathbf{p}}_{ji}$, with none of the first stage strategies approaching 0 at an exponential rate (except any not featuring in the support of ${\mathbf{v}}^{\ast}$ below), so that condition (b) of Theorem 3 is satisfied.

Then $\mathit{TA}(\mathbf{v})$, the time average of the first stage population state, converges, to ${\mathbf{v}}^{\ast}$ say.

Suppose that in a two stage game there are two strategies in the first stage, and then two strategies in each of the four possible second stage situations, defined by the payoff matrices

$${B}_{ij}=\left|\begin{array}{cc}\hfill {b}_{ij}(11)\hfill & \hfill {b}_{ij}(12)\hfill \\ \hfill {b}_{ij}(21)\hfill & \hfill {b}_{ij}(22)\hfill \end{array}\right|$$

**Table 1**

ESSs and payoffs for ${B}_{ii}$

| Condition | ${b}_{ii}(11)>{b}_{ii}(21)$, ${b}_{ii}(12)>{b}_{ii}(22)$ | ${b}_{ii}(11)<{b}_{ii}(21)$, ${b}_{ii}(12)<{b}_{ii}(22)$ | ${b}_{ii}(11)>{b}_{ii}(21)$, ${b}_{ii}(12)<{b}_{ii}(22)$ | ${b}_{ii}(11)<{b}_{ii}(21)$, ${b}_{ii}(12)>{b}_{ii}(22)$ |
| --- | --- | --- | --- | --- |
| ESS(s) | (1) | (2) | (1), (2) | **p** |
| payoff | ${b}_{ii}(11)$ | ${b}_{ii}(22)$ | ${b}_{ii}(11)$ or ${b}_{ii}(22)$ | ∗ |

**Table 2**

ESSs for ${B}_{12}$ and ${B}_{21}$

| Conditions | ${b}_{21}(11)>{b}_{21}(21)$, ${b}_{21}(12)>{b}_{21}(22)$ | ${b}_{21}(11)>{b}_{21}(21)$, ${b}_{21}(12)<{b}_{21}(22)$ | ${b}_{21}(11)<{b}_{21}(21)$, ${b}_{21}(12)>{b}_{21}(22)$ | ${b}_{21}(11)<{b}_{21}(21)$, ${b}_{21}(12)<{b}_{21}(22)$ |
| --- | --- | --- | --- | --- |
| ${b}_{12}(11)>{b}_{12}(21)$, ${b}_{12}(12)>{b}_{12}(22)$ | (1,1) | (1,1) | (1,2) | (1,2) |
| ${b}_{12}(11)>{b}_{12}(21)$, ${b}_{12}(12)<{b}_{12}(22)$ | (1,1) | (1,1) or (2,2) | (p,q) | (2,2) |
| ${b}_{12}(11)<{b}_{12}(21)$, ${b}_{12}(12)>{b}_{12}(22)$ | (2,1) | (p,q) | (1,2) or (2,1) | (1,2) |
| ${b}_{12}(11)<{b}_{12}(21)$, ${b}_{12}(12)<{b}_{12}(22)$ | (2,1) | (2,2) | (2,1) | (2,2) |

When one player plays 1 and the other 2, we have the bimatrix game defined by ${B}_{12},{B}_{21}$. Table 2 gives the strategies in the ESSs.

When there are no pure ESSs and a single Nash equilibrium pair for the bimatrix game, it is shown in [14] that the pair $(p,q)$ is a centre and that the time averages of $p(t)$ and $q(t)$ are p and q, respectively. The time averages of the payoffs are also shown to be ${\mathbf{p}}^{\mathrm{T}}{B}_{12}\mathbf{q}$ and ${\mathbf{q}}^{\mathrm{T}}{B}_{21}\mathbf{p}$, respectively. Note that in the cases where there are two pure ESS pairs, there is an internal equilibrium which is a saddle point, and all trajectories converge to one or other of the ESS pairs.
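The time-average behaviour around a centre can be illustrated numerically; a sketch using the classic matching pennies bimatrix game (an illustrative example, not one of the games above), integrated with a Runge-Kutta scheme so that the closed orbits stay closed:

```python
import numpy as np

def matching_pennies_averages(p0=0.8, q0=0.5, T=200.0, dt=0.01):
    """RK4-integrate the bimatrix replicator for matching pennies:
    dp/dt = p(1-p)(4q-2),  dq/dt = q(1-q)(2-4p).
    The internal equilibrium (1/2, 1/2) is a centre; the orbit is closed,
    and the time averages of p and q approach the equilibrium values."""
    def f(state):
        p, q = state
        return np.array([p * (1 - p) * (4 * q - 2),
                         q * (1 - q) * (2 - 4 * p)])
    s = np.array([p0, q0])
    total = np.zeros(2)
    for _ in range(int(T / dt)):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        total += s * dt
    return total / T

avg = matching_pennies_averages()
print(avg)
```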

#### 2.2.1 A numerical example

Suppose that the payoff values from the above example are

$${b}_{11}(12)={b}_{11}(21)={b}_{12}(12)={b}_{12}(21)={b}_{21}(12)={b}_{21}(21)={b}_{22}(12)={b}_{22}(21)=1$$

$${b}_{11}(11)=2,\phantom{\rule{1em}{0ex}}{b}_{11}(22)=3$$

$${b}_{12}(11)={b}_{12}(22)=4$$

$${b}_{21}(11)=5,\phantom{\rule{1em}{0ex}}{b}_{21}(22)=10$$

$${b}_{22}(11)=2,\phantom{\rule{1em}{0ex}}{b}_{22}(22)=3$$

Each subgame then has two pure ESSs (or ESS pairs): ${B}_{11}$ and ${B}_{22}$ each have ESSs (1) and (2) with payoffs 2 and 3, while ${B}_{12},{B}_{21}$ has ESS pairs (1,1) and (2,2), giving second stage payoff 4 to the first player and 5 or 10 to the second. This yields eight possible first stage payoff matrices; these, with the proportion playing first stage strategy 1 at the corresponding internal equilibrium given in brackets, are

$$\left|\begin{array}{cc}\hfill 2\hfill & \hfill 4\hfill \\ \hfill 5\hfill & \hfill 2\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(2/5),\phantom{\rule{2em}{0ex}}\left|\begin{array}{cc}\hfill 2\hfill & \hfill 4\hfill \\ \hfill 10\hfill & \hfill 2\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(1/5)$$

$$\left|\begin{array}{cc}\hfill 2\hfill & \hfill 4\hfill \\ \hfill 5\hfill & \hfill 3\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(1/4),\phantom{\rule{2em}{0ex}}\left|\begin{array}{cc}\hfill 2\hfill & \hfill 4\hfill \\ \hfill 10\hfill & \hfill 3\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(1/9)$$

$$\left|\begin{array}{cc}\hfill 3\hfill & \hfill 4\hfill \\ \hfill 5\hfill & \hfill 2\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(1/2),\phantom{\rule{2em}{0ex}}\left|\begin{array}{cc}\hfill 3\hfill & \hfill 4\hfill \\ \hfill 10\hfill & \hfill 2\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(2/9)$$

$$\left|\begin{array}{cc}\hfill 3\hfill & \hfill 4\hfill \\ \hfill 5\hfill & \hfill 3\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(1/3),\phantom{\rule{2em}{0ex}}\left|\begin{array}{cc}\hfill 3\hfill & \hfill 4\hfill \\ \hfill 10\hfill & \hfill 3\hfill \end{array}\right|\phantom{\rule{1em}{0ex}}(1/8)$$
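The bracketed equilibrium values can be checked exactly by solving ${(A\mathbf{p})}_{1}={(A\mathbf{p})}_{2}$ for each matrix; a short sketch in exact arithmetic:

```python
from fractions import Fraction

def internal_equilibrium(A):
    """Proportion playing strategy 1 at the internal equilibrium of a 2x2
    matrix game: solve (Ap)_1 = (Ap)_2 for p."""
    (a11, a12), (a21, a22) = A
    return Fraction(a22 - a12, (a11 - a21) + (a22 - a12))

# The eight first stage matrices above, in reading order.
matrices = [[[2, 4], [5, 2]], [[2, 4], [10, 2]],
            [[2, 4], [5, 3]], [[2, 4], [10, 3]],
            [[3, 4], [5, 2]], [[3, 4], [10, 2]],
            [[3, 4], [5, 3]], [[3, 4], [10, 3]]]
print([internal_equilibrium(A) for A in matrices])
```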

## 3 Discussion

Matrix games have been used to model a variety of animal behaviours. They are structurally simple and relatively straightforward, and thus amenable to analysis, but at the same time provide plausible, if simplistic, explanations for certain behaviours. However, the assumption of constant payoffs, independent of time, is not always realistic. Payoffs change throughout the breeding season, and may also vary markedly from year to year. These variations can be critical to any analysis.

In this paper we have introduced the variable payoff matrix $A(t)$ to consider how different time-dependent payoffs may affect strategies. Naturally the evolution of strategies can be more complex than under constant payoffs. In particular just taking a simple time average of the rewards without recourse to the particular time that they are available can lead not just to the wrong result, but completely the opposite result to that obtained by considering the behaviour at each time separately. Similarly if regular behaviour, with a convergent time average, occurs, this does not guarantee that the underlying payoffs have a convergent time-average (or even that the relative sizes of payoffs against the same pure strategy do), so that simple observed behaviour may be concealing very complex variations in the underlying payoffs.

On the other hand, it is demonstrated that provided the variability of the payoffs is severely restricted, then the existence of a convergent time average of the population state occurs as long as there is a level of persistence of the strategies involved in the time average (they do not tend to 0 exponentially). If the actual state converges with time, then the underlying payoffs must converge in a given manner. Thus we extend our knowledge of how certain populations must behave, as well as demonstrating the complexity of the situation by showing that certain plausible statements are false.

The idea of variable payoffs is finally applied to a particular example, that of a two-stage game, where the basic payoffs are actually constant, but the rewards for the first stage (and thus the whole game) depend upon the strategies used, and so are variable. It is shown that if the population states converge in all of the second stage games then the long term time average of the state of the whole population converges provided that the first stage strategies do not tend to 0 at exponential rate (indeed convergence of the state occurs in a wide variety of cases). A numerical example with two strategies at each stage (and so four different plays in total) is given where each subgame has two ESSs (or ESS pairs), and the population converges to one of eight possible final solutions, depending upon initial conditions. This number of possible solutions is larger than for any matrix game with four strategies, the extra complication resulting from each animal being able to use information about the opponent's strategy to choose its strategy.

## Appendix A

#### Proof of Theorem 2

The replicator dynamic is

$$\frac{\mathrm{d}{v}_{i}}{\mathrm{d}t}={v}_{i}({(Av)}_{i}-{v}^{\mathrm{T}}Av)$$

Since **v** converges to ${\mathbf{v}}^{\ast}$ from an initial state with no zero elements, for $i,j\in S({\mathbf{v}}^{\ast})$ the difference in growth rates ${(Av)}_{i}-{(Av)}_{j}$ must tend to 0 (otherwise the ratio ${v}_{i}/{v}_{j}$ could not converge to a finite nonzero limit), so that

$$\sum _{k}({a}_{ik}-{a}_{jk}){v}_{k}\to 0\phantom{\rule{1em}{0ex}}\Rightarrow \phantom{\rule{1em}{0ex}}\sum _{k}({a}_{ik}-{a}_{jk}){v}_{k}^{\ast}\to 0$$

Similarly, for $k\notin S({\mathbf{v}}^{\ast})$, the growth rate of ${v}_{k}$ satisfies

$${(Av)}_{k}-{v}^{\mathrm{T}}Av\to {(Av)}_{k}-{(Av)}_{j}\to {(A{v}^{\ast})}_{k}-{(A{v}^{\ast})}_{j},\phantom{\rule{1em}{0ex}}j\in S({v}^{\ast})$$

which must be nonpositive in the limit for ${v}_{k}$ to remain near 0.

Thus the matrix A becomes arbitrarily close to a matrix which has ${\mathbf{v}}^{\ast}$ as a Nash equilibrium (we represent the collection of all such matrices as $A({v}^{\ast})$). There are different such matrices which satisfy this, so that which of these A approaches may change with time. The $A({v}^{\ast})$ are split into different families so that if ${\mathbf{v}}^{\ast}$ is a Nash equilibrium of ${A}^{\ast}$ it is also a Nash equilibrium of $\mu {A}^{\ast}+L$ (where $L\in \Lambda $ and μ is a nonnegative constant). These families only meet at the set Λ, so that given $\mathrm{max}|{a}_{ij}(t)-{a}_{kj}(t)|>{a}_{4}$ there is a given minimum distance between members of any two families. Since $\mathrm{d}A/\mathrm{d}t$ is bounded, then A cannot move between families without moving the population state **v** away from ${\mathbf{v}}^{\ast}$. Thus A must converge to one such family, i.e.,

$$A\to g(t){A}^{\ast}+\Lambda (t)\phantom{\rule{1em}{0ex}}\square $$

#### Proof of Theorem 3

Dividing the replicator dynamic by ${v}_{i}$, integrating, and using condition (b) for the final implication:

$$\frac{\mathrm{d}{v}_{i}}{\mathrm{d}t}={v}_{i}({(Av)}_{i}-{v}^{\mathrm{T}}Av)\Rightarrow \mathrm{log}({v}_{i}(T))-\mathrm{log}({v}_{i}(0))=\underset{0}{\overset{T}{\int}}{(Av)}_{i}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t-\underset{0}{\overset{T}{\int}}{v}^{\mathrm{T}}Av\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\Rightarrow \mathrm{log}({v}_{i}(T))-\mathrm{log}({v}_{i}(0))-\mathrm{log}({v}_{j}(T))+\mathrm{log}({v}_{j}(0))=\underset{0}{\overset{T}{\int}}({(Av)}_{i}-{(Av)}_{j})\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\Rightarrow \frac{1}{T}\underset{0}{\overset{T}{\int}}({(Av)}_{i}-{(Av)}_{j})\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0$$

Expanding the integrand, with ${a}_{ik}={\alpha}_{ik}+{f}_{ik}$:

$$\frac{1}{T}\underset{0}{\overset{T}{\int}}({(Av)}_{i}-{(Av)}_{j})\phantom{\rule{0.2em}{0ex}}\mathrm{d}t=\frac{1}{T}\underset{0}{\overset{T}{\int}}(\sum _{k}{a}_{ik}{v}_{k}-\sum _{k}{a}_{jk}{v}_{k})\phantom{\rule{0.2em}{0ex}}\mathrm{d}t=\frac{1}{T}\underset{0}{\overset{T}{\int}}\sum _{k}({a}_{ik}-{a}_{jk}){v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t=\frac{1}{T}\underset{0}{\overset{T}{\int}}\sum _{k}({\alpha}_{ik}-{\alpha}_{jk}){v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t+\frac{1}{T}\underset{0}{\overset{T}{\int}}\sum _{k}({f}_{ik}-{f}_{jk}){v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t=\sum _{k}({\alpha}_{ik}-{\alpha}_{jk})\frac{1}{T}\underset{0}{\overset{T}{\int}}{v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t+\frac{1}{T}\underset{0}{\overset{T}{\int}}\sum _{k}({f}_{ik}-{f}_{jk}){v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t$$

The second term tends to 0, so the first term must also tend to 0:

$$\sum _{k}({\alpha}_{ik}-{\alpha}_{jk})\frac{1}{T}\underset{0}{\overset{T}{\int}}{v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0$$

To see that the second term vanishes, i.e. that

$$\frac{1}{T}\underset{0}{\overset{T}{\int}}\sum _{k}({f}_{ik}-{f}_{jk}){v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0$$

note that

$$\left|\frac{1}{T}\underset{0}{\overset{T}{\int}}\sum _{k}({f}_{ik}-{f}_{jk}){v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\right|\leqslant \sum _{k}\frac{1}{T}\underset{0}{\overset{T}{\int}}|{f}_{ik}-{f}_{jk}|{v}_{k}\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\leqslant \sum _{k}\frac{1}{T}\underset{0}{\overset{T}{\int}}|{f}_{ik}-{f}_{jk}|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t$$

where, by condition (a),

$$\frac{1}{T}\underset{0}{\overset{T}{\int}}|{f}_{ik}-{f}_{jk}|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0$$

Thus the time averages of the ${v}_{k}$ satisfy, in the limit, the equilibrium conditions for the constant matrix $({\alpha}_{ij})$, and the time average of **v** converges as in the constant payoff case. □

#### Proof of Proposition 1

Consider the payoff matrix

$$\left|\begin{array}{cc}\hfill 1\hfill & \hfill 2\hfill \\ \hfill 2\hfill & \hfill r(t)\hfill \end{array}\right|$$

where $r(t)$ alternates, slowly enough that the population closely tracks the current internal equilibrium, between the value $2-\delta$ (for a proportion λ of the time) and $2-k$ (for the remaining proportion $1-\lambda$), with $0<\delta <k$. The mean of $r(t)$ is $2-\lambda \delta +\lambda k-k$. The time average of the matrix thus gives the equilibrium value of the proportion playing strategy 1 as

$${v}_{1}=\frac{k-k\lambda +\lambda \delta}{k+1-\lambda k+\lambda \delta}$$

whereas the internal equilibrium has ${v}_{1}=\delta /(1+\delta )$ when $r=2-\delta$ and ${v}_{1}=k/(1+k)$ when $r=2-k$, so that the true time average of the proportion playing strategy 1 is

$${\overline{v}}_{1}=\lambda \frac{\delta}{1+\delta}+(1-\lambda )\frac{k}{1+k}=\frac{k-k\lambda +k\delta +\lambda \delta}{(k+1)(1+\delta )}$$

Choosing $\lambda =1-1/\sqrt{k}$ then gives

$${v}_{1}=\frac{\sqrt{k}+\delta (1-1/\sqrt{k})}{1+\sqrt{k}+\delta (1-1/\sqrt{k})}\to \frac{\sqrt{k}}{1+\sqrt{k}}\phantom{\rule{0.5em}{0ex}}(\delta \to 0)\to 1\phantom{\rule{0.5em}{0ex}}(k\to \infty )$$

$${\overline{v}}_{1}=\frac{\sqrt{k}+\delta (k+1-1/\sqrt{k})}{(k+1)(\delta +1)}\to \frac{\sqrt{k}}{1+k}\phantom{\rule{0.5em}{0ex}}(\delta \to 0)\to 0\phantom{\rule{0.5em}{0ex}}(k\to \infty )$$

so that the distance between ${\mathbf{v}}^{\ast}$ and $\mathit{TA}(\mathbf{v})$ can be made arbitrarily close to 1. □

#### Proof of Proposition 2

Suppose that the payoff matrix alternates between

$$\left|\begin{array}{ccc}\hfill 2\hfill & \hfill 4\hfill & \hfill 1\hfill \\ \hfill 1\hfill & \hfill 2\hfill & \hfill 4\hfill \\ \hfill 4\hfill & \hfill 1\hfill & \hfill 2\hfill \end{array}\right|$$

and

$$\left|\begin{array}{ccc}\hfill 2\hfill & \hfill 1\hfill & \hfill 4\hfill \\ \hfill 4\hfill & \hfill 2\hfill & \hfill 1\hfill \\ \hfill 1\hfill & \hfill 4\hfill & \hfill 2\hfill \end{array}\right|$$

where the intervals between switches increase quickly enough that the time average of the payoff matrix (and of any rescaling ${A}^{\ast}=(A-\Lambda (t))/g(t)$) does not converge. Both matrices are cyclic, with unique internal Nash equilibrium ${\mathbf{v}}^{\ast}={(1/3,1/3,1/3)}^{\mathrm{T}}$, which is an internal ESS of each, so that the time average of **v** and **v** itself both converge to ${\mathbf{v}}^{\ast}$ (this is clearly still true if we add any element of Λ or multiply by any function $g(t)$ as defined in the proposition). Note that rounding the step function makes no difference to the convergence of $\mathit{TA}(\mathbf{v})$. □

#### Proof of Theorem 4

There are two cases to consider: $i\ne j$ and $i=j$.

(a) $i\ne j$. The only possible solution here is a pure pair, ${k}_{1},{k}_{2}$ say. Supposing that the payoff matrices to the two players are B and C, respectively, then for this pair to be an ESS we require

$${b}_{k1k2}>{b}_{ik2},\phantom{\rule{1em}{0ex}}i\ne k1,\phantom{\rule{2em}{0ex}}{c}_{k2k1}>{c}_{ik1},\phantom{\rule{1em}{0ex}}i\ne k2$$

Now suppose that ${x}_{k1}>1-{\varepsilon}_{1}$ and ${y}_{k2}>1-{\varepsilon}_{2}$ at time ${T}_{1}$ (this must be true for some ${T}_{1}$ since convergence occurs), where ${\varepsilon}_{1}$ and ${\varepsilon}_{2}$ are sufficiently small; below, $M$ denotes an upper bound for the payoffs and $b>0$ a suitable lower bound derived from the strict inequalities above. Then

$$\frac{\mathrm{d}{x}_{k1}}{\mathrm{d}t}={x}_{k1}({(B\mathbf{y})}_{k1}-{\mathbf{x}}^{\mathrm{T}}B\mathbf{y})={x}_{k1}(\sum {b}_{k1j}{y}_{j}-\sum {x}_{i}{y}_{j}{b}_{ij})={x}_{k1}({b}_{k1k2}{y}_{k2}+\sum _{j\ne k2}{b}_{k1j}{y}_{j}-{x}_{k1}{y}_{k2}{b}_{k1k2}-{x}_{k1}\sum _{j\ne k2}{y}_{j}{b}_{k1j}-\sum _{i\ne k1}{x}_{i}{y}_{k2}{b}_{ik2}-\sum _{i\ne k1,\phantom{\rule{0.25em}{0ex}}j\ne k2}{x}_{i}{y}_{j}{b}_{ij})={x}_{k1}(1-{x}_{k1})[{y}_{k2}{b}_{k1k2}+\sum _{j\ne k2}{b}_{k1j}{y}_{j}]-{x}_{k1}{y}_{k2}\sum _{i\ne k1}{x}_{i}{b}_{ik2}-{x}_{k1}\sum _{i\ne k1,\phantom{\rule{0.25em}{0ex}}j\ne k2}{x}_{i}{y}_{j}{b}_{ij}\geqslant {x}_{k1}(1-{x}_{k1}){y}_{k2}b-{x}_{k1}M(1-{x}_{k1})(1-{y}_{k2})={x}_{k1}(1-{x}_{k1})[{y}_{k2}b-M(1-{y}_{k2})]>(1-{x}_{k1})b/2$$

If $v=1-{x}_{k1}$ then

$$\frac{\mathrm{d}v}{\mathrm{d}t}<-\frac{b}{2}v\Rightarrow \mathrm{log}v(t)-\mathrm{log}(v({T}_{1}))<-\frac{b}{2}(t-{T}_{1})\Rightarrow v(t)<B{\mathrm{e}}^{-bt/2}$$

Similarly, writing $w(t)=1-{y}_{k2}$, we obtain

$$w(t)<C{\mathrm{e}}^{-ct/2}$$

for constants $C$ and $c>0$.

Thus

$$((1-B{\mathrm{e}}^{-bt/2})(1-C{\mathrm{e}}^{-ct/2}){b}_{k1k2})<{\mathbf{x}}^{\mathrm{T}}B\mathbf{y}<(1-B{\mathrm{e}}^{-bt/2})(1-C{\mathrm{e}}^{-ct/2}){b}_{k1k2}+\{1-(1-B{\mathrm{e}}^{-bt/2})(1-C{\mathrm{e}}^{-ct/2})\}M\Rightarrow ((1-B{\mathrm{e}}^{-bt/2})(1-C{\mathrm{e}}^{-ct/2})-1){b}_{k1k2}<{f}_{ij}(t)={\mathbf{x}}^{\mathrm{T}}B\mathbf{y}-{b}_{k1k2}<\{1-(1-B{\mathrm{e}}^{-bt/2})(1-C{\mathrm{e}}^{-ct/2})\}M\Rightarrow \frac{1}{T}\underset{0}{\overset{T}{\int}}|{f}_{ij}(t)|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t<\frac{1}{T}\underset{0}{\overset{T}{\int}}M|B{\mathrm{e}}^{-bt/2}+C{\mathrm{e}}^{-ct/2}-BC{\mathrm{e}}^{-(b+c)t/2}|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0$$

(b) $i=j$. Suppose that **p** converges to ${\mathbf{p}}^{\ast}$ on the matrix game defined by the payoff matrix B. Suppose further, without loss of generality, that ${\mathbf{p}}^{\ast}$ is an internal ESS. Define P by

$$P=\prod _{i}{p}_{i}^{{p}_{i}^{\ast}}\phantom{\rule{1em}{0ex}}({P}^{\ast}=\prod _{i}{p}_{i}^{\ast {p}_{i}^{\ast}})$$

$$\frac{1}{P}\frac{\mathrm{d}P}{\mathrm{d}t}=\frac{\mathrm{d}}{\mathrm{d}t}\mathrm{log}(P)=\frac{\mathrm{d}}{\mathrm{d}t}(\sum {p}_{i}^{\ast}\mathrm{log}{p}_{i})=\sum \frac{{p}_{i}^{\ast}}{{p}_{i}}\frac{\mathrm{d}{p}_{i}}{\mathrm{d}t}=\sum {p}_{i}^{\ast}({(B\mathbf{p})}_{i}-{\mathbf{p}}^{\mathrm{T}}B\mathbf{p})$$

Writing ${p}_{i}={p}_{i}^{\ast}+{\varepsilon}_{i}$ for small perturbations ${\varepsilon}_{i}$, we have

$$P=\prod {p}_{i}^{\ast {p}_{i}^{\ast}}\prod {(1+\frac{{\varepsilon}_{i}}{{p}_{i}^{\ast}})}^{{p}_{i}^{\ast}}\Rightarrow \mathrm{log}P=\mathrm{log}({P}^{\ast})+\sum {p}_{i}^{\ast}\mathrm{log}(1+{\varepsilon}_{i}/{p}_{i}^{\ast})=\mathrm{log}{P}^{\ast}+\sum {p}_{i}^{\ast}(\frac{{\varepsilon}_{i}}{{p}_{i}^{\ast}}-\frac{{\varepsilon}_{i}^{2}}{2{p}_{i}^{\ast}}+\cdots )\Rightarrow \mathrm{log}P-\mathrm{log}{P}^{\ast}\approx -\sum \frac{{\varepsilon}_{i}^{2}}{2{p}_{i}^{\ast}}\phantom{\rule{1em}{0ex}}(\sum {\varepsilon}_{i}=0)$$

$$\frac{\mathrm{d}}{\mathrm{d}t}\mathrm{log}P=\sum {p}_{i}^{\ast}({(B({\mathbf{p}}^{\ast}+\varepsilon ))}_{i}-{({\mathbf{p}}^{\ast}+\varepsilon )}^{\mathrm{T}}B({\mathbf{p}}^{\ast}+\varepsilon ))={\mathbf{p}}^{\ast \mathrm{T}}B{\mathbf{p}}^{\ast}+{\mathbf{p}}^{\ast \mathrm{T}}B\varepsilon -{\mathbf{p}}^{\ast \mathrm{T}}B{\mathbf{p}}^{\ast}-{\mathbf{p}}^{\ast \mathrm{T}}B\varepsilon -{\varepsilon}^{\mathrm{T}}B{\mathbf{p}}^{\ast}-{\varepsilon}^{\mathrm{T}}B\varepsilon =-{\varepsilon}^{\mathrm{T}}B\varepsilon =-\sum {b}_{ij}{\varepsilon}_{i}{\varepsilon}_{j}$$

since ${\varepsilon}^{\mathrm{T}}B{\mathbf{p}}^{\ast}=\lambda \sum {\varepsilon}_{i}=0$, ${(B{\mathbf{p}}^{\ast})}_{i}$ being constant for the internal equilibrium ${\mathbf{p}}^{\ast}$.

That ${\mathbf{p}}^{\ast}$ is an internal ESS of B implies that it satisfies the negative definiteness condition of [15], so that $-{\varepsilon}^{\mathrm{T}}B\varepsilon \geqslant \lambda \sum {\varepsilon}_{i}^{2}$ for some positive λ for any such ${\varepsilon}_{i}$s.

Thus we have

$$\frac{\mathrm{log}({P}^{\ast})-\mathrm{log}(P)}{\frac{\mathrm{d}}{\mathrm{d}t}\mathrm{log}(P)}\approx \frac{\sum ({\varepsilon}_{i}^{2}/2{p}_{i}^{\ast})}{-{\varepsilon}^{\mathrm{T}}B\varepsilon}<\frac{\sum ({\varepsilon}_{i}^{2})/2{\mathrm{min}}_{i}({p}_{i}^{\ast})}{\lambda \sum {\varepsilon}_{i}^{2}}=\frac{1}{2\lambda \mathrm{min}{p}_{i}^{\ast}}=g$$

$$\mathrm{log}({P}^{\ast})-\mathrm{log}(P)<g(\frac{\mathrm{d}}{\mathrm{d}t}\mathrm{log}(P))\Rightarrow \frac{\mathrm{d}}{\mathrm{d}t}(\mathrm{log}(P){\mathrm{e}}^{t/g})>{\mathrm{e}}^{t/g}\frac{\mathrm{log}({P}^{\ast})}{g}\Rightarrow \mathrm{log}(P)\geqslant \mathrm{log}({P}^{\ast})-k{\mathrm{e}}^{-t/g}$$

$$-\sum ({\varepsilon}_{i}^{2}/2{p}_{i}^{\ast})\geqslant -k{\mathrm{e}}^{-t/g}\phantom{\rule{1em}{0ex}}\Rightarrow \phantom{\rule{1em}{0ex}}\sum {\varepsilon}_{i}^{2}\leqslant 2k{\mathrm{e}}^{-t/g}$$

$${a}_{ij}(t)={\mathbf{p}}^{\mathrm{T}}B\mathbf{p}={\mathbf{p}}^{\ast \mathrm{T}}B{\mathbf{p}}^{\ast}+\sum {\varepsilon}_{i}{b}_{ij}{p}_{j}+\sum {p}_{i}^{\ast}{b}_{ij}{\varepsilon}_{j}$$

$${f}_{ij}(t)=\sum {\varepsilon}_{i}{b}_{ij}{p}_{j}+\sum {p}_{i}^{\ast}{b}_{ij}{\varepsilon}_{j}\Rightarrow |{f}_{ij}(t)|\leqslant 2M\sum |{\varepsilon}_{i}|\leqslant nM\sqrt{8k}{\mathrm{e}}^{-t/2g}$$

so that

$$\underset{{T}_{1}}{\overset{\infty}{\int}}|{f}_{ij}(t)|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t$$

is finite, and hence

$$\frac{1}{T}\underset{0}{\overset{T}{\int}}|{f}_{ij}(t)|\phantom{\rule{0.2em}{0ex}}\mathrm{d}t\to 0\phantom{\rule{1em}{0ex}}\square $$