1 Introduction
We start by explaining some of the basic concepts of evolutionary game theory, introduced in the classic paper [1] (see also [2]), which will be of use in the rest of the paper. In particular we discuss matrix games between symmetric players and their equivalent between asymmetric players, bimatrix games. We also use the concept of the replicator dynamic to consider the evolution of strategies as a function of time.
1.1 Matrix games
The following idea is useful for modelling a population of animals which compete in pairwise conflicts for some resource, which could be food or mates, for example. It is assumed that all members of the population are indistinguishable and each individual is equally likely to face each other individual. There is a finite number of pure strategies available to the players to play in a particular game. Let U be the set of pure strategies, so that $U = \{1, 2, \ldots, n\}$. Given the strategies played, the outcome is determined; if player 1 plays i against player 2 playing j, then player 1 receives reward $a_{ij}$ (player 2 receives $a_{ji}$), representing an adjustment in Darwinian fitness. The value $a_{ij}$ can be thought of as the $(i, j)$ element of the matrix A, the payoff matrix.
An animal need not play the same pure strategy every time; it can play a mixed strategy, i.e., play i with probability $p_i$ for each $i \in U$. Thus the strategy played by an animal is represented by a probability vector p. The expected payoff to player 1 playing p against player 2 playing q, which is written as $E[p, q]$, is
$$E[p, q] = p^{\mathrm{T}} A q = \sum_{i \in U} \sum_{j \in U} p_i a_{ij} q_j$$
The support of p is defined as $S(p) = \{ i \in U : p_i > 0 \}$.
p is an internal strategy if $S(p) = U$.
p is a Nash equilibrium if $E[p, p] \geq E[q, p]$ for all strategies q; writing $e_i$ for the ith pure strategy, p is thus a Nash equilibrium if $E[e_i, p] = \lambda$, $i \in S(p)$, and $E[e_i, p] \leq \lambda$, $i \notin S(p)$, for some constant λ, and is thus an internal Nash equilibrium if $E[e_i, p] = \lambda$ ∀i.
A strategy p is an Evolutionarily Stable Strategy (ESS) if
- (i) $E[p, p] \geq E[q, p]$ for all strategies q, and
- (ii) if $E[q, p] = E[p, p]$ for some $q \neq p$, then $E[p, q] > E[q, q]$
A matrix may possess a unique ESS, no ESSs or many ESSs. See [3,4] for a discussion of the possible complexity of the ESS structure of a matrix.
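To make these definitions concrete, the following minimal sketch (ours, not from the paper) checks the pure ESS conditions and computes the internal ESS for an illustrative 2×2 matrix, a Hawk-Dove game with V = 2 and C = 4:

```python
import numpy as np

# Hawk-Dove payoff matrix with V = 2, C = 4 (illustrative values only):
# A[i, j] is the payoff a_ij to a player using strategy i against strategy j.
A = np.array([[-1.0, 2.0],    # Hawk vs Hawk, Hawk vs Dove
              [ 0.0, 1.0]])   # Dove vs Hawk, Dove vs Dove

def is_pure_ess(A, i):
    """Check conditions (i) and (ii) above for pure strategy i against
    pure invaders j (sufficient for a 2x2 matrix game)."""
    for j in range(A.shape[0]):
        if j == i:
            continue
        if A[j, i] > A[i, i]:                          # condition (i) fails
            return False
        if A[j, i] == A[i, i] and A[i, j] <= A[j, j]:  # condition (ii) fails
            return False
    return True

print([is_pure_ess(A, i) for i in range(2)])  # [False, False]: no pure ESS

# With a11 < a21 and a22 < a12 the unique ESS is internal, playing strategy 1
# with probability (a22 - a12)/((a11 - a21) + (a22 - a12)); here V/C = 0.5.
p1 = (A[1, 1] - A[0, 1]) / ((A[0, 0] - A[1, 0]) + (A[1, 1] - A[0, 1]))
print(p1)
```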
1.2 Bimatrix games
The assumptions underlying the bimatrix game model are the same as for the matrix game model, except that pairwise contests are fought between individuals in asymmetric positions, so that the individual designated player 1 has a different set of pure strategies, $U_1$, to the set $U_2$ available to player 2 ($U_1 \neq U_2$). If player 1 plays its strategy i against player 2 playing its strategy j, then player 1 receives reward $a_{ij}$ and player 2 receives reward $b_{ji}$. The payoffs combine to form the payoff matrices A and B. In the same way individuals can play mixed strategies, so that if player 1 plays p and player 2 plays q, the rewards to the players are $p^{\mathrm{T}} A q$ and $q^{\mathrm{T}} B p$, respectively.
The strategy pair $(p^*, q^*)$ is a Nash equilibrium pair if
$$p^{\mathrm{T}} A q^* \leq p^{*\mathrm{T}} A q^* \quad \forall p, \qquad q^{\mathrm{T}} B p^* \leq q^{*\mathrm{T}} B p^* \quad \forall q$$
The strategy pair $(p^*, q^*)$ is an ESS if it is a Nash equilibrium pair for which the above inequalities are strict whenever $p \neq p^*$ and $q \neq q^*$; for bimatrix games an ESS pair is precisely a strict Nash equilibrium pair.
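For pure strategy pairs, the ESS condition above reduces to a strict best-reply check in each position. The sketch below (ours, with hypothetical payoff values) applies this check to every pure pair of an illustrative bimatrix game:

```python
import numpy as np

def is_strict_nash_pair(A, B, i, j):
    """Check whether the pure pair (i, j) is a strict Nash equilibrium pair,
    and hence an ESS pair, of the bimatrix game with matrices A (player 1)
    and B (player 2); B[k, l] is player 2's payoff for playing k against
    player 1 playing l."""
    p1_best = all(A[i, j] > A[k, j] for k in range(A.shape[0]) if k != i)
    p2_best = all(B[j, i] > B[l, i] for l in range(B.shape[0]) if l != j)
    return p1_best and p2_best

# Hypothetical payoffs, for illustration only.
A = np.array([[1.0, 3.0], [0.0, 2.0]])
B = np.array([[2.0, 0.0], [3.0, 1.0]])
print([(i, j) for i in range(2) for j in range(2)
       if is_strict_nash_pair(A, B, i, j)])   # [(0, 1)]
```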
1.3 Replicator dynamics
Let us assume that individuals can only play pure strategies, and let the proportion of players of pure strategy i at a particular time be $v_i$, so that the average population strategy (the population state) is the vector $v = (v_1, \ldots, v_n)$, with the expected payoff (in terms of Darwinian fitness) of an i-player in such a mixture being $(Av)_i = \sum_j a_{ij} v_j$ and the overall expected payoff in the population being $v^{\mathrm{T}} A v$. Then the standard (continuous) replicator dynamic for the matrix game with payoff matrix A is defined by the differential equation
$$\frac{\mathrm{d}v_i}{\mathrm{d}t} = v_i \left( (Av)_i - v^{\mathrm{T}} A v \right), \quad i \in U$$
Assuming evolution under the replicator dynamic (or any other dynamic), the time average of the population state v (where $v = v(t)$) at time T is the vector $\bar{v}(T)$, whose ith element is defined by
$$\bar{v}_i(T) = \frac{1}{T} \int_0^T v_i(t) \, \mathrm{d}t$$
Note that:
- (1) Adding a constant to any column of the payoff matrix makes no difference to the Nash equilibria or to the trajectory of the path taken by the population under the replicator dynamic (including time factors), so it makes no difference to the game at all.
- (2) If the entire matrix is multiplied by a positive constant, then the Nash equilibria are unchanged and the trajectory of the replicator dynamic is unaltered, but the speed along the trajectory may be changed, and thus the time average of the population state may also be affected.
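These dynamics are straightforward to explore numerically. The following is a minimal sketch (ours, with an illustrative rock-paper-scissors-like matrix, not an example from the paper) of Euler integration of the replicator dynamic together with the running time average:

```python
import numpy as np

# Euler integration of dv_i/dt = v_i((Av)_i - v.Av), accumulating the
# time average of the state along the way.
A = np.array([[ 0.0,  2.0, -1.0],
              [-1.0,  0.0,  2.0],
              [ 2.0, -1.0,  0.0]])   # rock-paper-scissors-like payoffs

def replicator(A, v0, dt=1e-3, T=200.0):
    v = np.array(v0, dtype=float)
    avg = np.zeros_like(v)
    for _ in range(int(T / dt)):
        Av = A @ v
        v = v + dt * v * (Av - v @ Av)
        v = np.clip(v, 0.0, None)
        v /= v.sum()                  # guard against drift off the simplex
        avg += v * dt
    return v, avg / T

v_final, v_bar = replicator(A, [0.5, 0.3, 0.2])
print(v_final, v_bar)  # both approach the internal equilibrium (1/3, 1/3, 1/3)
```

Rerunning with `2 * A` traces the same trajectory at a different speed, in line with note (2).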
1.4 Nonconstant payoffs
The majority of game models deal with a fixed payoff structure, so that if the strategies adopted by all the combatants are known, then the payoffs are given. This corresponds to the payoff matrix of constants, A. In the real world, however, the time at which contests occur can be crucial. As the breeding season develops there is a natural variation in the rewards available for any given contest [9,10]. Moreover, owing to environmental changes, variations may occur from year to year, and even from day to day, due to unpredictable effects such as the weather [11].
We consider a matrix game where the payoffs vary with time. In particular, if player 1 plays i against player 2 playing j at time t, then player 1 receives payoff $a_{ij}(t)$. Clearly many of the concepts from static games must now be reconsidered; for instance, p may be a Nash equilibrium of a particular constant payoff matrix (a snapshot in time), but what happens in the long term as the matrix changes? We consider how the population state changes under the replicator dynamic and ask what rules we can establish.
In general we assume that the entries in the payoff matrix are bounded, so that $|a_{ij}(t)| < C$ for all $i, j \in U$ and all t. It is also usually assumed that A is continuous and that indeed it cannot change too quickly, so that
$$\left| \frac{\mathrm{d}}{\mathrm{d}t} a_{ij}(t) \right| < K \quad \text{for all } i, j \in U \text{ and all } t$$
Finally, letting Λ denote the set of degenerate payoff matrices in which every column is constant (so that every strategy performs equally well and every population state is an equilibrium), we assume that for some $\epsilon > 0$, $d(A(t), \Lambda) > \epsilon$ for all t, where d is the distance given by the maximum difference over the matrix entries. This ensures that the game does not become completely degenerate, by bounding it away from the set Λ.
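As a concrete illustration of these assumptions (ours, not an example from the paper), the family below is bounded, has a bounded derivative, and stays a fixed distance from Λ, since the entries within the first column always differ by at least 1:

```latex
% A 2x2 time-varying payoff matrix satisfying all three assumptions:
% |a_ij(t)| <= 3 (boundedness), |d a_ij/dt| <= 1 (bounded rate of change),
% and the first column keeps A(t) at distance at least 1/2 from Lambda.
A(t) = \begin{pmatrix} 2 + \sin t & 0 \\ 0 & 1 + \cos t \end{pmatrix}
```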
2 Results
Firstly we introduce some general results, and their consequences, before moving on to consider a particular application of the theory.
2.1 General results
In this section the variable payoff matrix with elements $a_{ij}(t)$ is simply written as A, and the population state is the vector v.
Theorem 1

A converges does not imply that $\bar{v}(T)$ converges.

This result follows simply from the fact that even for constant A, not only does v not always converge, but neither does $\bar{v}(T)$ (see [8]). Complex population dynamics can thus occur even with constant payoffs. What can we say about the case where payoffs vary?
Theorem 2

If $v(t) \to v^*$, where $v^*$ has no zero elements, then $d(A(t), f(t)\aleph + \Lambda) \to 0$ for some continuous function $f(t)$ and $\aleph$, a matrix of constants, such that $v^*$ is a Nash equilibrium of $\aleph$.

So if the prevalent population state converges, then it must be the case that the payoffs converge either to a constant payoff matrix, or to one which varies in time in such a way that the ratio of the differences between the payoffs to different strategies, played against the same pure strategy, remains constant.
Theorem 3

If

- (a) there is a matrix of constants $\aleph$ such that $\lim_{T\to\infty} \frac{1}{T}\int_0^T |a_{ij}(t) - \aleph_{ij}| \, \mathrm{d}t = 0$ for all $i, j \in U$, and
- (b) $\lim_{T\to\infty} \frac{\ln v_i(T)}{T} = 0$ for all i, so that no strategy is eliminated at an exponential rate,

then $\bar{v}(T)$ converges to $v^*$, which is a Nash equilibrium of $\aleph$.
If the payoffs vary with time, but only to a limited extent (in particular, the environmental impact is such that there is no regular displacement from the mean, except possibly of vanishingly small size), and this variation is not strong enough to eliminate strategies at a sufficiently fast (exponential) rate, then convergence of the time average of the population state occurs. We do not have a stronger result even for a general constant payoff matrix; thus the assumption of constant payoffs can be relaxed a little without affecting the dominant long-term behaviour.
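A minimal numerical sketch of this situation (ours, with assumed values; neither `aleph` nor the perturbation is taken from the paper): the payoffs oscillate around a constant matrix $\aleph$ with decaying amplitude, so condition (a) holds, and the time average of the state approaches the internal Nash equilibrium of $\aleph$, here (1/2, 1/2):

```python
import numpy as np

# Payoffs are aleph plus a perturbation whose time average vanishes,
# so the time average of v approaches the internal equilibrium of aleph.
aleph = np.array([[-1.0, 2.0],
                  [ 0.0, 1.0]])      # internal Nash equilibrium (1/2, 1/2)

def A(t):
    E = np.array([[1.0, 0.0], [0.0, 0.0]])
    return aleph + 2.0 * np.sin(t) / (1.0 + 0.1 * t) * E  # decaying oscillation

dt, T = 1e-3, 500.0
v = np.array([0.9, 0.1])
avg = np.zeros(2)
for k in range(int(T / dt)):
    Av = A(k * dt) @ v
    v = v + dt * v * (Av - v @ Av)
    v /= v.sum()
    avg += v * dt
print(avg / T)   # close to [0.5, 0.5]
```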
Proposition 1

If $\lim_{T\to\infty} \frac{1}{T}\int_0^T A(t)\,\mathrm{d}t = \aleph$ and if p is an internal Nash equilibrium of $\aleph$, then the distance of $\bar{v}(T)$ from p, the length of the vector $\bar{v}(T) - p$, can converge to a value which is arbitrarily close to 1.
This result demonstrates that considering the long-term mean payoffs is insufficient to find the long-term average state; indeed, it is possible to find situations where considering the long-term mean payoffs would give the worst possible estimate of the mean state.
Proposition 2

$\bar{v}(T)$ converges does not imply that we can find a bounded function $f(t)$ and matrix of constants $\aleph$, defined as above, such that $\bar{E}(T)$ converges, where $E(t) = A(t) - f(t)\aleph$.
(Note that we need this more elaborate statement, rather than just nonconvergence of $\bar{A}(T)$, as that can be given by the trivial case of $A(t) = f(t)K$ for a constant matrix K and a suitable $f(t)$.)
Even if the mean population state converges, this does not imply convergence of the underlying payoffs (or of the relative sizes of payoffs against the same pure strategy), and even time-average convergence of the payoffs is not guaranteed. This is especially surprising given that convergence of the time average of v is not guaranteed even for constant payoffs. Thus observation of this behaviour, without convergence of v, could be the result of a wide range of underlying phenomena.
2.2 The two stage game
One application of the idea of variable payoffs is in the context of multiple stage games, even when the real natural payoffs are constant. Suppose that a population of animals compete in pairwise contests, where an animal's strategy is decided by a pair of 'choices'. The first choices that the animals make decide the particular type of contest that the two are involved in, after which the individuals both pick a second choice, which decides the payoffs that each receives (for other examples of such contests see [12,13]). The two players make their choices at each stage simultaneously, and both know the result of the first stage before playing the second. For instance, the first 'choice' could be horn size, which is apparent to both players before any contest ensues. The second stage is thus a subgame within this contest, the available choices of which may be conditional upon the horn sizes of the participants (thus there is a payoff matrix for each pair of horn sizes).
Note that when the first stage choices are different, this induces an asymmetry into the contest. We assume that if player 1 picks i and player 2 picks j, then the reward to player 1 is decided by the payoff matrix $A_{ij}$. If $i = j$ then we have the standard matrix game with payoff matrix $A_{ii}$; otherwise we have a bimatrix game with payoff matrices $A_{ij}$ and $A_{ji}$, respectively.
The payoff $a_{ij}(t)$ is thus the expected reward for first stage strategy i against first stage strategy j, which depends in turn upon the strategies prevalent in such contests, i.e., $a_{ij}(t) = p_{ij}^{\mathrm{T}} A_{ij} p_{ji}$, where $p_{ij}$ is the population state of i-players when facing j-players for the second stage of the game (and so $p_{ji}$ is the population state of their potential opponents in this contest). When considering the strategy of i against that of j, $i \neq j$, $p_{ij}$ takes the place of p and $p_{ji}$ takes the place of q in the bimatrix games described earlier (for individuals of the same first stage strategy, $p_{ii}$ takes the place of p in a matrix game). It is assumed that the strategies involved in this subgame evolve in the same manner as the first stage choices, and so $p_{ij}$ varies according to the replicator equation
$$\frac{\mathrm{d}(p_{ij})_k}{\mathrm{d}t} = (p_{ij})_k \left( (A_{ij} p_{ji})_k - p_{ij}^{\mathrm{T}} A_{ij} p_{ji} \right)$$
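The coupled system is easy to simulate. The sketch below (ours; the 2×2 matrices in `Asub` are hypothetical placeholders, not the values of Section 2.2.1) evolves the first stage state v and the four subgame states $p_{ij}$ together, recomputing the induced first stage payoffs $a_{ij}(t)$ at every step:

```python
import numpy as np

# Two-stage replicator dynamics: the first stage state v and the subgame
# states p[(i, j)] evolve together; the time-varying first stage payoffs
# are a_ij(t) = p_ij . A_ij p_ji.
Asub = {(0, 0): np.array([[3.0, 1.0], [2.0, 2.5]]),
        (0, 1): np.array([[2.0, 0.0], [3.0, 1.0]]),
        (1, 0): np.array([[1.0, 3.0], [0.0, 2.0]]),
        (1, 1): np.array([[2.0, 1.0], [1.5, 3.0]])}

v = np.array([0.6, 0.4])                        # first stage proportions
p = {ij: np.array([0.5, 0.5]) for ij in Asub}   # second stage states
dt = 1e-3
for _ in range(int(200.0 / dt)):
    # first stage payoff matrix induced by the current subgame states
    A1 = np.array([[p[(i, j)] @ Asub[(i, j)] @ p[(j, i)] for j in range(2)]
                   for i in range(2)])
    Av = A1 @ v
    v = v + dt * v * (Av - v @ Av)
    v /= v.sum()
    new_p = {}
    for (i, j) in Asub:                         # subgame replicator steps
        w = Asub[(i, j)] @ p[(j, i)]
        s = p[(i, j)] + dt * p[(i, j)] * (w - p[(i, j)] @ w)
        new_p[(i, j)] = s / s.sum()
    p = new_p
print(v, {ij: np.round(s, 3) for ij, s in p.items()})
```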
Theorem 4

If $p_{ij} \to p^*_{ij}$ for all i, j (where $p^*_{ii}$ is an ESS of $A_{ii}$, and $(p^*_{ij}, p^*_{ji})$ is an ESS pair of $(A_{ij}, A_{ji})$ for $i \neq j$), then A satisfies condition (a) for Theorem 3 with $\aleph_{ij} = p^{*\mathrm{T}}_{ij} A_{ij} p^*_{ji}$.
Corollary

Suppose that the conditions of Theorem 4 hold, and that in addition no first stage strategy is eliminated at an exponential rate, so that condition (b) of Theorem 3 also holds. Then $v^* = \lim_{T\to\infty} \bar{v}(T)$ exists, and $v^*$ is a Nash equilibrium of $\aleph$, where $\aleph_{ij} = p^{*\mathrm{T}}_{ij} A_{ij} p^*_{ji}$.
Suppose that in a two stage game there are two strategies in the first stage, and then two strategies in each of the four possible second stage situations, defined by the payoff matrices $A_{11}$, $A_{12}$, $A_{21}$ and $A_{22}$.
Table 1. ESSs and payoffs for the symmetric subgames $A_{ii}$, writing $a_{kl}$ for $(A_{ii})_{kl}$

| Condition | $a_{11}>a_{21}$, $a_{22}<a_{12}$ | $a_{11}<a_{21}$, $a_{22}>a_{12}$ | $a_{11}>a_{21}$, $a_{22}>a_{12}$ | $a_{11}<a_{21}$, $a_{22}<a_{12}$ |
|---|---|---|---|---|
| ESS(s) | (1) | (2) | (1), (2) | p |
| payoff | $a_{11}$ | $a_{22}$ | $a_{11}$ or $a_{22}$ | ∗ |

∗ The internal ESS is $p = \left( \frac{a_{22}-a_{12}}{(a_{11}-a_{21})+(a_{22}-a_{12})}, \frac{a_{11}-a_{21}}{(a_{11}-a_{21})+(a_{22}-a_{12})} \right)$, with payoff $\frac{a_{11}a_{22}-a_{12}a_{21}}{(a_{11}-a_{21})+(a_{22}-a_{12})}$.
Table 2. ESSs for the bimatrix games $(A_{12}, A_{21})$, writing $a_{kl}$ for $(A_{12})_{kl}$ and $b_{kl}$ for $(A_{21})_{kl}$; rows give the conditions on $A_{12}$ (player 1) and columns the conditions on $A_{21}$ (player 2). An entry (p, q) denotes a unique internal Nash equilibrium pair, which is not an ESS.

| Conditions | $b_{11}>b_{21}$, $b_{12}>b_{22}$ | $b_{11}>b_{21}$, $b_{12}<b_{22}$ | $b_{11}<b_{21}$, $b_{12}>b_{22}$ | $b_{11}<b_{21}$, $b_{12}<b_{22}$ |
|---|---|---|---|---|
| $a_{11}>a_{21}$, $a_{12}>a_{22}$ | (1,1) | (1,1) | (1,2) | (1,2) |
| $a_{11}>a_{21}$, $a_{12}<a_{22}$ | (1,1) | (1,1) or (2,2) | (p,q) | (2,2) |
| $a_{11}<a_{21}$, $a_{12}>a_{22}$ | (2,1) | (p,q) | (1,2) or (2,1) | (1,2) |
| $a_{11}<a_{21}$, $a_{12}<a_{22}$ | (2,1) | (2,2) | (2,1) | (2,2) |
When one player plays 1 and the other plays 2, we have the bimatrix game defined by $(A_{12}, A_{21})$. Table 2 gives the strategies in the ESSs.
When there are no pure ESSs and a single internal Nash equilibrium pair (p, q) for the bimatrix game, it is shown in [14] that the pair is a centre and that the time averages of $p_{12}$ and $p_{21}$ are p and q, respectively. The time averages of the payoffs are also shown to be $p^{\mathrm{T}} A_{12} q$ and $q^{\mathrm{T}} A_{21} p$, respectively. Note that in the cases where there are two pure ESS pairs, there is an internal equilibrium which is a saddle point, and all trajectories converge to one or other of the ESS pairs.
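This centre behaviour, with the time averages recovering the equilibrium pair, can be seen in a small simulation (ours; a matching-pennies-style bimatrix game is assumed purely for illustration):

```python
import numpy as np

# Bimatrix replicator dynamics for a zero-sum game whose unique Nash pair
# is internal: trajectories cycle around (p, q) = ((1/2,1/2), (1/2,1/2)),
# and the time averages of both states recover it.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])   # player 1's payoffs
B = -A                         # player 2's payoffs (zero-sum)
p, q = np.array([0.8, 0.2]), np.array([0.3, 0.7])
pbar, qbar = np.zeros(2), np.zeros(2)
dt, T = 1e-3, 200.0            # a small step limits the outward drift of
for _ in range(int(T / dt)):   # the explicit Euler scheme near a centre
    u, w = A @ q, B @ p        # payoffs computed before updating either player
    p = p + dt * p * (u - p @ u); p /= p.sum()
    q = q + dt * q * (w - q @ w); q /= q.sum()
    pbar += p * dt; qbar += q * dt
print(pbar / T, qbar / T)      # both close to [0.5, 0.5]
```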
2.2.1 A numerical example
Suppose that the payoff values from the above example are
3 Discussion
Matrix games have been used to model a variety of animal behaviours. They are structurally simple and relatively straightforward, and thus amenable to analysis, but at the same time provide plausible, if simplistic, explanations for certain behaviours. However, the assumption of constant payoffs, independent of time, is not always realistic. Payoffs change throughout the breeding season, and may also vary markedly from year to year. These variations can be critical to any analysis.
In this paper we have introduced the variable payoff matrix to consider how different time-dependent payoffs may affect strategies. Naturally, the evolution of strategies can be more complex than under constant payoffs. In particular, taking a simple time average of the rewards, without recourse to the particular times at which they are available, can lead not just to the wrong result, but to completely the opposite result to that obtained by considering the behaviour at each time separately. Similarly, if regular behaviour with a convergent time average occurs, this does not guarantee that the underlying payoffs have a convergent time average (or even that the relative sizes of payoffs against the same pure strategy do), so that simple observed behaviour may be concealing very complex variations in the underlying payoffs.
On the other hand, it is demonstrated that, provided the variability of the payoffs is severely restricted, a convergent time average of the population state exists as long as there is a level of persistence of the strategies involved (they do not tend to 0 exponentially). If the actual state converges with time, then the underlying payoffs must converge in a given manner. Thus we extend our knowledge of how certain populations must behave, as well as demonstrating the complexity of the situation by showing that certain plausible statements are false.
The idea of variable payoffs is finally applied to a particular example, that of a two-stage game, where the basic payoffs are actually constant, but the rewards for the first stage (and thus for the whole game) depend upon the strategies used, and so are variable. It is shown that if the population states converge in all of the second stage games, then the long-term time average of the state of the whole population converges, provided that the first stage strategies do not tend to 0 at an exponential rate (indeed, convergence of the state itself occurs in a wide variety of cases). A numerical example with two strategies at each stage (and so four different plays in total) is given where each subgame has two ESSs (or ESS pairs), and the population converges to one of eight possible final solutions, depending upon initial conditions. This number of possible solutions is larger than for any matrix game with four strategies, the extra complication resulting from each animal being able to use information about its opponent's strategy to choose its own.
Appendix A
Proof of Theorem 2
If $v(t) \to v^*$ then $\frac{\mathrm{d}v_i}{\mathrm{d}t} \to 0$ (since $\frac{\mathrm{d}}{\mathrm{d}t}A(t)$ is bounded), thus if $v^*_i \neq 0$ then for every $\varepsilon > 0$ there is a $T_\varepsilon$ s.t. $|v_i(t) - v^*_i| < \varepsilon$, $t > T_\varepsilon$, i.e.,
$$v_i(t) \left[ (A(t)v(t))_i - v(t)^{\mathrm{T}} A(t) v(t) \right] \to 0$$
Since $v^*$ has no zero elements, and due to the boundedness of A, the terms $(A(t)v(t))_i - v(t)^{\mathrm{T}} A(t) v(t)$ remain bounded for all t, and so $(A(t)v^*)_i - v^{*\mathrm{T}} A(t) v^* \to 0$ for every i, since otherwise $\frac{\mathrm{d}v_i}{\mathrm{d}t}$ would not converge to 0.
Thus the matrix A becomes arbitrarily close to a matrix which has $v^*$ as a Nash equilibrium (we represent the collection of all such matrices as $\mathcal{A}(v^*)$). There are many different matrices which satisfy this, so that which of these A approaches may change with time. The matrices of $\mathcal{A}(v^*)$ are split into different families, so that if $v^*$ is a Nash equilibrium of $\aleph_1$ it is also a Nash equilibrium of $\aleph_2 = \mu \aleph_1 + L$ (where $L \in \Lambda$ and μ is a nonnegative constant). These families only meet at the set Λ, so that given the bound $\epsilon$ of Section 1.4 there is a given minimum distance between members of any two families. Since $\frac{\mathrm{d}}{\mathrm{d}t}A(t)$ is bounded, A cannot move between families without moving the population state v away from $v^*$. Thus A must converge to one such family, i.e., $d(A(t), f(t)\aleph + \Lambda) \to 0$. □
Proof of Theorem 3
Dividing the replicator equation by $v_i$ and integrating gives
$$\frac{\ln v_i(T) - \ln v_i(0)}{T} = \frac{1}{T} \int_0^T \left[ (A(t)v(t))_i - v(t)^{\mathrm{T}} A(t) v(t) \right] \mathrm{d}t$$
The left-hand side converges to 0 provided that $v_i$ does not approach zero sufficiently closely that condition (b) does not hold; we assume that none of the $v_i$s do (except any elements that are not involved in the support of the limit). By condition (a), A(t) may be replaced by $\aleph$ on the right-hand side with an error which tends to 0, so that
$$(\aleph \bar{v}(T))_i - \frac{1}{T} \int_0^T v^{\mathrm{T}} \aleph v \, \mathrm{d}t \to 0$$
for each such i, and the terms $(\aleph \bar{v}(T))_i$ thus tend to a common value λ. Any limit $v^*$ of $\bar{v}(T)$ is therefore a Nash equilibrium of $\aleph$, i.e.,
$$(\aleph v^*)_i = \lambda, \quad i \in S(v^*), \qquad (\aleph v^*)_i \leq \lambda, \quad i \notin S(v^*)$$
Thus if conditions (a) and (b) hold, then the time average of v converges as above. □
Proof of Proposition 1

Consider the payoff matrix A(t) in which a single entry is a step function $c(t)$, taking value $c_1$ a proportion λ of the time and $c_2$ a proportion $1 - \lambda$ of the time, where $c_1$ is small and $c_2$ is large. We also assume that the length of the time spent in each state is long, so that for effectively all of the time spent in each step, the population is at (or arbitrarily close to) the unique internal equilibrium for that level. To satisfy the assumption of continuity and bounded derivative of A(t), the steps can of course be 'rounded' (e.g., replacing the jump at the end of each step by a steep but continuous change over a short interval), with no extra consequences.

The mean of $c(t)$ is $\lambda c_1 + (1 - \lambda) c_2$. The time average of the matrix thus gives an equilibrium value of the proportion playing strategy 1 which, for suitable choices of $c_1$, $c_2$ and λ, lies arbitrarily far (a distance approaching 1) from the true time average of this proportion, since the population spends effectively all of its time at the equilibria of the individual steps. □
Proof of Proposition 2

Suppose that the payoff matrix alternates between a matrix $B_1$, occurring in intervals whose lengths grow geometrically, and a matrix $B_2$, which shares the same internal ESS $v^*$ but is not of the form $\mu B_1 + L$ for $L \in \Lambda$ and $\mu \geq 0$, occurring at all other times. It is easy to see that the time average of A(t) does not converge, but that $v^*$ is the internal ESS for each step, and so globally stable under the replicator dynamic [8], so that the time average of v and v itself both converge to $v^*$ (this is clearly still true if we add any element of Λ or multiply by any function $f(t)$ as defined in the proposition). Note that rounding the step function makes no difference to the convergence of $\bar{v}(T)$. □
Proof of Theorem 4

There are two cases to consider: $i \neq j$ and $i = j$.

(a) $i \neq j$. The only possible solution here is a pure pair, (1, 1) say. Supposing that the payoff matrices to the two players are B and C, respectively, then for this pair to be an ESS we require
$$b_{11} > b_{21}, \qquad c_{11} > c_{21}$$
Let $p = p_{ij}$, $q = p_{ji}$. Without loss of generality we can assume that $|b_{kl}| < C_0$ for all k, l. We shall denote the mean population strategies by $\bar{p}(T)$, $\bar{q}(T)$.

Now suppose that $|p_1(t) - 1| < \varepsilon$ and $|q_1(t) - 1| < \varepsilon$ at time $t > T_\varepsilon$ (this must be true for some $T_\varepsilon$ since convergence occurs).

If $t > T_\varepsilon$ then
$$\left| p(t)^{\mathrm{T}} B q(t) - b_{11} \right| < 4 C_0 \varepsilon$$
Thus $a_{ij}(t) = p^{\mathrm{T}} B q \to b_{11}$, and condition (a) of Theorem 3 holds for this element, with $\aleph_{ij} = b_{11}$.
(b) $i = j$. Suppose that p converges to $p^*$ in the matrix game defined by the payoff matrix B. Suppose further, without loss of generality, that $p^*$ is an internal ESS. Define P by $P = p - p^*$, so that $\sum_k P_k = 0$.

That $p^*$ is an internal ESS of B implies that it satisfies the negative definiteness condition of [15], so that $s^{\mathrm{T}} B s \leq -\lambda s^{\mathrm{T}} s$ for some positive λ, for any such s with $\sum_k s_k = 0$.

Thus we have
$$p^{\mathrm{T}} B p = p^{*\mathrm{T}} B p^* + P^{\mathrm{T}} B p^* + p^{*\mathrm{T}} B P + P^{\mathrm{T}} B P \to p^{*\mathrm{T}} B p^*$$
as $P \to 0$, so that $a_{ii}(t) \to p^{*\mathrm{T}} B p^*$, and condition (a) of Theorem 3 holds with $\aleph_{ii} = p^{*\mathrm{T}} B p^*$. □