1 Introduction
We start by explaining some of the basic concepts of evolutionary game theory, introduced in the classic paper [1] (see also [2]), which will be of use in the rest of the paper. In particular we discuss matrix games between symmetric players and their equivalent between asymmetric players, bimatrix games. We also use the concept of the replicator dynamic to consider the evolution of strategies as a function of time.
1.1 Matrix games
The following idea is useful for modelling a population of animals which compete in pairwise conflicts for some resource, which could be food or mates, for example. It is assumed that all members of the population are indistinguishable and each individual is equally likely to face each other individual. There is a finite number of pure strategies available to the players to play in a particular game. Let U be the set of pure strategies, so that $U = \{1, 2, \ldots, n\}$. Given the strategies played, the outcome is determined; if player 1 plays i against player 2 playing j, then player 1 receives reward $a_{ij}$ (player 2 receives $a_{ji}$), representing an adjustment in Darwinian fitness. The value $a_{ij}$ can be thought of as the $(i, j)$ element of the matrix A, the payoff matrix.
An animal need not play the same pure strategy every time; it can play a mixed strategy, i.e., play i with probability $p_i$ for each $i \in U$. Thus the strategy played by an animal is represented by a probability vector p. The expected payoff to player 1 playing p against player 2 playing q, which is written as $E[p, q]$, is
$$E[p, q] = p^{\mathrm{T}} A q = \sum_{i \in U} \sum_{j \in U} p_i a_{ij} q_j$$
The support of p is defined as $S(p) = \{ i \in U : p_i > 0 \}$.
p is an internal strategy if $S(p) = U$.
p is a Nash equilibrium if $E[p, p] \geq E[q, p]$ for all strategies q; writing $e_i$ for the ith pure strategy, p is thus a Nash equilibrium if $E[e_i, p] = \lambda$, $i \in S(p)$, and $E[e_i, p] \leq \lambda$, $i \notin S(p)$, for some constant λ, and is thus an internal Nash equilibrium if $E[e_i, p] = \lambda$ ∀i.
A strategy p is an Evolutionarily Stable Strategy (ESS) if
- (i) $E[p, p] \geq E[q, p]$ for all strategies q, and
- (ii) if $E[q, p] = E[p, p]$ for some $q \neq p$, then $E[p, q] > E[q, q]$
A matrix may possess a unique ESS, no ESSs or many ESSs. See [3,4] for a discussion of the possible complexity of the ESS structure of a matrix.
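To make these definitions concrete, the following minimal sketch (ours, not from the paper) checks the pure ESS conditions and computes the internal ESS for an illustrative 2×2 matrix, a Hawk-Dove game with V = 2 and C = 4:

```python
import numpy as np

# Hawk-Dove payoff matrix with V = 2, C = 4 (illustrative values only):
# A[i, j] is the payoff a_ij to a player using strategy i against strategy j.
A = np.array([[-1.0, 2.0],    # Hawk vs Hawk, Hawk vs Dove
              [ 0.0, 1.0]])   # Dove vs Hawk, Dove vs Dove

def is_pure_ess(A, i):
    """Check conditions (i) and (ii) above for pure strategy i against
    pure invaders j (sufficient for a 2x2 matrix game)."""
    for j in range(A.shape[0]):
        if j == i:
            continue
        if A[j, i] > A[i, i]:                          # condition (i) fails
            return False
        if A[j, i] == A[i, i] and A[i, j] <= A[j, j]:  # condition (ii) fails
            return False
    return True

print([is_pure_ess(A, i) for i in range(2)])  # [False, False]: no pure ESS

# With a11 < a21 and a22 < a12 the unique ESS is internal, playing strategy 1
# with probability (a22 - a12)/((a11 - a21) + (a22 - a12)); here V/C = 0.5.
p1 = (A[1, 1] - A[0, 1]) / ((A[0, 0] - A[1, 0]) + (A[1, 1] - A[0, 1]))
print(p1)
```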
1.2 Bimatrix games
The assumptions underlying the bimatrix game model are the same as for the matrix game model, except that pairwise contests are fought between individuals in asymmetric positions, so that the individual designated player 1 has a different set of pure strategies, $U_1$, to the set $U_2$ available to player 2 ($U_1 \neq U_2$). If player 1 plays its strategy i against player 2 playing its strategy j, then player 1 receives reward $a_{ij}$ and player 2 receives reward $b_{ji}$. The payoffs combine to form the payoff matrices A and B. In the same way individuals can play mixed strategies, so that if player 1 plays p and player 2 plays q, the rewards to the players are $p^{\mathrm{T}} A q$ and $q^{\mathrm{T}} B p$, respectively.
The strategy pair $(p^*, q^*)$ is a Nash equilibrium pair if
$$p^{\mathrm{T}} A q^* \leq p^{*\mathrm{T}} A q^* \quad \forall p, \qquad q^{\mathrm{T}} B p^* \leq q^{*\mathrm{T}} B p^* \quad \forall q$$
The strategy pair $(p^*, q^*)$ is an ESS if it is a Nash equilibrium pair for which the above inequalities are strict whenever $p \neq p^*$ and $q \neq q^*$; for bimatrix games an ESS pair is precisely a strict Nash equilibrium pair.
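For pure strategy pairs, the ESS condition above reduces to a strict best-reply check in each position. The sketch below (ours, with hypothetical payoff values) applies this check to every pure pair of an illustrative bimatrix game:

```python
import numpy as np

def is_strict_nash_pair(A, B, i, j):
    """Check whether the pure pair (i, j) is a strict Nash equilibrium pair,
    and hence an ESS pair, of the bimatrix game with matrices A (player 1)
    and B (player 2); B[k, l] is player 2's payoff for playing k against
    player 1 playing l."""
    p1_best = all(A[i, j] > A[k, j] for k in range(A.shape[0]) if k != i)
    p2_best = all(B[j, i] > B[l, i] for l in range(B.shape[0]) if l != j)
    return p1_best and p2_best

# Hypothetical payoffs, for illustration only.
A = np.array([[1.0, 3.0], [0.0, 2.0]])
B = np.array([[2.0, 0.0], [3.0, 1.0]])
print([(i, j) for i in range(2) for j in range(2)
       if is_strict_nash_pair(A, B, i, j)])   # [(0, 1)]
```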
1.3 Replicator dynamics
Let us assume that individuals can only play pure strategies, and let the proportion of players of pure strategy i at a particular time be $v_i$, so that the average population strategy (the population state) is the vector $v = (v_1, \ldots, v_n)$, with the expected payoff (in terms of Darwinian fitness) of an i-player in such a mixture being $(Av)_i = \sum_j a_{ij} v_j$ and the overall expected payoff in the population being $v^{\mathrm{T}} A v$. Then the standard (continuous) replicator dynamic for the matrix game with payoff matrix A is defined by the differential equation
$$\frac{\mathrm{d}v_i}{\mathrm{d}t} = v_i \left( (Av)_i - v^{\mathrm{T}} A v \right), \quad i \in U$$
Assuming evolution under the replicator dynamic (or any other dynamic), the time average of the population state v (where $v = v(t)$) at time T is the vector $\bar{v}(T)$, whose ith element is defined by
$$\bar{v}_i(T) = \frac{1}{T} \int_0^T v_i(t) \, \mathrm{d}t$$
Note that:
- (1) Adding a constant to any column of the payoff matrix makes no difference to the Nash equilibria or to the trajectory of the path taken by the population under the replicator dynamic (including time factors), so it makes no difference to the game at all.
- (2) If the entire matrix is multiplied by a positive constant, then the Nash equilibria are unchanged and the trajectory of the replicator dynamic is unaltered, but the speed along the trajectory may be changed, and thus the time average of the population state may also be affected.
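These dynamics are straightforward to explore numerically. The following is a minimal sketch (ours, with an illustrative rock-paper-scissors-like matrix, not an example from the paper) of Euler integration of the replicator dynamic together with the running time average:

```python
import numpy as np

# Euler integration of dv_i/dt = v_i((Av)_i - v.Av), accumulating the
# time average of the state along the way.
A = np.array([[ 0.0,  2.0, -1.0],
              [-1.0,  0.0,  2.0],
              [ 2.0, -1.0,  0.0]])   # rock-paper-scissors-like payoffs

def replicator(A, v0, dt=1e-3, T=200.0):
    v = np.array(v0, dtype=float)
    avg = np.zeros_like(v)
    for _ in range(int(T / dt)):
        Av = A @ v
        v = v + dt * v * (Av - v @ Av)
        v = np.clip(v, 0.0, None)
        v /= v.sum()                  # guard against drift off the simplex
        avg += v * dt
    return v, avg / T

v_final, v_bar = replicator(A, [0.5, 0.3, 0.2])
print(v_final, v_bar)  # both approach the internal equilibrium (1/3, 1/3, 1/3)
```

Rerunning with `2 * A` traces the same trajectory at a different speed, in line with note (2).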
1.4 Nonconstant payoffs
The majority of game models deal with a fixed payoff structure, so that if the strategies adopted by all the combatants are known, then the payoffs are given. This corresponds to the payoff matrix of constants, A. In the real world, however, the time at which contests occur can be crucial. As the breeding season develops there is a natural variation in the rewards available for any given contest [9,10]. Moreover, owing to environmental changes, variations may occur from year to year, and even from day to day, due to unpredictable effects such as the weather [11].
We consider a matrix game where the payoffs vary with time. In particular, if player 1 plays i against player 2 playing j at time t, then player 1 receives payoff $a_{ij}(t)$. Clearly many of the concepts from static games must now be reconsidered; for instance, p may be a Nash equilibrium of a particular constant payoff matrix (a snapshot in time), but what happens in the long term as the matrix changes? We consider how the population state changes under the replicator dynamic and ask what rules we can establish.
In general we assume that the entries in the payoff matrix are bounded, so that $|a_{ij}(t)| < C$ for all $i, j \in U$ and all t. It is also usually assumed that A is continuous and that indeed it cannot change too quickly, so that
$$\left| \frac{\mathrm{d}}{\mathrm{d}t} a_{ij}(t) \right| < K \quad \text{for all } i, j \in U \text{ and all } t$$
Finally, letting Λ denote the set of degenerate payoff matrices in which every column is constant (so that every strategy performs equally well and every population state is an equilibrium), we assume that for some $\epsilon > 0$, $d(A(t), \Lambda) > \epsilon$ for all t, where d is the distance given by the maximum difference over the matrix entries. This ensures that the game does not become completely degenerate, by bounding it away from the set Λ.
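As a concrete illustration of these assumptions (ours, not an example from the paper), the family below is bounded, has a bounded derivative, and stays a fixed distance from Λ, since the entries within the first column always differ by at least 1:

```latex
% A 2x2 time-varying payoff matrix satisfying all three assumptions:
% |a_ij(t)| <= 3 (boundedness), |d a_ij/dt| <= 1 (bounded rate of change),
% and the first column keeps A(t) at distance at least 1/2 from Lambda.
A(t) = \begin{pmatrix} 2 + \sin t & 0 \\ 0 & 1 + \cos t \end{pmatrix}
```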
2 Results
Firstly we introduce some general results, and their consequences, before moving on to consider a particular application of the theory.
2.1 General results
In this section the variable payoff matrix with elements $a_{ij}(t)$ is simply written as A, and the population state is the vector v.
Theorem 1

A converges does not imply that $\bar{v}(T)$ converges.

This result follows simply from the fact that even for constant A, not only does v not always converge, but neither does $\bar{v}(T)$ (see [8]). Complex population dynamics can thus occur even with constant payoffs. What can we say about the case where payoffs vary?
Theorem 2

If $v(t) \to v^*$, where $v^*$ has no zero elements, then $d(A(t), f(t)\aleph + \Lambda) \to 0$ for some continuous function $f(t)$ and $\aleph$, a matrix of constants, such that $v^*$ is a Nash equilibrium of $\aleph$.

So if the prevalent population state converges, then it must be the case that the payoffs converge either to a constant payoff matrix, or to one which varies in time in such a way that the ratio of the differences between the payoffs to different strategies, played against the same pure strategy, remains constant.
Theorem 3

If

- (a) there is a matrix of constants $\aleph$ such that $\lim_{T\to\infty} \frac{1}{T}\int_0^T |a_{ij}(t) - \aleph_{ij}| \, \mathrm{d}t = 0$ for all $i, j \in U$, and
- (b) $\lim_{T\to\infty} \frac{\ln v_i(T)}{T} = 0$ for all i, so that no strategy is eliminated at an exponential rate,

then $\bar{v}(T)$ converges to $v^*$, which is a Nash equilibrium of $\aleph$.
If the payoffs vary with time, but only to a limited extent (in particular, the environmental impact is such that there is no regular displacement from the mean, except possibly of vanishingly small size), and this variation is not strong enough to eliminate strategies at a sufficiently fast (exponential) rate, then convergence of the time average of the population state occurs. We do not have a stronger result even for a general constant payoff matrix; thus the assumption of constant payoffs can be relaxed a little without affecting the dominant long-term behaviour.
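A minimal numerical sketch of this situation (ours, with assumed values; neither `aleph` nor the perturbation is taken from the paper): the payoffs oscillate around a constant matrix $\aleph$ with decaying amplitude, so condition (a) holds, and the time average of the state approaches the internal Nash equilibrium of $\aleph$, here (1/2, 1/2):

```python
import numpy as np

# Payoffs are aleph plus a perturbation whose time average vanishes,
# so the time average of v approaches the internal equilibrium of aleph.
aleph = np.array([[-1.0, 2.0],
                  [ 0.0, 1.0]])      # internal Nash equilibrium (1/2, 1/2)

def A(t):
    E = np.array([[1.0, 0.0], [0.0, 0.0]])
    return aleph + 2.0 * np.sin(t) / (1.0 + 0.1 * t) * E  # decaying oscillation

dt, T = 1e-3, 500.0
v = np.array([0.9, 0.1])
avg = np.zeros(2)
for k in range(int(T / dt)):
    Av = A(k * dt) @ v
    v = v + dt * v * (Av - v @ Av)
    v /= v.sum()
    avg += v * dt
print(avg / T)   # close to [0.5, 0.5]
```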
Proposition 1

If $\lim_{T\to\infty} \frac{1}{T}\int_0^T A(t)\,\mathrm{d}t = \aleph$ and if p is an internal Nash equilibrium of $\aleph$, then the distance of $\bar{v}(T)$ from p, the length of the vector $\bar{v}(T) - p$, can converge to a value which is arbitrarily close to 1.
This result demonstrates that considering the long-term mean payoffs is insufficient to find the long-term average state; indeed, it is possible to find situations where considering the long-term mean payoffs would give the worst possible estimate of the mean state.
Proposition 2

$\bar{v}(T)$ converges does not imply that we can find a bounded function $f(t)$ and matrix of constants $\aleph$, defined as above, such that $\bar{E}(T)$ converges, where $E(t) = A(t) - f(t)\aleph$.
(Note that we need this more elaborate statement, rather than just nonconvergence of $\bar{A}(T)$, as that can be given by the trivial case of $A(t) = f(t)K$ for a constant matrix K and a suitable $f(t)$.)
Even if the mean population state converges, this does not imply convergence of the underlying payoffs (or of the relative sizes of payoffs against the same pure strategy), and even time-average convergence of the payoffs is not guaranteed. This is especially surprising given that convergence of the time average of v is not guaranteed even for constant payoffs. Thus observation of this behaviour, without convergence of v, could be the result of a wide range of underlying phenomena.
2.2 The two stage game
One application of the idea of variable payoffs is in the context of multiple stage games, even when the real natural payoffs are constant. Suppose that a population of animals compete in pairwise contests, where an animal's strategy is decided by a pair of 'choices'. The first choices that the animals make decide the particular type of contest that the two are involved in, after which the individuals both pick a second choice, which decides the payoffs that each receives (for other examples of such contests see [12,13]). The two players make their choices at each stage simultaneously, and both know the result of the first stage before playing the second. For instance, the first 'choice' could be horn size, which is apparent to both players before any contest ensues. The second stage is thus a subgame within this contest, the available choices of which may be conditional upon the horn sizes of the participants (thus there is a payoff matrix for each pair of horn sizes).
Note that when the first stage choices are different, this induces an asymmetry into the contest. We assume that if player 1 picks i and player 2 picks j, then the reward to player 1 is decided by the payoff matrix $A_{ij}$. If $i = j$ then we have the standard matrix game with payoff matrix $A_{ii}$; otherwise we have a bimatrix game with payoff matrices $A_{ij}$ and $A_{ji}$, respectively.
The payoff $a_{ij}(t)$ is thus the expected reward for first stage strategy i against first stage strategy j, which depends in turn upon the strategies prevalent in such contests, i.e., $a_{ij}(t) = p_{ij}^{\mathrm{T}} A_{ij} p_{ji}$, where $p_{ij}$ is the population state of i-players when facing j-players for the second stage of the game (and so $p_{ji}$ is the population state of their potential opponents in this contest). When considering the strategy of i against that of j, $i \neq j$, $p_{ij}$ takes the place of p and $p_{ji}$ takes the place of q in the bimatrix games described earlier (for individuals of the same first stage strategy, $p_{ii}$ takes the place of p in a matrix game). It is assumed that the strategies involved in this subgame evolve in the same manner as the first stage choices, and so $p_{ij}$ varies according to the replicator equation
$$\frac{\mathrm{d}(p_{ij})_k}{\mathrm{d}t} = (p_{ij})_k \left( (A_{ij} p_{ji})_k - p_{ij}^{\mathrm{T}} A_{ij} p_{ji} \right)$$
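The coupled system is easy to simulate. The sketch below (ours; the 2×2 matrices in `Asub` are hypothetical placeholders, not the values of Section 2.2.1) evolves the first stage state v and the four subgame states $p_{ij}$ together, recomputing the induced first stage payoffs $a_{ij}(t)$ at every step:

```python
import numpy as np

# Two-stage replicator dynamics: the first stage state v and the subgame
# states p[(i, j)] evolve together; the time-varying first stage payoffs
# are a_ij(t) = p_ij . A_ij p_ji.
Asub = {(0, 0): np.array([[3.0, 1.0], [2.0, 2.5]]),
        (0, 1): np.array([[2.0, 0.0], [3.0, 1.0]]),
        (1, 0): np.array([[1.0, 3.0], [0.0, 2.0]]),
        (1, 1): np.array([[2.0, 1.0], [1.5, 3.0]])}

v = np.array([0.6, 0.4])                        # first stage proportions
p = {ij: np.array([0.5, 0.5]) for ij in Asub}   # second stage states
dt = 1e-3
for _ in range(int(200.0 / dt)):
    # first stage payoff matrix induced by the current subgame states
    A1 = np.array([[p[(i, j)] @ Asub[(i, j)] @ p[(j, i)] for j in range(2)]
                   for i in range(2)])
    Av = A1 @ v
    v = v + dt * v * (Av - v @ Av)
    v /= v.sum()
    new_p = {}
    for (i, j) in Asub:                         # subgame replicator steps
        w = Asub[(i, j)] @ p[(j, i)]
        s = p[(i, j)] + dt * p[(i, j)] * (w - p[(i, j)] @ w)
        new_p[(i, j)] = s / s.sum()
    p = new_p
print(v, {ij: np.round(s, 3) for ij, s in p.items()})
```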
Theorem 4

If $p_{ij} \to p^*_{ij}$ for all i, j (where $p^*_{ii}$ is an ESS of $A_{ii}$, and $(p^*_{ij}, p^*_{ji})$ is an ESS pair of $(A_{ij}, A_{ji})$ for $i \neq j$), then A satisfies condition (a) for Theorem 3 with $\aleph_{ij} = p^{*\mathrm{T}}_{ij} A_{ij} p^*_{ji}$.
Corollary

Suppose that the conditions of Theorem 4 hold, and that in addition no first stage strategy is eliminated at an exponential rate, so that condition (b) of Theorem 3 also holds. Then $v^* = \lim_{T\to\infty} \bar{v}(T)$ exists, and $v^*$ is a Nash equilibrium of $\aleph$, where $\aleph_{ij} = p^{*\mathrm{T}}_{ij} A_{ij} p^*_{ji}$.
Suppose that in a two stage game there are two strategies in the first stage, and then two strategies in each of the four possible second stage situations, defined by the payoff matrices $A_{11}$, $A_{12}$, $A_{21}$ and $A_{22}$.
Table 1. ESSs and payoffs for the symmetric subgames $A_{ii}$, writing $a_{kl}$ for $(A_{ii})_{kl}$

| Condition | $a_{11}>a_{21}$, $a_{22}<a_{12}$ | $a_{11}<a_{21}$, $a_{22}>a_{12}$ | $a_{11}>a_{21}$, $a_{22}>a_{12}$ | $a_{11}<a_{21}$, $a_{22}<a_{12}$ |
|---|---|---|---|---|
| ESS(s) | (1) | (2) | (1), (2) | p |
| payoff | $a_{11}$ | $a_{22}$ | $a_{11}$ or $a_{22}$ | ∗ |

∗ The internal ESS is $p = \left( \frac{a_{22}-a_{12}}{(a_{11}-a_{21})+(a_{22}-a_{12})}, \frac{a_{11}-a_{21}}{(a_{11}-a_{21})+(a_{22}-a_{12})} \right)$, with payoff $\frac{a_{11}a_{22}-a_{12}a_{21}}{(a_{11}-a_{21})+(a_{22}-a_{12})}$.
Table 2. ESSs for the bimatrix games $(A_{12}, A_{21})$, writing $a_{kl}$ for $(A_{12})_{kl}$ and $b_{kl}$ for $(A_{21})_{kl}$; rows give the conditions on $A_{12}$ (player 1) and columns the conditions on $A_{21}$ (player 2). An entry (p, q) denotes a unique internal Nash equilibrium pair, which is not an ESS.

| Conditions | $b_{11}>b_{21}$, $b_{12}>b_{22}$ | $b_{11}>b_{21}$, $b_{12}<b_{22}$ | $b_{11}<b_{21}$, $b_{12}>b_{22}$ | $b_{11}<b_{21}$, $b_{12}<b_{22}$ |
|---|---|---|---|---|
| $a_{11}>a_{21}$, $a_{12}>a_{22}$ | (1,1) | (1,1) | (1,2) | (1,2) |
| $a_{11}>a_{21}$, $a_{12}<a_{22}$ | (1,1) | (1,1) or (2,2) | (p,q) | (2,2) |
| $a_{11}<a_{21}$, $a_{12}>a_{22}$ | (2,1) | (p,q) | (1,2) or (2,1) | (1,2) |
| $a_{11}<a_{21}$, $a_{12}<a_{22}$ | (2,1) | (2,2) | (2,1) | (2,2) |
When one player plays 1 and the other plays 2, we have the bimatrix game defined by $(A_{12}, A_{21})$. Table 2 gives the strategies in the ESSs.
When there are no pure ESSs and a single internal Nash equilibrium pair (p, q) for the bimatrix game, it is shown in [14] that the pair is a centre and that the time averages of $p_{12}$ and $p_{21}$ are p and q, respectively. The time averages of the payoffs are also shown to be $p^{\mathrm{T}} A_{12} q$ and $q^{\mathrm{T}} A_{21} p$, respectively. Note that in the cases where there are two pure ESS pairs, there is an internal equilibrium which is a saddle point, and all trajectories converge to one or other of the ESS pairs.
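This centre behaviour, with the time averages recovering the equilibrium pair, can be seen in a small simulation (ours; a matching-pennies-style bimatrix game is assumed purely for illustration):

```python
import numpy as np

# Bimatrix replicator dynamics for a zero-sum game whose unique Nash pair
# is internal: trajectories cycle around (p, q) = ((1/2,1/2), (1/2,1/2)),
# and the time averages of both states recover it.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])   # player 1's payoffs
B = -A                         # player 2's payoffs (zero-sum)
p, q = np.array([0.8, 0.2]), np.array([0.3, 0.7])
pbar, qbar = np.zeros(2), np.zeros(2)
dt, T = 1e-3, 200.0            # a small step limits the outward drift of
for _ in range(int(T / dt)):   # the explicit Euler scheme near a centre
    u, w = A @ q, B @ p        # payoffs computed before updating either player
    p = p + dt * p * (u - p @ u); p /= p.sum()
    q = q + dt * q * (w - q @ w); q /= q.sum()
    pbar += p * dt; qbar += q * dt
print(pbar / T, qbar / T)      # both close to [0.5, 0.5]
```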
2.2.1 A numerical example
Suppose that the payoff values from the above example are
3 Discussion
Matrix games have been used to model a variety of animal behaviours. They are structurally simple and relatively straightforward, and thus amenable to analysis, but at the same time provide plausible, if simplistic, explanations for certain behaviours. However, the assumption of constant payoffs, independent of time, is not always realistic. Payoffs change throughout the breeding season, and may also vary markedly from year to year. These variations can be critical to any analysis.
In this paper we have introduced the variable payoff matrix to consider how different time-dependent payoffs may affect strategies. Naturally, the evolution of strategies can be more complex than under constant payoffs. In particular, taking a simple time average of the rewards, without recourse to the particular times at which they are available, can lead not just to the wrong result, but to completely the opposite result to that obtained by considering the behaviour at each time separately. Similarly, if regular behaviour with a convergent time average occurs, this does not guarantee that the underlying payoffs have a convergent time average (or even that the relative sizes of payoffs against the same pure strategy do), so that simple observed behaviour may be concealing very complex variations in the underlying payoffs.
On the other hand, it is demonstrated that, provided the variability of the payoffs is severely restricted, a convergent time average of the population state exists as long as there is a level of persistence of the strategies involved (they do not tend to 0 exponentially). If the actual state converges with time, then the underlying payoffs must converge in a given manner. Thus we extend our knowledge of how certain populations must behave, as well as demonstrating the complexity of the situation by showing that certain plausible statements are false.
The idea of variable payoffs is finally applied to a particular example, that of a two-stage game, where the basic payoffs are actually constant, but the rewards for the first stage (and thus for the whole game) depend upon the strategies used, and so are variable. It is shown that if the population states converge in all of the second stage games, then the long-term time average of the state of the whole population converges, provided that the first stage strategies do not tend to 0 at an exponential rate (indeed, convergence of the state itself occurs in a wide variety of cases). A numerical example with two strategies at each stage (and so four different plays in total) is given where each subgame has two ESSs (or ESS pairs), and the population converges to one of eight possible final solutions, depending upon initial conditions. This number of possible solutions is larger than for any matrix game with four strategies, the extra complication resulting from each animal being able to use information about its opponent's strategy to choose its own.
Appendix A
Proof of Theorem 2
If $v(t) \to v^*$ then $\frac{\mathrm{d}v_i}{\mathrm{d}t} \to 0$ (since $\frac{\mathrm{d}}{\mathrm{d}t}A(t)$ is bounded), thus if $v^*_i \neq 0$ then for every $\varepsilon > 0$ there is a $T_\varepsilon$ s.t. $|v_i(t) - v^*_i| < \varepsilon$, $t > T_\varepsilon$, i.e.,
$$v_i(t) \left[ (A(t)v(t))_i - v(t)^{\mathrm{T}} A(t) v(t) \right] \to 0$$
Since $v^*$ has no zero elements, and due to the boundedness of A, the terms $(A(t)v(t))_i - v(t)^{\mathrm{T}} A(t) v(t)$ remain bounded for all t, and so $(A(t)v^*)_i - v^{*\mathrm{T}} A(t) v^* \to 0$ for every i, since otherwise $\frac{\mathrm{d}v_i}{\mathrm{d}t}$ would not converge to 0.
Thus the matrix A becomes arbitrarily close to a matrix which has $v^*$ as a Nash equilibrium (we represent the collection of all such matrices as $\mathcal{A}(v^*)$). There are many different matrices which satisfy this, so that which of these A approaches may change with time. The matrices of $\mathcal{A}(v^*)$ are split into different families, so that if $v^*$ is a Nash equilibrium of $\aleph_1$ it is also a Nash equilibrium of $\aleph_2 = \mu \aleph_1 + L$ (where $L \in \Lambda$ and μ is a nonnegative constant). These families only meet at the set Λ, so that given the bound $\epsilon$ of Section 1.4 there is a given minimum distance between members of any two families. Since $\frac{\mathrm{d}}{\mathrm{d}t}A(t)$ is bounded, A cannot move between families without moving the population state v away from $v^*$. Thus A must converge to one such family, i.e., $d(A(t), f(t)\aleph + \Lambda) \to 0$. □
Proof of Theorem 3
Dividing the replicator equation by $v_i$ and integrating gives
$$\frac{\ln v_i(T) - \ln v_i(0)}{T} = \frac{1}{T} \int_0^T \left[ (A(t)v(t))_i - v(t)^{\mathrm{T}} A(t) v(t) \right] \mathrm{d}t$$
The left-hand side converges to 0 provided that $v_i$ does not approach zero sufficiently closely that condition (b) does not hold; we assume that none of the $v_i$s do (except any elements that are not involved in the support of the limit). By condition (a), A(t) may be replaced by $\aleph$ on the right-hand side with an error which tends to 0, so that
$$(\aleph \bar{v}(T))_i - \frac{1}{T} \int_0^T v^{\mathrm{T}} \aleph v \, \mathrm{d}t \to 0$$
for each such i, and the terms $(\aleph \bar{v}(T))_i$ thus tend to a common value λ. Any limit $v^*$ of $\bar{v}(T)$ is therefore a Nash equilibrium of $\aleph$, i.e.,
$$(\aleph v^*)_i = \lambda, \quad i \in S(v^*), \qquad (\aleph v^*)_i \leq \lambda, \quad i \notin S(v^*)$$
Thus if conditions (a) and (b) hold, then the time average of v converges as above. □
Proof of Proposition 1

Consider the payoff matrix A(t) in which a single entry is a step function $c(t)$, taking value $c_1$ a proportion λ of the time and $c_2$ a proportion $1 - \lambda$ of the time, where $c_1$ is small and $c_2$ is large. We also assume that the length of the time spent in each state is long, so that for effectively all of the time spent in each step, the population is at (or arbitrarily close to) the unique internal equilibrium for that level. To satisfy the assumption of continuity and bounded derivative of A(t), the steps can of course be 'rounded' (e.g., replacing the jump at the end of each step by a steep but continuous change over a short interval), with no extra consequences.

The mean of $c(t)$ is $\lambda c_1 + (1 - \lambda) c_2$. The time average of the matrix thus gives an equilibrium value of the proportion playing strategy 1 which, for suitable choices of $c_1$, $c_2$ and λ, lies arbitrarily far (a distance approaching 1) from the true time average of this proportion, since the population spends effectively all of its time at the equilibria of the individual steps. □
Proof of Proposition 2

Suppose that the payoff matrix alternates between a matrix $B_1$, occurring in intervals whose lengths grow geometrically, and a matrix $B_2$, which shares the same internal ESS $v^*$ but is not of the form $\mu B_1 + L$ for $L \in \Lambda$ and $\mu \geq 0$, occurring at all other times. It is easy to see that the time average of A(t) does not converge, but that $v^*$ is the internal ESS for each step, and so globally stable under the replicator dynamic [8], so that the time average of v and v itself both converge to $v^*$ (this is clearly still true if we add any element of Λ or multiply by any function $f(t)$ as defined in the proposition). Note that rounding the step function makes no difference to the convergence of $\bar{v}(T)$. □
Proof of Theorem 4

There are two cases to consider: $i \neq j$ and $i = j$.

(a) $i \neq j$. The only possible solution here is a pure pair, (1, 1) say. Supposing that the payoff matrices to the two players are B and C, respectively, then for this pair to be an ESS we require
$$b_{11} > b_{21}, \qquad c_{11} > c_{21}$$
Let $p = p_{ij}$, $q = p_{ji}$. Without loss of generality we can assume that $|b_{kl}| < C_0$ for all k, l. We shall denote the mean population strategies by $\bar{p}(T)$, $\bar{q}(T)$.

Now suppose that $|p_1(t) - 1| < \varepsilon$ and $|q_1(t) - 1| < \varepsilon$ at time $t > T_\varepsilon$ (this must be true for some $T_\varepsilon$ since convergence occurs).

If $t > T_\varepsilon$ then
$$\left| p(t)^{\mathrm{T}} B q(t) - b_{11} \right| < 4 C_0 \varepsilon$$
Thus $a_{ij}(t) = p^{\mathrm{T}} B q \to b_{11}$, and condition (a) of Theorem 3 holds for this element, with $\aleph_{ij} = b_{11}$.
(b) $i = j$. Suppose that p converges to $p^*$ in the matrix game defined by the payoff matrix B. Suppose further, without loss of generality, that $p^*$ is an internal ESS. Define P by $P = p - p^*$, so that $\sum_k P_k = 0$.

That $p^*$ is an internal ESS of B implies that it satisfies the negative definiteness condition of [15], so that $s^{\mathrm{T}} B s \leq -\lambda s^{\mathrm{T}} s$ for some positive λ, for any such s with $\sum_k s_k = 0$.

Thus we have
$$p^{\mathrm{T}} B p = p^{*\mathrm{T}} B p^* + P^{\mathrm{T}} B p^* + p^{*\mathrm{T}} B P + P^{\mathrm{T}} B P \to p^{*\mathrm{T}} B p^*$$
as $P \to 0$, so that $a_{ii}(t) \to p^{*\mathrm{T}} B p^*$, and condition (a) of Theorem 3 holds with $\aleph_{ii} = p^{*\mathrm{T}} B p^*$. □