Game Theory: Prisoner's Dilemma & Evolutionary Strategies
A grid of agents repeatedly plays the Prisoner's Dilemma. Each round, every agent plays against its 8 neighbours and earns a payoff. Then each agent adopts the strategy of its highest-scoring neighbour (with optional noise). Watch cooperation emerge, collapse, or cycle depending on the strategy mix and payoff values.
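The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: it assumes agents are unconditional cooperators (`"C"`) or defectors (`"D"`), that the grid wraps around at the edges, that an agent keeps its own strategy unless a neighbour scored strictly higher, and that noise means replacing the chosen strategy with a random one. The names `step`, `noise`, and `rng` are hypothetical.

```python
import random

# Payoffs from the matrix on this page (payoff to the first player).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

# Offsets of the 8 surrounding cells (Moore neighbourhood).
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]


def step(grid, noise=0.0, rng=random):
    """Return the next generation of a grid of 'C'/'D' strategies."""
    n_rows, n_cols = len(grid), len(grid[0])

    def around(r, c):
        # Assumption: the grid wraps around at the edges (a torus).
        for dr, dc in NEIGHBOURS:
            yield (r + dr) % n_rows, (c + dc) % n_cols

    # Phase 1: every agent plays each of its 8 neighbours and sums the payoffs.
    score = [[sum(PAYOFF[(grid[r][c], grid[nr][nc])] for nr, nc in around(r, c))
              for c in range(n_cols)]
             for r in range(n_rows)]

    # Phase 2: every agent copies its best-scoring neighbour.
    new_grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Assumption: an agent only switches if a neighbour did strictly better.
            best_score, best_strategy = score[r][c], grid[r][c]
            for nr, nc in around(r, c):
                if score[nr][nc] > best_score:
                    best_score, best_strategy = score[nr][nc], grid[nr][nc]
            # Assumption: with probability `noise`, pick a random strategy instead.
            if rng.random() < noise:
                best_strategy = rng.choice("CD")
            row.append(best_strategy)
        new_grid.append(row)
    return new_grid


grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"          # a single defector in a sea of cooperators
for row in step(grid):
    print("".join(row))
```

With these payoffs the lone defector earns 8 × T = 40, more than any cooperator, so after one generation its eight neighbours have copied it and the defector has grown into a 3×3 block.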
Payoff Matrix
| | C | D |
|---|---|---|
| C | R=3 | S=0 |
| D | T=5 | P=1 |
The Prisoner's Dilemma
Two agents can Cooperate (C) or Defect (D). The payoff ordering T > R > P > S and 2R > T + S makes mutual cooperation better collectively, but individual incentive pushes toward defection. This tension is the core of the dilemma.
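As a quick check, the two conditions can be tested directly for any set of payoffs. The function name `is_prisoners_dilemma` is hypothetical and only serves this illustration.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Return True when the payoffs satisfy both conditions quoted above."""
    # Temptation > Reward > Punishment > Sucker's payoff:
    # defecting is the better reply to either move by the opponent.
    ordering = T > R > P > S
    # Mutual cooperation beats taking turns exploiting each other,
    # which would pay (T + S) / 2 per round on average.
    cooperation_beats_alternation = 2 * R > T + S
    return ordering and cooperation_beats_alternation


print(is_prisoners_dilemma(5, 3, 1, 0))  # payoffs from the matrix on this page
print(is_prisoners_dilemma(7, 3, 1, 0))  # temptation raised to 7
```

The default payoffs pass both tests (5 > 3 > 1 > 0 and 6 > 5). Raising T to 7 keeps the ordering but breaks the second condition (6 is not greater than 7), so two players alternating between C and D would then out-earn steady cooperators.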
Nash Equilibrium vs. Pareto Optimum
In a one-shot game, mutual defection (P, P) is the Nash equilibrium: neither player can improve by switching unilaterally. Yet mutual cooperation (R, R) is Pareto optimal: you cannot make one agent better off without hurting the other. Rational self-interest leads to a suboptimal outcome. This is the tragedy of the dilemma.
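Both claims can be verified by brute force over the four possible outcomes. The sketch below uses the payoffs from the matrix on this page. The helper names `is_nash` and `is_pareto_optimal` are hypothetical.

```python
from itertools import product

T, R, P, S = 5, 3, 1, 0
MOVES = "CD"
# (payoff to player A, payoff to player B) for each pair of moves.
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}


def is_nash(a, b):
    """Neither player gains by changing only their own move."""
    ua, ub = PAYOFF[(a, b)]
    a_cannot_improve = all(PAYOFF[(x, b)][0] <= ua for x in MOVES)
    b_cannot_improve = all(PAYOFF[(a, y)][1] <= ub for y in MOVES)
    return a_cannot_improve and b_cannot_improve


def is_pareto_optimal(a, b):
    """No other outcome is at least as good for both and different."""
    ua, ub = PAYOFF[(a, b)]
    return not any(va >= ua and vb >= ub and (va, vb) != (ua, ub)
                   for va, vb in PAYOFF.values())


nash = [pair for pair in product(MOVES, repeat=2) if is_nash(*pair)]
pareto = [pair for pair in product(MOVES, repeat=2) if is_pareto_optimal(*pair)]
print("Nash equilibria:", nash)
print("Pareto optimal: ", pareto)
```

Under these payoffs the only Nash equilibrium is (D, D), and it is also the only outcome that is not Pareto optimal, because (C, C) is better for both players. The lopsided outcomes (C, D) and (D, C) are Pareto optimal too, since the exploiter cannot do any better than T.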
Why Tit-for-Tat wins
In Robert Axelrod's famous computer tournaments (reported in 1980), Tit-for-Tat (cooperate on the first move, then mirror the opponent's previous move) won both rounds, the second against a field of 62 submitted strategies. TFT is nice (never defects first), retaliatory (punishes defection immediately), forgiving (returns to cooperation as soon as the opponent does), and clear (easy for opponents to read). In spatial settings, TFT players in a cluster mostly play each other, which protects them from exploitation.
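To see these properties in action, here is a minimal sketch of an iterated match using the payoffs from this page. It is not Axelrod's tournament code. The 10-round match length and the names `tit_for_tat`, `always_defect`, and `play` are assumptions made for this illustration.

```python
T, R, P, S = 5, 3, 1, 0
# Payoff to the first player of each pair of moves.
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """Defect unconditionally."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both players choose simultaneously, seeing only past rounds.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(play(always_defect, always_defect))  # (10, 10)
```

TFT loses its match against the defector (9 to 14), because it is exploited once before it retaliates. It still comes out ahead overall in this small example: two TFT players earn 30 each from one another, while defectors earn only 10 from each other.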
Real-world applications
- Climate agreements: nations defect (emit CO₂) even though cooperation benefits all
- Arms races: mutual disarmament is Pareto superior but unilateral disarmament is exploited
- Antibiotic resistance: bacteria that produce common goods face defector mutants
- Price wars: companies could cooperate on prices but individually undercut competitors