Prisoner's dilemma

The prisoner's dilemma has therefore become the ‘E. coli of social psychology’, and has been extensively applied in theoretical biology, economics, and sociology during the past thirty years.

The prisoner's dilemma is a classic problem in game theory. Its paradoxical result is that, in certain scenarios, members of a group who each act rationally in their own interest will knowingly steer towards an outcome that is worse for all of them.

The game is usually phrased in terms of two suspects who have both been arrested and are each offered a reward for confessing. If only one of them confesses, the confession provides evidence of a major crime. In this case, the confessor is rewarded with a 5-year reduction of sentence, while the other suspect receives no reward. If both confess, each has their sentence reduced by 1 year. If neither confesses, their sentences will each be reduced by 3 years for a minor crime.

The best overall outcome for the group is clearly for both prisoners to cooperate with each other and stay silent: a reward of 3 years for each prisoner. However, in the "default" setting of the prisoner's dilemma, we assume that the prisoners are not given the chance to agree on such a strategy and that each puts their own well-being first.

Prisoner A will now analyze his options (the case for Prisoner B is symmetric):
 * If Prisoner B chooses "don't confess", Prisoner A's best choice will be "confess": A gets the maximum reward of 5 years reduction.
 * If Prisoner B chooses "confess", Prisoner A's best choice will be "confess", too: 1 year reduction is better than 0.

Using this reasoning, both prisoners will choose "confess" and receive a 1-year reduction each, even though both staying silent would have earned them 3 years each.

The strategy "confess" is a strictly dominant strategy: whatever Prisoner B chooses, Prisoner A does better by confessing, and vice versa. The "confess/confess" scenario is also the only Nash equilibrium with this payout matrix. More generally, let R be the payout for mutual cooperation (i.e. not confessing); T be the reward for unilateral defection (i.e. confessing); P be the payout for mutual defection; and S be the reward (conventionally set to zero) for unilateral cooperation.

If T > R > P > S, then the Nash equilibrium of the game is mutual defection, whereas 2R > T + S makes mutual cooperation the globally best outcome.
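The reasoning above can be checked mechanically. The following is a minimal sketch, assuming the sentence reductions from the example (T = 5, R = 3, P = 1, S = 0); the names `PAYOFF`, `best_response` and `is_nash` are made up for this illustration and do not come from any library.

```python
# Sentence reductions in years, taken from the example above (assumed values).
T, R, P, S = 5, 3, 1, 0

ACTIONS = ("silent", "confess")

# PAYOFF[(a, b)] = (reduction for Prisoner A, reduction for Prisoner B)
PAYOFF = {
    ("silent", "silent"): (R, R),
    ("silent", "confess"): (S, T),
    ("confess", "silent"): (T, S),
    ("confess", "confess"): (P, P),
}


def best_response(b_action):
    """Return Prisoner A's payoff-maximizing action for a fixed choice by B."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, b_action)][0])


def is_nash(a, b):
    """True if neither prisoner can gain by unilaterally switching actions."""
    a_ok = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in ACTIONS)
    b_ok = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in ACTIONS)
    return a_ok and b_ok


# "confess" is the best response to either choice, i.e. it is dominant.
dominant = {b: best_response(b) for b in ACTIONS}

# Enumerate all four outcomes and keep the stable ones.
nash_equilibria = [(a, b) for a in ACTIONS for b in ACTIONS if is_nash(a, b)]

# The two conditions stated in the text.
is_dilemma = T > R > P > S
cooperation_is_best_overall = 2 * R > T + S

print(dominant)
print(nash_equilibria)
print(is_dilemma, cooperation_is_best_overall)
```

With these numbers the only equilibrium found is confess/confess, while the combined reward for mutual silence (2R = 6) exceeds that of any other outcome.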

Iterated prisoner's dilemma
In the iterated prisoner's dilemma, the basic game is played multiple times (sometimes infinitely many times). Here, cooperation is sometimes a Nash equilibrium. This requires that each player pays attention to what the other player did in previous "rounds", and punishes or rewards the other player as appropriate.

In 1979, Robert Axelrod put out a call for experts in game theory and computational science to send in algorithms for playing an iterated prisoner's dilemma. He proposed to have all submitted algorithms compete in a tournament to see which one was the best. A total of fourteen algorithms were submitted, ranging from immensely complicated and computationally intensive to extremely simple. The results were published in the Journal of Conflict Resolution, and as it turned out, the simplest and smallest algorithm won the tournament. It was developed by Anatol Rapoport, of the University of Toronto, and it was called "tit for tat". The "tit for tat" strategy is to cooperate on the first turn, and on every subsequent turn to do whatever the opponent did on the previous turn. While algorithms have since been developed that can beat the "tit for tat" strategy, it remains the most computationally efficient. Because of this, it has been proposed as the strategy that humans employ in social interactions.
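The "tit for tat" rule described above is short enough to write out in full. The following is a rough sketch, not Axelrod's tournament code: the payoffs reuse the 5/3/1/0 values from the basic game, the ten-round match length is an arbitrary assumption, and `play_match` and `always_defect` are hypothetical helpers introduced only for the illustration. "C" stands for cooperate (stay silent) and "D" for defect (confess).

```python
# Payoffs per round as (player A, player B), reusing T=5, R=3, P=1, S=0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    """Hypothetical opponent that confesses every round."""
    return "D"


def play_match(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play_match(tit_for_tat, tit_for_tat))    # (30, 30)
print(play_match(tit_for_tat, always_defect))  # (9, 14)
```

Two "tit for tat" players cooperate throughout, while against a constant defector "tit for tat" loses only the first round and then matches defection with defection.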

An additional strategy that is often followed and debated is the "Grim Trigger": cooperate until the opponent's first defection, and from then on defect every turn. Grim Trigger tends to work only when there is information exchange.
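The difference between Grim Trigger and "tit for tat" shows up when an opponent defects once and then returns to cooperating. The following sketch assumes a fixed, made-up opponent sequence; the function names and the `responses` helper are hypothetical and exist only for this comparison. "C" stands for cooperate and "D" for defect.

```python
def grim_trigger(their_history):
    """Cooperate until the opponent has defected once, then defect forever."""
    return "D" if "D" in their_history else "C"


def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"


def responses(strategy, opponent_moves):
    """Moves a strategy plays, round by round, against a fixed opponent sequence."""
    return [strategy(opponent_moves[:i]) for i in range(len(opponent_moves))]


# Assumed opponent: a single defection in the third round, cooperation otherwise.
opponent = list("CCDCCC")

print("".join(responses(grim_trigger, opponent)))  # CCCDDD
print("".join(responses(tit_for_tat, opponent)))   # CCCDCC
```

Grim Trigger never forgives the single defection, whereas "tit for tat" retaliates once and then resumes cooperation.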

Relation to international affairs: nuclear detente
The prisoner's dilemma can be used to explain the awkward situation of exact nuclear parity. So long as a first strike is possible, and that first strike would eliminate the chance of retaliation, the players are in a "prisoner's dilemma" and, as noted above, are incentivized to defect.

Much of Cold War policy, then, was a struggle to prevent a prisoner's dilemma situation. For example, the retention of American nuclear warheads in untraceable submarines, and the retention of Russian arms in untraceable rail cars, removed the incentive for a first strike, and kept the parties locked in a situation where cooperation remained the best strategy.

Nuclear "escalation" could destabilize such parity. For example, the development of MIRV (Multiple Independently targetable Re-entry Vehicle) warheads briefly placed the United States ahead of the Soviet Union, but when the latter also deployed MIRVs, some US hawks worried that the Soviets' ability to cram more warheads onto a missile than the US (Soviet missiles could carry greater payloads) would put the Reds ahead of the US. Similarly, American development of an effective nuclear shield (i.e. an SDI which actually worked) would have "won" the game and potentially allowed for a US nuclear first strike without the opportunity of retaliation. This was greatly feared by Russia, leading the Kremlin to contemplate SDI as a reason for a first strike were its completion imminent. Fortunately, the Soviets quickly realized that Ronnie's orbital ray guns were pies in the sky, rather than a workable defense against ICBMs, and were happy to let Washington waste its money on this white elephant.

The "game" as studied by political scientists is displayed to the right, with the traditional values.

How people could play such a "game" when the survival of the human race, and of most complex life on earth, was at stake is a whole 'nother interesting question. The answer is that it's not really a "game", even though it's called "game theory". The study of these matters is of dire importance to any opposing powers before they commit to sacrificing the lives of their people.