The Infinite Prisoner’s Dilemma

How Adding Future Decisions Changes the Whole Game

Almost everyone has heard of the Prisoner’s Dilemma or one of the several variations of its basic premise. It goes like this: two criminals are interrogated separately. Each can either betray their partner or remain silent. If both stay silent, each is jailed for 1 year. If one betrays while the other stays silent, the traitor walks free while the silent one spends 3 years in prison. Finally, if both sell each other out, each is jailed for 2 years. The sentences can be summarized in the table below:

                           Prisoner 2 silent    Prisoner 2 betrays
    Prisoner 1 silent           1, 1                  3, 0
    Prisoner 1 betrays          0, 3                  2, 2

The first number is prisoner 1’s sentence in years; the second is prisoner 2’s.

Now, whether or not the two are allowed to confer beforehand, the result is the same: both will opt to stab the other in the back. From prisoner 1’s point of view, if prisoner 2 remains silent, betraying improves their own situation (0 years instead of 1). If prisoner 2 talks, they are again better off talking (2 years instead of 3). The same reasoning runs through prisoner 2’s mind, and the result is a Nash Equilibrium: a state in which no player can improve their situation by unilaterally changing their decision. The Prisoner’s Dilemma is a prime example of a game whose best collective outcome differs from its Nash Equilibrium.
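This best-response reasoning is small enough to check by brute force. Here is a minimal sketch in Python (the encoding of silent as 0 and betray as 1 is my own convention):

```python
# Sketch: verify that mutual betrayal is the unique Nash equilibrium of the
# payoff table above. Entries are years in prison, so lower is better.
# Choice 0 = stay silent, 1 = betray.

# years[(p1_choice, p2_choice)] = (p1's sentence, p2's sentence)
years = {
    (0, 0): (1, 1),  # both silent
    (0, 1): (3, 0),  # P1 silent, P2 betrays
    (1, 0): (0, 3),  # P1 betrays, P2 silent
    (1, 1): (2, 2),  # both betray
}

def is_nash(a, b):
    """Neither prisoner can reduce their own sentence by switching alone."""
    p1_ok = all(years[(a, b)][0] <= years[(alt, b)][0] for alt in (0, 1))
    p2_ok = all(years[(a, b)][1] <= years[(a, alt)][1] for alt in (0, 1))
    return p1_ok and p2_ok

equilibria = [cell for cell in years if is_nash(*cell)]
print(equilibria)  # [(1, 1)] -> betray/betray is the only equilibrium
```

The loop simply asks, for each of the four outcomes, whether either prisoner could do better by switching alone; only the betray/betray cell survives.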

This paints a rather dismal picture of human interactions, especially business-like ones. It also made me feel rather stupid, because when I played the game I had chosen to keep quiet. I assuaged my wounded pride by telling myself I was a conscience-driven, loyal person, give or take a few hundred other compliments. As it turns out, though, my choice was defensible because of another factor: the future.

Making Kindness Viable

Let’s take the Prisoner’s Dilemma, only this time we make it endless. Of course, infinite imprisonment games don’t sound very nice, so let’s change the setup to two kids receiving candy every day, the amount depending on whether each rats out the other’s mischief. What happens when this game is played over and over, eternally: are we stuck with the dismal outcome of insufficient sweets for the rest of time (or at least until they stop being naughty), or can something better occur? Surprisingly, the answer is the latter, provided each child values tomorrow’s candy highly enough relative to today’s. The exact fraction depends on the payoffs; with amounts mirroring the prison sentences above, cooperation becomes stable once tomorrow’s treat is worth at least half of today’s.


The crucial difference is that the kids can now use their future treats as bargaining chips. Upon consulting each other, they agree to stay silent, and if either ever betrays the other, the victim will sell the traitor out every day thereafter, without mercy, forever. This threat, known as a grim-trigger strategy, gives both parties a compelling reason to cooperate, resulting in a happy ending for our hypothetical children.
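The trade-off behind that threat can be sketched numerically. Below, d is how much a player values the next round relative to the current one, the costs are the prison years from the table above (lower totals are better; candy works the same way with the comparison flipped), and a 1000-round cutoff stands in for the infinite game. With these particular payoffs the break-even point lands at valuing tomorrow half as much as today; a different payoff table shifts that fraction.

```python
# Sketch: when does honouring the grim-trigger pact beat betraying it?
# Costs are prison years from the table above; d discounts future rounds.

def cost_cooperate(d, rounds=1000):
    # Both stay silent every round: 1 year per round, discounted.
    return sum(1 * d**k for k in range(rounds))

def cost_betray_once(d, rounds=1000):
    # Betray now (0 years), then suffer mutual betrayal (2 years) forever.
    return 0 + sum(2 * d**k for k in range(1, rounds))

for d in (0.4, 0.6):
    coop, betray = cost_cooperate(d), cost_betray_once(d)
    better = "cooperate" if coop < betray else "betray"
    print(f"d={d}: cooperate={coop:.2f}, betray={betray:.2f} -> {better}")
# d=0.4: cooperate=1.67, betray=1.33 -> betray
# d=0.6: cooperate=2.50, betray=3.00 -> cooperate
```

When the future is discounted too steeply (d below 1/2 here), the instant freedom from betraying outweighs the endless retaliation; value the future enough and the pact holds.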

This repeated version, called the Iterated Prisoner’s Dilemma, also sheds light on certain aspects of evolution. The political scientist Robert Axelrod organized a tournament to find the strategy that scores best over many rounds, no matter what the opponent does. A naive always-cooperate strategy, for example, loses to any ‘nasty’ player. The original tournament was won by a four-line BASIC program submitted by Anatol Rapoport. Its logic was simple: cooperate on the first move, then do whatever the opponent did last turn. This tit-for-tat rule is surprisingly effective, with only one major flaw: it cannot get out of a mutual-defection loop once one has started.
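Rapoport’s rule is easy to recreate. Here is a sketch in the same spirit (not his original BASIC), plus a tiny match loop that forces one simultaneous slip to expose the both-defect trap:

```python
# Sketch of tit for tat: cooperate first, then copy the opponent's last move.
# "C" = cooperate (stay silent), "D" = defect (betray).

def tit_for_tat(my_history, their_history):
    return "C" if not their_history else their_history[-1]

def play(strat1, strat2, rounds, noise_round=None):
    """Play two strategies against each other; optionally force both to
    defect on one round, simulating a simultaneous mistake."""
    h1, h2 = [], []
    for r in range(rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        if r == noise_round:
            m1 = m2 = "D"  # one simultaneous slip
        h1.append(m1)
        h2.append(m2)
    return h1, h2

h1, h2 = play(tit_for_tat, tit_for_tat, 8, noise_round=2)
print("".join(h1))  # CCDDDDDD -- one mutual slip locks both into defection
print("".join(h2))  # CCDDDDDD
```

After the forced slip, each player keeps copying the other’s defection, and neither rule ever offers a way back: exactly the flaw described above.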

To fix this, Axelrod explored adding a little forgiveness: sometimes the player cooperates even after being let down, with the exact probability depending on the lineup of opponents and the number of iterations. This variant outscored plain tit for tat, soundly beating it on total points. It also connects interestingly with evolutionary ideas: continuous retaliation only produces a lose-lose spiral, while a bit of forgiveness here and there earns a better long-term score.
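A sketch of that forgiving variant, with an illustrative 10% forgiveness chance (my number, not Axelrod’s exact figure): starting both players in a mutual-defection rut, occasional forgiveness eventually restores cooperation, which plain tit for tat never could.

```python
import random

# Sketch: tit for tat plus a dash of forgiveness ("generous" tit for tat).

def generous_tft(their_history, p_forgive=0.1):
    if not their_history or their_history[-1] == "C":
        return "C"
    # Opponent defected last round: usually retaliate, occasionally forgive.
    return "C" if random.random() < p_forgive else "D"

random.seed(0)
h1, h2 = ["D"], ["D"]  # start both players already stuck in mutual defection
for _ in range(500):
    m1, m2 = generous_tft(h2), generous_tft(h1)
    h1.append(m1)
    h2.append(m2)

# Plain tit for tat would echo D forever from this start; forgiveness escapes,
# and once both cooperate in the same round, cooperation is self-sustaining.
print("".join(h1[-10:]), "".join(h2[-10:]))
```

Mutual cooperation is an absorbing state here: once both happen to forgive in the same round, each keeps copying the other’s “C” from then on.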

Another thought-provoking observation: in a system of many players matched up randomly, like an ecosystem, the population of ‘nasty’ individuals stays low but never reaches zero. Tit-for-tat players get along fine with kind, non-retaliating players, but those kind players are easy prey for defection-happy characters, so exploiters always find some victims. Richard Dawkins pointed out that such systems never settle into a fixed equilibrium; the population mix keeps bouncing around.
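The interplay among these characters shows up in a tiny round-robin. For scores I flip the prison sentences into candy gains (3 minus the years served, my choice of scaling): mutual silence pays 2 each, a lone betrayer gets 3 against the victim’s 0, and mutual betrayal pays 1 each.

```python
# Sketch: 100-round matches among always-cooperate, always-defect, and
# tit for tat, using the prison table rewritten as gains (3 minus years).

PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def all_c(mine, theirs): return "C"
def all_d(mine, theirs): return "D"
def tft(mine, theirs):   return "C" if not theirs else theirs[-1]

def match(s1, s2, rounds=100):
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        score1 += p1; score2 += p2
    return score1, score2

strategies = [("all_c", all_c), ("all_d", all_d), ("tft", tft)]
for name1, s1 in strategies:
    for name2, s2 in strategies:
        print(name1, "vs", name2, match(s1, s2))
```

The scores mirror the paragraph above: tit for tat and the kind always-cooperator score identically against each other (200 each), the always-defector crushes the kind player (300 to 0), but against tit for tat the defector’s edge nearly vanishes (102 to 99).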

Certain more complex tactics do outperform the ones described here, depending on the end goal and the principles baked into the program itself, but those really deserve an article of their own. Until then, at least you know that sometimes the logical thing to do is simply to be nice (though you probably knew that already).
