Today I learned: Do not create MongoDB replica sets with an even number of nodes

TL;DR: Do not create MongoDB replica sets with an even number of instances. Never, not even in non-production environments.

Here’s a lesson I learned the hard way.

The MongoDB documentation clearly states that we should always have an odd number of nodes in a replica set:

Deploy an Odd Number of Members

An odd number of members ensures that the replica set is always able to elect a primary. If you have an even number of members, add an arbiter to get an odd number. Arbiters do not store a copy of the data and require fewer resources. As a result, you may run an arbiter on an application server or other shared process.
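Following that advice, adding an arbiter to a two-member set from the mongo shell looks roughly like this (the arbiter host below is just a placeholder):

```js
// Run from the mongo shell while connected to the primary.
// "arbiter.example.com:27017" stands in for a third, lightweight host.
rs.addArb("arbiter.example.com:27017")
```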

Have you ever wondered why this limitation is so important, even for just two instances?

Consider a situation with a replica set of two instances: A (priority 10) and B (priority 1).
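For reference, here is a minimal sketch of how such a set could be initiated from the mongo shell (the hostnames are placeholders; the priorities match the scenario above):

```js
// Run once, from the mongo shell on the node that should become A.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "node-a.example.com:27017", priority: 10 },  // A
    { _id: 1, host: "node-b.example.com:27017", priority: 1 }    // B
  ]
})
```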

If A and B can’t reach each other, both will become secondaries and remain in that state. Neither of them holds a majority (more than 50% of the votes), so neither can be elected primary.

But here is the subtle part: for some reason they can remain secondaries even after they can reach one another again. This means that if the connection between the two instances drops for a second, the whole replica set can become unusable and unable to recover on its own.
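You can spot this state from the mongo shell on either node: both members report SECONDARY and nobody claims the primary role. A rough way to check:

```js
// Print the state of every member as seen by the node you are connected to.
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr);
});
// In the stuck state, both lines show "SECONDARY".
```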

I don’t know the exact reason for this (drop me a line if you can help), but I experienced it myself with a replica set of two MongoDB 3.0.4 instances. The network went down for a moment, and then the whole replica set stayed down until I triggered an election manually.
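For completeness, one way to nudge a stuck set like this back to life is to force a reconfiguration from the member you want as primary. Treat this as a sketch, not an official recovery recipe:

```js
// Run from the mongo shell on the member that should become primary (A above).
var cfg = rs.conf();
// force: true lets the reconfiguration be accepted without a majority.
rs.reconfig(cfg, { force: true });
```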