Today I learned: Do not create MongoDB replica sets with an even number of nodes
TL;DR: Do not create MongoDB replica sets with an even number of instances. Never, not even in non-production environments.
Here’s a lesson I learned the hard way.
The MongoDB documentation clearly states that we should always have an odd number of nodes in a replica set:
Deploy an Odd Number of Members
An odd number of members ensures that the replica set is always able to elect a primary. If you have an even number of members, add an arbiter to get an odd number. Arbiters do not store a copy of the data and require fewer resources. As a result, you may run an arbiter on an application server or other shared process.
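For example, if you already have a two-member set, adding an arbiter is a one-liner from the mongo shell (the hostname below is a placeholder):

```js
// Run on the primary. The arbiter votes in elections but stores no data,
// so it can live on an application server or another shared machine.
rs.addArb("arbiter.example.net:27017")
```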
Have you ever wondered why this recommendation is so important, even for just two instances?
Consider a situation with a replica set of two instances: A (priority 10) and B (priority 1). If A and B can't reach each other, they'll both become secondaries and remain in that state. Obviously, neither of the two will have a majority (>50% of the votes) to elect a new primary.
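To make the scenario concrete, here is a minimal sketch of how such a two-member set might be initiated from the mongo shell (the hostnames and set name are placeholders, not my actual setup):

```js
// Two data-bearing members and no arbiter: exactly the setup to avoid.
// A gets priority 10 so it is normally the primary; B gets priority 1.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-a.example.net:27017", priority: 10 },
    { _id: 1, host: "mongo-b.example.net:27017", priority: 1 }
  ]
})
```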
But here comes the subtle part: for some reason, they can remain secondaries even after they can reach one another again. This means that if the connection between the two instances drops even for a second, the whole replica set can become unusable and unable to recover on its own.
I don’t know the exact reason for this (drop me a line if you can help), but I experienced it myself with a replica set of two MongoDB 3.0.4 instances. The network went down for a moment, and then the whole replica set stayed down until I triggered an election manually.
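For reference, MongoDB's documented escape hatch for a set that cannot elect a primary is a forced reconfiguration from one of the surviving members. A sketch only, not necessarily the exact steps I took (run in the mongo shell on the member you want to promote, and re-add the other member once things settle):

```js
// With no primary, a normal rs.reconfig() would be rejected;
// force: true applies the new config directly on this member.
cfg = rs.conf()
cfg.members = [cfg.members[0]]      // keep only this member (adjust the index as needed)
rs.reconfig(cfg, { force: true })   // this member can now elect itself primary
```

Of course, the real fix is the one from the documentation quoted above: add an arbiter (or a third data-bearing member) so a clean majority always exists.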