When you stepDown the Primary, all the eligible Secondaries will hold an election for a new Primary. Until a new Primary is elected the database is not available for writes. So, ‘How would you quickly elect a new Primary while performing the rolling maintenance/upgrade?’
Mastering — Rolling maintenance
- Stop the MongoDB process/service on a Secondary
- Perform the required maintenance/upgrade on the server
- Start the MongoDB process/service on the server
- Wait for MongoDB on the server to catch up on the Oplog
- Repeat the above on the other secondaries in the replica set
- Perform the maintenance on the Secondary server, mon03
- Perform the maintenance on the other Secondary server, mon02
- stepDown the Primary server, mon01
- Wait for new Primary to be elected, let’s say mon02
- Perform the maintenance on the former primary server, mon01
Implications of not having a Primary
How to quickly elect a new Primary
StepDown the Primary
Make only one Secondary to be electable
- Low replication lag
- Low network latency
- Similar Priority as current Primary
- Or Member with next highest Priority
Reduce the settings.electionTimeoutMillis
Summary of steps for faster election
- Identify the server you want it to be next Primary
- Execute rs.freeze(60) on all other Secondaries
- Set settings.electionTimeoutMillis=2000 on replica set configuration
- Execute rs.stepDown() on current Primary
- Wait for the new Primary to be elected
- Reset the settings.electionTimeoutMillis=10000 on the new Primary
Pros & Cons of the approach
“If having lower electionTimeoutMillis helps with quicker elections, then why can’t I keep it at lower number all the time?”
No matter what you do, “Never set the electionTimeoutMillis to a value less than the round-trip network latency time between two of your members.”
Hands-On lab exercises
Setup environment
# Run these commands on all the 3 servers of yours.
# download v3.6
curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-rhel70-3.6.5.tgz
BIN_NAME="mongodb-linux-x86_64-rhel70-3.6.5"
BIN_VERSION="v3.6.5"
# create data directory and untar the binaries
tar -xvzf "$BIN_NAME.tgz"
rm "$BIN_NAME.tgz"
mv $BIN_NAME $BIN_VERSION
rm -rf data
mkdir data
# start the mongod on port 27000 with bind_ip_all
$BIN_VERSION/bin/mongod --dbpath data --logpath data/mongod.log --fork --replSet rs0 --port 27000 --bind_ip_all
# about to fork child process, waiting until server is ready for connections.
# forked process: 13442
# child process started successfully, parent exiting
# edit /etc/hosts to have the mon0X entries, ofcourse with your own IPs
tail -3 /etc/hosts
# 35.167.113.204 mon01
# 35.167.113.203 mon02
# 35.167.113.206 mon03
Initiate replica set
$BIN_VERSION/bin/mongo --port 27000 <<EOF
rs.initiate({
_id: 'rs0',
members: [
{ _id: 0, host : 'mon01:27000' },
{ _id: 1, host : 'mon02:27000' },
{ _id: 2, host : 'mon03:27000' }
] })
EOF
# MongoDB shell version v3.6.5
# connecting to: mongodb://127.0.0.1:27000/
# MongoDB server version: 3.6.5
# {
# "ok" : 1,
# "operationTime" : Timestamp(1529022819, 1),
# "$clusterTime" : {
# "clusterTime" : Timestamp(1529022819, 1),
# "signature" : {
# "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
# "keyId" : NumberLong(0)
# }
# }
# }
# bye
$BIN_VERSION/bin/mongo --port 27000
# rs0:PRIMARY>
rs.isMaster().me
# mon01:27000
Display the replica set config and status
rs.config()
/*
{
"_id": "rs0",
"version": 1,
"protocolVersion": NumberLong(1),
"members": [{
"_id": 0,
"host": "mon01:27000",
"arbiterOnly": false,
"buildIndexes": true,
"hidden": false,
"priority": 1,
"tags": {
},
"slaveDelay": NumberLong(0),
"votes": 1
},
{
"_id": 1,
"host": "mon02:27000",
"arbiterOnly": false,
"buildIndexes": true,
"hidden": false,
"priority": 1,
"tags": {
},
"slaveDelay": NumberLong(0),
"votes": 1
},
{
"_id": 2,
"host": "mon03:27000",
"arbiterOnly": false,
"buildIndexes": true,
"hidden": false,
"priority": 1,
"tags": {
},
"slaveDelay": NumberLong(0),
"votes": 1
}
],
"settings": {
"chainingAllowed": true,
"heartbeatIntervalMillis": 2000,
"heartbeatTimeoutSecs": 10,
"electionTimeoutMillis": 10000,
"catchUpTimeoutMillis": -1,
"catchUpTakeoverDelayMillis": 30000,
"getLastErrorModes": {
},
"getLastErrorDefaults": {
"w": 1,
"wtimeout": 0
},
"replicaSetId": ObjectId("5b23096362ac76fdc504e6e1")
}
}
*/
rs.status()
/*
{
"set": "rs0",
"date": ISODate("2018-06-15T00:57:24.708Z"),
"myState": 1,
"term": NumberLong(1),
"heartbeatIntervalMillis": NumberLong(2000),
"optimes": {
"lastCommittedOpTime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"readConcernMajorityOpTime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"appliedOpTime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"durableOpTime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
}
},
"members": [{
"_id": 0,
"name": "mon01:27000",
"health": 1,
"state": 1,
"stateStr": "PRIMARY",
"uptime": 1837,
"optime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"optimeDate": ISODate("2018-06-15T00:57:21Z"),
"electionTime": Timestamp(1529022829, 1),
"electionDate": ISODate("2018-06-15T00:33:49Z"),
"configVersion": 1,
"self": true
},
{
"_id": 1,
"name": "mon02:27000",
"health": 1,
"state": 2,
"stateStr": "SECONDARY",
"uptime": 1425,
"optime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"optimeDurable": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"optimeDate": ISODate("2018-06-15T00:57:21Z"),
"optimeDurableDate": ISODate("2018-06-15T00:57:21Z"),
"lastHeartbeat": ISODate("2018-06-15T00:57:24.663Z"),
"lastHeartbeatRecv": ISODate("2018-06-15T00:57:23.502Z"),
"pingMs": NumberLong(0),
"syncingTo": "mon01:27000",
"configVersion": 1
},
{
"_id": 2,
"name": "mon03:27000",
"health": 1,
"state": 2,
"stateStr": "SECONDARY",
"uptime": 1425,
"optime": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"optimeDurable": {
"ts": Timestamp(1529024241, 1),
"t": NumberLong(1)
},
"optimeDate": ISODate("2018-06-15T00:57:21Z"),
"optimeDurableDate": ISODate("2018-06-15T00:57:21Z"),
"lastHeartbeat": ISODate("2018-06-15T00:57:24.571Z"),
"lastHeartbeatRecv": ISODate("2018-06-15T00:57:23.075Z"),
"pingMs": NumberLong(75),
"syncingTo": "mon01:27000",
"configVersion": 1
}
],
"ok": 1,
"operationTime": Timestamp(1529024241, 1),
"$clusterTime": {
"clusterTime": Timestamp(1529024241, 1),
"signature": {
"hash": BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId": NumberLong(0)
}
}
}
*/
Choose the potential next Primary
db.printSlaveReplicationInfo()
/*
source: mon02:27000
syncedTo: Fri Jun 15 2018 01:29:11 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary
source: mon03:27000
syncedTo: Fri Jun 15 2018 01:29:11 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary
*/
rs.status().members.map(x=>x.pingMs)
// [ undefined, NumberLong(0), NumberLong(78) ]
Freeze the other Secondaries
// Freeze mon03
rs.isMaster().me
// mon03:27000
rs.freeze(60)
/*
{
"ok" : 1,
"operationTime" : Timestamp(1529033711, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1529033711, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
*/
Set electionTimeoutMillis and stepDown the Primary
// # rs0:PRIMARY>
rs.isMaster().me
// mon01:27000
var conf = rs.conf()
conf.settings.electionTimeoutMillis=2000
rs.reconfig(conf)
/*
{
"ok": 1,
"operationTime": Timestamp(1529025863, 1),
"$clusterTime": {
"clusterTime": Timestamp(1529025863, 1),
"signature": {
"hash": BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId": NumberLong(0)
}
}
}
*/
rs.stepDown()
/*
2018-06-15T03:52:42.042+0000 E QUERY [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '127.0.0.1:27000' :
DB.prototype.runCommand@src/mongo/shell/db.js:168:1
DB.prototype.adminCommand@src/mongo/shell/db.js:186:16
rs.stepDown@src/mongo/shell/utils.js:1341:12
@(shell):1:1
2018-06-15T03:52:42.043+0000 I NETWORK [thread1] trying reconnect to 127.0.0.1:27000 (127.0.0.1) failed
2018-06-15T03:52:42.043+0000 I NETWORK [thread1] reconnect 127.0.0.1:27000 (127.0.0.1) ok
*/
// rs0:SECONDARY>
rs.isMaster().primary
// mon02:27000
# Server mon02
tail -100 data/mongod.log | grep REPL
# 2018-06-15T03:52:42.304+0000 I REPL [rsBackgroundSync] could not find member to sync from
# 2018-06-15T03:52:42.305+0000 I REPL [replexec-28] Member mon01:27000 is now in state SECONDARY
# 2018-06-15T03:52:43.306+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to mon01:27000: InvalidSyncSource: Sync source was cleared. Was mon01:27000
# 2018-06-15T03:52:43.360+0000 I REPL [replexec-26] Starting an election, since weve seen no PRIMARY in the past 2000ms
# 2018-06-15T03:52:43.360+0000 I REPL [replexec-26] conducting a dry run election to see if we could be elected. current term: 4
# 2018-06-15T03:52:43.360+0000 I REPL [replexec-24] VoteRequester(term 4 dry run) received a yes vote from mon01:27000; response message: { term: 4, voteGranted: true, reason: "", ok: 1.0, operationTime: Timestamp(1529034758, 1), $clusterTime: { clusterTime: Timestamp(1529034758, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }
# 2018-06-15T03:52:43.360+0000 I REPL [replexec-24] dry election run succeeded, running for election in term 5
# 2018-06-15T03:52:43.363+0000 I REPL [replexec-28] VoteRequester(term 5) received a yes vote from mon01:27000; response message: { term: 5, voteGranted: true, reason: "", ok: 1.0, operationTime: Timestamp(1529034758, 1), $clusterTime: { clusterTime: Timestamp(1529034758, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }
# 2018-06-15T03:52:43.363+0000 I REPL [replexec-28] election succeeded, assuming primary role in term 5
# 2018-06-15T03:52:43.363+0000 I REPL [replexec-28] transition to PRIMARY from SECONDARY
# 2018-06-15T03:52:43.363+0000 I REPL [replexec-28] Entering primary catch-up mode.
# 2018-06-15T03:52:43.584+0000 I REPL [replexec-23] Caught up to the latest optime known via heartbeats after becoming primary.
# 2018-06-15T03:52:43.584+0000 I REPL [replexec-23] Exited primary catch-up mode.
# 2018-06-15T03:52:45.301+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
# Server mon03
tail -100 data/mongod.log | grep REPL
# 2018-06-15T03:52:29.029+0000 I REPL [rsBackgroundSync] sync source candidate: mon01:27000
# 2018-06-15T03:52:31.631+0000 I REPL [conn27] 'freezing' for 120 seconds
# 2018-06-15T03:52:42.794+0000 I REPL [replication-1] Choosing new sync source because our current sync source, mon01:27000, has an OpTime ({ ts: Timestamp(1529034758, 1), t: 4 }) which is not ahead of ours ({ ts: Timestamp(1529034758, 1), t: 4 }), it does not have a sync source, and its not the primary (sync source does not know the primary)
# 2018-06-15T03:52:42.794+0000 I REPL [replication-1] Canceling oplog query due to OplogQueryMetadata. We have to choose a new sync source. Current source: mon01:27000, OpTime { ts: Timestamp(1529034758, 1), t: 4 }, its sync source index:-1
# 2018-06-15T03:52:42.794+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source mon01:27000 (config version: 4; last applied optime: { ts: Timestamp(1529034758, 1), t: 4 }; sync source index: -1; primary index: -1) is no longer valid
# 2018-06-15T03:52:42.794+0000 I REPL [rsBackgroundSync] could not find member to sync from
# 2018-06-15T03:52:43.420+0000 I REPL [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I am still waiting for stepdown period to end at 2018-06-09T13:09:29.408+0000 (mask 0x20)
# 2018-06-15T03:52:43.481+0000 I REPL [ReplicationExecutor] Member mon01:27000 is now in state SECONDARY
# 2018-06-15T03:52:43.584+0000 I REPL [ReplicationExecutor] Not starting an election, since we are not electable due to: Not standing for election because I am still waiting for stepdown period to end at 2018-06-09T13:09:29.408+0000 (mask 0x20)
# 2018-06-15T03:52:43.584+0000 I REPL [ReplicationExecutor] Member mon02:27000 is now in state PRIMARY
# 2018-06-15T03:52:43.663+0000 I REPL [replexec-31] Member mon02:27000 is now in state PRIMARY
# 2018-06-15T03:52:43.817+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to mon01:27000: InvalidSyncSource: Sync source was cleared. Was mon01:27000
# 2018-06-15T03:52:45.795+0000 I REPL [rsBackgroundSync] sync source candidate: mon02:27000
Reset the electionTimeoutMillis on the new primary
rs.isMaster().me
// mon02:27000
// rs0:PRIMARY>
// on the new primary
var conf = rs.conf()
conf.settings.electionTimeoutMillis=10000
/*
rs.reconfig(conf)
{
"ok": 1,
"operationTime": Timestamp(1529034252, 1),
"$clusterTime": {
"clusterTime": Timestamp(1529034252, 1),
"signature": {
"hash": BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId": NumberLong(0)
}
}
}
*/
Summary
Although the MongoDB database application is highly available for reads from secondaries during the elections, the database is not available for writes until a Primary is elected. So it is important to ensure the primary is available sooner than later to meet your SLA for writes.
Previous Articles
Series of articles solely created for you to master MongoDB
A long awaited and most requested feature for many, has finally arrived
How to prevent someone dropping your collections?
Know the operations currently executing on MongoDB server inside out