Endless recovering state of secondary


Problem description

I built a replica set with one primary, one secondary, and one arbiter on MongoDB 3.0.2. The primary and arbiter are on the same host; the secondary is on another host.

As the write load grew, the secondary could not keep up with the primary and went into the RECOVERING state. The primary can still reach the secondary, as I can log in to the secondary via the mongo shell from the primary's host.

I stopped all operations, watched the secondary's state with rs.status(), and ran rs.syncFrom("primary's ip:port") on the secondary.

The output of rs.status() then shows that the secondary's optimeDate is far behind the primary's, and the following message appears intermittently:

"set" : "shard01", "date" : ISODate("2015-05-15T02:10:55.382Z"), "myState" : 3, "members" : [ { "_id" : 0, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 135364, "optime" : Timestamp(1431655856, 6), "optimeDate" : ISODate("2015-05-15T02:10:56Z"), "lastHeartbeat" : ISODate("2015-05-15T02:10:54.306Z"), "lastHeartbeatRecv" : ISODate("2015-05-15T02:10:53.634Z"), "pingMs" : 0, "electionTime" : Timestamp(1431520398, 2), "electionDate" : ISODate("2015-05-13T12:33:18Z"), "configVersion" : 3 }, { "_id" : 1, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 7, "stateStr" : "ARBITER", "uptime" : 135364, "lastHeartbeat" : ISODate("2015-05-15T02:10:53.919Z"), "lastHeartbeatRecv" : ISODate("2015-05-15T02:10:54.076Z"), "pingMs" : 0, "configVersion" : 3 }, { "_id" : 2, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 3, "stateStr" : "RECOVERING", "uptime" : 135510, "optime" : Timestamp(1431602631, 134), "optimeDate" : ISODate("2015-05-14T11:23:51Z"), "infoMessage" : "could not find member to sync from", "configVersion" : 3, "self" : true } ], "ok" : 1

"set" : "shard01", "date" : ISODate("2015-05-15T02:10:55.382Z"), "myState" : 3, "members" : [ { "_id" : 0, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 135364, "optime" : Timestamp(1431655856, 6), "optimeDate" : ISODate("2015-05-15T02:10:56Z"), "lastHeartbeat" : ISODate("2015-05-15T02:10:54.306Z"), "lastHeartbeatRecv" : ISODate("2015-05-15T02:10:53.634Z"), "pingMs" : 0, "electionTime" : Timestamp(1431520398, 2), "electionDate" : ISODate("2015-05-13T12:33:18Z"), "configVersion" : 3 }, { "_id" : 1, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 7, "stateStr" : "ARBITER", "uptime" : 135364, "lastHeartbeat" : ISODate("2015-05-15T02:10:53.919Z"), "lastHeartbeatRecv" : ISODate("2015-05-15T02:10:54.076Z"), "pingMs" : 0, "configVersion" : 3 }, { "_id" : 2, "name" : "xxx.xxx.xxx.xxx:xxx", "health" : 1, "state" : 3, "stateStr" : "RECOVERING", "uptime" : 135510, "optime" : Timestamp(1431602631, 134), "optimeDate" : ISODate("2015-05-14T11:23:51Z"), "infoMessage" : "could not find member to sync from", "configVersion" : 3, "self" : true } ], "ok" : 1

"infoMessage":找不到要从其同步的成员"

"infoMessage" : "could not find member to sync from"

The primary and the arbiter are both OK. I want to know the reason for this message and how to change the secondary's state from RECOVERING back to SECONDARY.

Recommended answer

The problem (most likely)

The last operation on the primary is from "2015-05-15T02:10:56Z", whereas the last operation applied on the recovering secondary is from "2015-05-14T11:23:51Z", a difference of roughly 15 hours. That window may well exceed your replication oplog window (the difference between the time of the first and the last operation entry in your oplog). Put simply, there are too many operations on the primary for the secondary to catch up.
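To see how large that window actually is, the oplog can be checked from the mongo shell; for example (a minimal sketch, to be run on the primary):

    // On the PRIMARY: prints the configured oplog size and the time span
    // covered between its first and last entry (the "oplog window").
    rs.printReplicationInfo()

If the secondary's optimeDate from rs.status() is older than the "oplog first event time" reported above, the secondary can no longer catch up by tailing the oplog.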

A bit more elaborated (though simplified): during an initial sync, the data the secondary syncs is the data as of a given point in time. Once the data of that point in time has been synced over, the secondary connects to the oplog and applies the changes that were made between said point in time and now, according to the oplog entries. This works well as long as the oplog holds all operations since the mentioned point in time. But the oplog has a limited size (it is a so-called capped collection). So if more operations happen on the primary than the oplog can hold during the initial sync, the oldest operations "fade out". The secondary recognises that not all operations necessary to "construct" the same data as on the primary are available, refuses to complete the sync, and stays in the RECOVERING state.
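For illustration, the oplog can be inspected like any other collection in the local database (a sketch; the exact fields in the stats output vary between versions and storage engines):

    // The oplog is just a capped collection in the "local" database.
    use local
    db.oplog.rs.stats()                                   // shows "capped" : true and its size
    db.oplog.rs.find().sort({ $natural:  1 }).limit(1)    // oldest operation still retained
    db.oplog.rs.find().sort({ $natural: -1 }).limit(1)    // newest operation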

The problem is a known one and not a bug, but a result of the inner workings of MongoDB and several fail-safe assumptions made by the development team. Hence, there are several ways to deal with the situation. Sadly, since you only have two data-bearing nodes, all of them involve downtime.

Option 1: Increase the oplog size. This is my preferred method, since it deals with the problem once and (kind of) for all. It is a bit more complicated than the other solution, though. From a high-level perspective, these are the steps to take (a shell sketch of steps 4 to 7 follows the list):

  1. Shut down the primary
  2. Create a backup of the oplog using direct access to the data files
  3. Restart the mongod in standalone mode
  4. Copy the current oplog to a temporary collection
  5. Delete the current oplog
  6. Recreate the oplog with the desired size
  7. Copy back the oplog entries from the temporary collection to the shiny new oplog
  8. Restart mongod as part of the replica set
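
The following mongo shell sketch illustrates steps 4 to 7 against the standalone mongod. The 8 GB target size and the choice to copy back all old entries (rather than only the most recent one, as the official tutorial does) are assumptions; adjust both to your situation:

    // Run against the mongod restarted in standalone mode (step 3), after the backup (step 2).
    use local

    // 4. Copy the current oplog entries into a temporary collection.
    db.oplogTemp.drop()
    db.oplog.rs.find().sort({ $natural: 1 }).forEach(function (doc) { db.oplogTemp.insert(doc) })

    // 5. Delete the current oplog.
    db.oplog.rs.drop()

    // 6. Recreate the oplog with the desired size (here: 8 GB, an assumed value).
    db.runCommand({ create: "oplog.rs", capped: true, size: 8 * 1024 * 1024 * 1024 })

    // 7. Copy the entries back into the new oplog.
    db.oplogTemp.find().sort({ $natural: 1 }).forEach(function (doc) { db.oplog.rs.insert(doc) })
    db.oplogTemp.drop()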

Do not forget to increase the oplog of the secondary as well before you do the initial sync, since it may become primary at some point in the future!

For details, please read "Change the Size of the Oplog" in the tutorials on replica set maintenance.

If option 1 is not viable, the only real alternative is to shut down the application causing the load on the replica set, restart the sync, and wait for it to complete. Depending on the amount of data to be transferred, plan on several hours.
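While the sync is running, the lag can be watched from the mongo shell; for example:

    // On the primary (or any data-bearing member): prints how far each secondary lags behind.
    rs.printSlaveReplicationInfo()

    // Or list every member's state and optime directly:
    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  " + (m.optimeDate || ""))
    })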

The oplog window problem is a well-known one. While replica sets and sharded clusters are easy to set up with MongoDB, quite some knowledge and a bit of experience are needed to maintain them properly. Do not run something as important as a database with a complex setup without knowing the basics; in case Something Bad (tm) happens, it might well lead to a FUBAR situation.

