MongoDB balancer timeout with delayed replica


Problem description

We have a setup with two MongoDB shards. Each shard contains a master, a slave, a 24-hour delayed slave, and an arbiter. However, the balancer fails to migrate any chunks; the migrations appear to be waiting on the delayed slave. I have tried setting _secondaryThrottle to false in the balancer config, but I still have the issue.

It seems the migration goes on for a day and then fails (a ton of "waiting for slave" messages in the logs). Eventually it gives up and starts a new migration. The message says it is waiting for 3 slaves, but the delayed slave is hidden and priority 0, so it should not have to wait for that one. And if _secondaryThrottle worked, it should not wait for any slave at all, right?
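
For reference, a hidden, priority 0 slave with a 24-hour delay is normally declared in the replica set configuration along these lines. This is only a sketch of the usual setup; the hostname and _id are made up, and slaveDelay is given in seconds:

{
    "_id" : 3,
    "host" : "mongo1-delayed:27018",
    "priority" : 0,
    "hidden" : true,
    "slaveDelay" : 86400
}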

It's been like this for a few months now, so the config should have been reloaded on all mongoses. Some of the mongoses running the balancer have been restarted recently.

Does anyone have any idea how to solve this problem? We did not have these issues before starting the delayed slave, but that's just our theory.

Config:

{ "_id" : "balancer", "_secondaryThrottle" : false, "stopped" : false }

Log from shard1 master process:

[migrateThread] warning: migrate commit waiting for 3 slaves for 'xxx.xxx' { shardkey: ObjectId('4fd2025ae087c37d32039a9e') } -> {shardkey: ObjectId('4fd2035ae087c37f04014a79') } waiting for: 529dc9d9:7a [migrateThread] Waiting for replication to catch up before entering critical section

Log from shard2 master process:

Tue Dec 3 14:52:25.302 [conn1369472] moveChunk data transfer progress: { active: true, ns: "xxx.xxx", from: "shard2/mongo2:27018,mongob2:27018", min: { shardkey: ObjectId('4fd2025ae087c37d32039a9e') }, max: { shardkey: ObjectId('4fd2035ae087c37f04014a79') }, shardKeyPattern: { shardkey: 1.0 }, state: "catchup", counts: { cloned: 22773, clonedBytes: 36323458, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0

Update: I confirmed that removing slaveDelay got the balancer working again. As soon as they got up to speed, chunks moved. So the problem seems to be related to the slaveDelay. I also confirmed that the balancer runs with "secondaryThrottle" : false. It does seem to wait for slaves anyway.
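
For completeness, "removing slaveDelay" here means reconfiguring the delayed member so it no longer lags, roughly like this (a sketch; the member index 3 is assumed to be the delayed slave, adjust for your configuration):

cfg = rs.conf()
cfg.members[3].slaveDelay = 0    // member index 3 assumed to be the delayed slave
rs.reconfig(cfg)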

Shard2:

Tue Dec 10 11:44:25.423 [migrateThread] warning: migrate commit waiting for 3 slaves for 'xxx.xxx' { shardkey: ObjectId('4ff1213ee087c3516b2f703f') } -> { shardkey: ObjectId('4ff12a5eddf2b32dff1e7bea') } waiting for: 52a6f089:81

Tue Dec 10 11:44:26.423 [migrateThread] Waiting for replication to catch up before entering critical section

Tue Dec 10 11:44:27.423 [migrateThread] Waiting for replication to catch up before entering critical section

Tue Dec 10 11:44:28.423 [migrateThread] Waiting for replication to catch up before entering critical section

Tue Dec 10 11:44:29.424 [migrateThread] Waiting for replication to catch up before entering critical section

Tue Dec 10 11:44:30.424 [migrateThread] Waiting for replication to catch up before entering critical section

Tue Dec 10 11:44:31.424 [migrateThread] Waiting for replication to catch up before entering critical section

Tue Dec 10 11:44:31.424 [migrateThread] migrate commit succeeded flushing to secondaries for 'xxx.xxx' { shardkey: ObjectId('4ff1213ee087c3516b2f703f') } -> { shardkey: ObjectId('4ff12a5eddf2b32dff1e7bea') }

Tue Dec 10 11:44:31.425 [migrateThread] migrate commit flushed to journal for 'xxx.xxx' { shardkey: ObjectId('4ff1213ee087c3516b2f703f') } -> { shardkey: ObjectId('4ff12a5eddf2b32dff1e7bea') }

Tue Dec 10 11:44:31.647 [migrateThread] migrate commit succeeded flushing to secondaries for 'xxx.xxx' { shardkey: ObjectId('4ff1213ee087c3516b2f703f') } -> { shardkey: ObjectId('4ff12a5eddf2b32dff1e7bea') }

Tue Dec 10 11:44:31.667 [migrateThread] migrate commit flushed to journal for 'xxx.xxx' { shardkey: ObjectId('4ff1213ee087c3516b2f703f') } -> { shardkey: ObjectId('4ff12a5eddf2b32dff1e7bea') }

Solution

The balancer is properly waiting for the MAJORITY of the replica set of the destination shard to have the documents being migrated before initiating the delete of those documents on the source shard.

The issue is that you have FOUR members in your replica set (a master, a slave, a 24-hour delayed slave, and an arbiter). That means three is the majority, and since the arbiter holds no data, a majority acknowledgement can only be reached if the delayed slave is one of the three. I'm not sure why you added an arbiter, but if you remove it, then TWO will be the majority and the balancer will not have to wait for the delayed slave.
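
If you take this option, removing the arbiter is a single command run against the primary of the shard's replica set (the hostname:port below is a placeholder for your arbiter's address):

rs.remove("arbiter-host:27018")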

An alternative way of achieving the same result is to set up the delayed slave with the votes: 0 property and leave the arbiter as the third voting node.
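
A rough sketch of that reconfiguration, run on the primary and assuming the delayed slave sits at index 3 of the members array (adjust the index to match your actual configuration):

cfg = rs.conf()
cfg.members[3].votes = 0    // index 3 assumed to be the delayed slave
rs.reconfig(cfg)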
