MongoDB sharded collection not rebalancing


Question

We have a relatively simple sharded MongoDB setup: 4 shards, each of which is a replica set with at least 3 members. Each collection consists of data loaded from a large number of files; each file is given a monotonically increasing ID, and sharding is done on a hash of that ID.
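
For reference, hashed sharding of this kind is typically set up from mongos roughly as follows; this is only a sketch of the standard commands (names match the collection discussed below, exact options may have differed):

    // Typical hashed-shard-key setup, run from a mongos shell (sketch)
    use prod
    sh.enableSharding("prod")                                  // allow sharding for the database
    db.mycollection.ensureIndex({ job_id : "hashed" })         // hashed index on the shard key field
    sh.shardCollection("prod.mycollection", { job_id : "hashed" })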

Most of our collections are working as expected. However, I have one collection that does not seem to be distributing its chunks across the shards properly. The collection had ~30GB of data loaded before the index was created and the collection was sharded; as far as I'm aware, that shouldn't matter. Here are the stats for the collection:

mongos> db.mycollection.stats()
{
        "sharded" : true,
        "ns" : "prod.mycollection",
        "count" : 53304954,
        "numExtents" : 37,
        "size" : 35871987376,
        "storageSize" : 38563958544,
        "totalIndexSize" : 8955712416,
        "indexSizes" : {
                "_id_" : 1581720784,
                "customer_code_1" : 1293148864,
                "job_id_1_customer_code_1" : 1800853936,
                "job_id_hashed" : 3365576816,
                "network_code_1" : 914412016
        },
        "avgObjSize" : 672.9578525853339,
        "nindexes" : 5,
        "nchunks" : 105,
        "shards" : {
                "rs0" : {
                        "ns" : "prod.mycollection",
                        "count" : 53304954,
                        "size" : 35871987376,
                        "avgObjSize" : 672.9578525853339,
                        "storageSize" : 38563958544,
                        "numExtents" : 37,
                        "nindexes" : 5,
                        "lastExtentSize" : 2146426864,
                        "paddingFactor" : 1.0000000000050822,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 8955712416,
                        "indexSizes" : {
                                "_id_" : 1581720784,
                                "job_id_1_customer_code_1" : 1800853936,
                                "customer_code_1" : 1293148864,
                                "network_code_1" : 914412016,
                                "job_id_hashed" : 3365576816
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}

And the sh.status() for this collection:

            prod.mycollection
                    shard key: { "job_id" : "hashed" }
                    chunks:
                            rs0     105
                    too many chunks to print, use verbose if you want to force print
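
The same per-shard breakdown can also be printed more compactly with the shell's built-in getShardDistribution() helper (output omitted here):

    // prints data size, document count and chunk count per shard
    mongos> db.mycollection.getShardDistribution()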

Is there something I'm missing as to why this collection will only distribute to rs0? Is there a way to force a rebalance? I performed the same steps to shard other collections, and they distributed themselves properly. Here are the stats for a collection that sharded successfully:

mongos> db.myshardedcollection.stats()
{
        "sharded" : true,
        "ns" : "prod.myshardedcollection",
        "count" : 5112395,
        "numExtents" : 71,
        "size" : 4004895600,
        "storageSize" : 8009994240,
        "totalIndexSize" : 881577200,
        "indexSizes" : {
                "_id_" : 250700688,
                "customer_code_1" : 126278320,
                "job_id_1_customer_code_1" : 257445888,
                "job_id_hashed" : 247152304
        },
        "avgObjSize" : 783.3697513591966,
        "nindexes" : 4,
        "nchunks" : 102,
        "shards" : {
                "rs0" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1284540,
                        "size" : 969459424,
                        "avgObjSize" : 754.7133012595949,
                        "storageSize" : 4707762176,
                        "numExtents" : 21,
                        "nindexes" : 4,
                        "lastExtentSize" : 1229475840,
                        "paddingFactor" : 1.0000000000000746,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 190549856,
                        "indexSizes" : {
                                "_id_" : 37928464,
                                "job_id_1_customer_code_1" : 39825296,
                                "customer_code_1" : 33734176,
                                "job_id_hashed" : 79061920
                        },
                        "ok" : 1
                },
                "rs1" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1287243,
                        "size" : 1035438960,
                        "avgObjSize" : 804.384999568846,
                        "storageSize" : 1178923008,
                        "numExtents" : 17,
                        "nindexes" : 4,
                        "lastExtentSize" : 313208832,
                        "paddingFactor" : 1,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 222681536,
                        "indexSizes" : {
                                "_id_" : 67787216,
                                "job_id_1_customer_code_1" : 67345712,
                                "customer_code_1" : 30169440,
                                "job_id_hashed" : 57379168
                        },
                        "ok" : 1
                },
                "rs2" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1131411,
                        "size" : 912549232,
                        "avgObjSize" : 806.5585644827565,
                        "storageSize" : 944386048,
                        "numExtents" : 16,
                        "nindexes" : 4,
                        "lastExtentSize" : 253087744,
                        "paddingFactor" : 1,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 213009328,
                        "indexSizes" : {
                                "_id_" : 64999200,
                                "job_id_1_customer_code_1" : 67836272,
                                "customer_code_1" : 26522944,
                                "job_id_hashed" : 53650912
                        },
                        "ok" : 1
                },
                "rs3" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1409201,
                        "size" : 1087447984,
                        "avgObjSize" : 771.6769885914075,
                        "storageSize" : 1178923008,
                        "numExtents" : 17,
                        "nindexes" : 4,
                        "lastExtentSize" : 313208832,
                        "paddingFactor" : 1,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 255336480,
                        "indexSizes" : {
                                "_id_" : 79985808,
                                "job_id_1_customer_code_1" : 82438608,
                                "customer_code_1" : 35851760,
                                "job_id_hashed" : 57060304
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}

sh.status() for the properly sharded collection:

            prod.myshardedcollection
                    shard key: { "job_id" : "hashed" }
                    chunks:
                            rs2     25
                            rs1     26
                            rs3     25
                            rs0     26
                    too many chunks to print, use verbose if you want to force print

Answer

In MongoDB, when you move to a sharded system and don't see any balancing, it can be one of several things.

  1. You may not have enough data to trigger balancing. That was definitely not your situation, but some people may not realize that with the default chunk size of 64MB it can take a while of inserting data before there is enough to split it into multiple chunks and balance some of them onto the other shards.

  2. The balancer may not have been running - since your other collections were getting balanced, that was unlikely in your case, unless this collection was sharded last, after the balancer had been stopped for some reason.

  3. The chunks in your collection can't be moved. This can happen when the shard key is not granular enough to split the data into small enough chunks. As it turns out, this was your case: your shard key is not granular enough for a collection this large - you have 105 chunks (which probably corresponds to the number of unique job_id values) and over 30GB of data. When chunks are too large and the balancer can't move them, it tags them as "jumbo" so it won't spin its wheels trying to migrate them. The shell checks sketched after this list show how to spot each of these conditions.
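
Each of these cases can be checked from a mongos shell; the following is a sketch against the standard config collections (this era's schema, where config.chunks identifies a collection by its ns field):

    // 1. effective chunk size - the default is 64MB when no explicit setting exists
    use config
    db.settings.find({ _id : "chunksize" })

    // 2. is the balancer enabled, and is a balancing round currently in progress?
    sh.getBalancerState()
    sh.isBalancerRunning()

    // 3. chunks the balancer has flagged as too big to move
    db.chunks.find({ ns : "prod.mycollection", jumbo : true }).count()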

How do you recover from a poor choice of shard key? Normally, changing the shard key is very painful - since the shard key is immutable, you have to do the equivalent of a full data migration to get the data into a collection with a different shard key. In your case, however, the collection still lives entirely on one shard, so it should be relatively easy to "unshard" the collection and re-shard it with a new key. Because the number of job_id values is relatively small, I would recommend a regular (range-based) shard key on job_id, customer_code, since you probably query on those fields and I'm guessing they are always set at document creation time.
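
A minimal sketch of that re-shard, assuming the data is copied or re-imported into a fresh collection (the target name mycollection_resharded below is purely illustrative):

    // Shard an empty target collection on the suggested compound key,
    // then reload the data into it (mongodump/mongorestore, re-import, etc.)
    use prod
    db.mycollection_resharded.ensureIndex({ job_id : 1, customer_code : 1 })
    sh.shardCollection("prod.mycollection_resharded", { job_id : 1, customer_code : 1 })

With a range-based compound key like this, chunks can be split on customer_code within a single job_id, which is exactly the granularity the hashed job_id key was missing.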
