db.collection.count() returns a lot more documents for a sharded collection in MongoDB


Problem description

I have 2 shards, each a replica set with 3 instances. When I run count() on a sharded collection, I get far more than the real number of documents (a difference of more than 2.5 million). I get the same result when I run find() and increment a counter in a forEach() loop.

How do I know the real number of documents? First, I know the growth trend, and the collection cannot have grown this sharply. Second, when I count documents with the following M/R script, I get what I assume is the real number. I use this script to find duplicate documents. The number of duplicates is several thousand, not millions, and the count on test_duplicate_collection minus the duplicates gives me the real number of documents.

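// Emit one (doc_id, 1) pair per document.
var map = function() {
   emit(this.doc_id, 1);
};

// Sum the emitted 1s, giving the number of documents per doc_id.
var reduce = function(key, values) {
   var result = 0;
   values.forEach(function(value) {
     result += value;
   });

   return result;
};

// Write one result document per distinct doc_id to test_duplicate_collection.
db.test_collection.mapReduce(map, reduce, "test_duplicate_collection");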
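To make the arithmetic behind this script concrete, here is a minimal sketch in plain JavaScript that runs without MongoDB. It assumes doc_id identifies a logical document; the sample data and the helper names countByDocId and summarize are hypothetical and not part of the original script. The output collection of the M/R job holds one entry per distinct doc_id, which is what distinctIds models here.

```javascript
// Hypothetical sample data: doc_id "a" appears twice, so one copy is a duplicate.
const docs = [
  { doc_id: "a" },
  { doc_id: "b" },
  { doc_id: "a" },
  { doc_id: "c" },
];

// In-memory equivalent of the map and reduce steps: occurrences per doc_id.
function countByDocId(documents) {
  const counts = new Map();
  for (const doc of documents) {
    counts.set(doc.doc_id, (counts.get(doc.doc_id) || 0) + 1);
  }
  return counts;
}

// Derive the figures discussed in the question from the per-id counts.
function summarize(counts) {
  let duplicatedIds = 0; // ids that occur more than once
  let extraCopies = 0;   // surplus documents beyond the first copy of each id
  for (const n of counts.values()) {
    if (n > 1) {
      duplicatedIds += 1;
      extraCopies += n - 1;
    }
  }
  return { distinctIds: counts.size, duplicatedIds, extraCopies };
}

const summary = summarize(countByDocId(docs));
console.log(summary);
```

With this sample, four documents collapse to three distinct ids with one surplus copy. A gap of millions between count() and the number of distinct ids therefore cannot be explained by a few thousand duplicates.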

I understand that during balancing, some chunks may not yet be deleted from the source shard while they are being transferred to another shard. But sh.status() shows that all chunks are evenly distributed. I also paused write operations to see whether the cleanup just needed some time, but nothing changed.

You might say that deletion of moved chunks is still going on, and indeed, when I first started using sharding, I saw slight decreases in the count for the sharded collection (with no write operations). But currently there is no change over time; the count just stands still. I also tried orphanage.js in the hope of finding orphaned documents (using the script from https://groups.google.com/forum/#!topic/mongodb-user/OKH5_KDO04I), but no such documents were found.

My question is: what could cause count() and find().forEach() to report more than the real number of documents (i.e. compared with the M/R script)?

Thanks for your help.

EDIT1

There was a problem with the replica set configuration in one of the shards. Specifically, no master had been set in the configuration file. In the MMS dashboard, the host that the other replica set members listened to always showed as Slave instead of Primary. After we fixed it, the forEach() loop count started to show the same number of documents as the M/R script above. So the only remaining problem is with count() itself.

In the MongoDB JIRA I found the following unresolved bug about count() in a sharded environment: https://jira.mongodb.org/browse/SERVER-3645. But it really relates to count() during balancing, i.e. count() may include chunks that the balancer is currently moving. As a workaround, the ticket proposes passing a query that is always true. I tried that too, but count() still returns the same number as before.

Recommended answer

Try using the slower (but apparently more accurate) .itcount().
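A rough sketch of how the two counts could be compared side by side in the mongo shell. The helper name compareCounts is hypothetical; count(), find() and itcount() are the shell methods discussed in this thread. The sketch assumes it is run through mongos against the sharded collection.

```javascript
// Hypothetical helper: `db` is the shell's database handle.
function compareCounts(db, collectionName) {
  const coll = db.getCollection(collectionName);

  // Fast count: on a sharded cluster this can include orphaned documents
  // or documents in chunks that are being migrated.
  const fastCount = coll.count();

  // Iterated count: walks the whole cursor and counts the documents returned.
  const iteratedCount = coll.find().itcount();

  return { fastCount, iteratedCount, surplus: fastCount - iteratedCount };
}

// In the mongo shell:
//   printjson(compareCounts(db, "test_collection"));
```

A positive surplus would match the symptom described in the question. Note that itcount() reads every document, so it can take a long time on a large collection.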
