Reduce is called several times with the same key in MongoDB map-reduce


Question

I'm trying to run map-reduce on MongoDB in the mongo shell. For some reason, in the reduce phase, I get several calls for the same key (instead of a single one), so I get wrong results. I'm not an expert in this domain, so maybe I'm making some stupid mistake. Any help is appreciated.

Thanks.

Here is my small example:

I'm creating 10000 documents:

var i = 0;
db.docs.drop();
while (i < 10000) {
    db.docs.insert({text:"line " + i,index:i});
    i++;
}

Then I'm doing map-reduce based on index modulo 10 (so I expect to get 1000 in each "bucket"):

db.docs.mapReduce(
    function() { 
       emit(this.index%10,1);
    },
    function(key,values) {
       return values.length;
    },
    {
    out : {inline : 1}
    }
);

However, the results look like this:

{
    "results" : [
        {
            "_id" : 0,
            "value" : 21
        },
        {
            "_id" : 1,
            "value" : 21
        },
        {
            "_id" : 2,
            "value" : 21
        },
        {
            "_id" : 3,
            "value" : 21
        },
        {
            "_id" : 4,
            "value" : 21
        },
        {
            "_id" : 5,
            "value" : 21
        },
        {
            "_id" : 6,
            "value" : 21
        },
        {
            "_id" : 7,
            "value" : 21
        },
        {
            "_id" : 8,
            "value" : 21
        },
        {
            "_id" : 9,
            "value" : 21
        }
    ],
    "timeMillis" : 76,
    "counts" : {
        "input" : 10000,
        "emit" : 10000,
        "reduce" : 500,
        "output" : 10
    },
    "ok" : 1,
}

Recommended answer

Map/Reduce is essentially a recursive operation. In particular, the documented requirements for the reduce function include the following statement:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

Therefore, the reduce function has to expect that an input value may itself be a count produced by a previous invocation, not only the 1 emitted by map. Returning values.length throws those partial counts away. The following code handles them by actually adding the values:

db.docs.mapReduce(
    function() { emit(this.index % 10, 1); }, 
    function(key,values) { return Array.sum(values); }, 
    { out : {inline : 1} } );

Now the emit(key, 1) also makes more sense, because 1 is no longer just an arbitrary number used to fill the array: its value is actually taken into account.
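The difference between counting and summing can be reproduced outside MongoDB. The following is a minimal sketch in plain JavaScript, not the MongoDB API: simulateReduce is a hypothetical helper, and the batch size of 100 is an assumption chosen for illustration, since the engine's real batching is an internal detail and may differ.

```javascript
// Hypothetical helper: mimics an engine that reduces the values for one key
// in batches and feeds each partial result back into a later reduce call.
// The batch size is an illustrative assumption, not MongoDB's real behaviour.
function simulateReduce(reduceFn, key, values, batchSize) {
    const pending = values.slice();
    while (pending.length > 1) {
        const batch = pending.splice(0, batchSize);
        // The partial result re-enters the queue as an ordinary value.
        pending.push(reduceFn(key, batch));
    }
    return pending[0];
}

// What map emits for a single key: one thousand 1s.
const ones = new Array(1000).fill(1);

// The question's reducer: counts array entries and ignores their values.
const countReducer = (key, values) => values.length;
// The corrected reducer: adds the values, so partial sums are carried along.
const sumReducer = (key, values) => values.reduce((a, b) => a + b, 0);

const countResult = simulateReduce(countReducer, 0, ones, 100);
const sumResult = simulateReduce(sumReducer, 0, ones, 100);
console.log(countResult, sumResult);
```

Under these assumptions only the summing reducer returns 1000. The counting reducer returns the length of the last batch it happened to see, which is the same kind of error as the 21 in the question's output.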

As a side note, notice how dangerous this bug is: for a smaller dataset, the original code might have returned the correct result by accident, because the engine could have decided that splitting the work was unnecessary and called reduce only once per key.
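One way to catch this kind of bug regardless of dataset size is to test the reduce function on its own. The MongoDB documentation also requires reduce to be idempotent, meaning reduce(key, [reduce(key, values)]) must equal reduce(key, values). Below is a minimal sketch in plain JavaScript; isIdempotent, countValues and sumValues are hypothetical names, and the sample values are made up for illustration.

```javascript
// Hypothetical helper: checks that re-reducing a reducer's own output
// gives the same result as the first pass.
function isIdempotent(reduceFn, key, values) {
    const once = reduceFn(key, values);
    const twice = reduceFn(key, [once]);
    return once === twice;
}

// The question's reducer and the corrected one, as plain functions.
const countValues = (key, values) => values.length;
const sumValues = (key, values) => values.reduce((a, b) => a + b, 0);

const sample = [1, 1, 1, 1, 1];
const countOk = isIdempotent(countValues, 0, sample); // compares 5 with 1
const sumOk = isIdempotent(sumValues, 0, sample);     // compares 5 with 5
console.log(countOk, sumOk);
```

The check fails for the counting reducer and passes for the summing one. The sample needs more than one element, because with a single value both reducers happen to agree.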
