MongoDB MapReduce: second argument of reduce function is a multidimensional array

Problem description

I tried to use mapReduce on my collection. Just for debugging, I returned the vals value passed as the second argument to the reduce function, like this:

db.runCommand({
 "mapreduce":"MyCollection",
 "map":function() {
    emit( {
       country_code:this.cc,
       partner:this.di,
       registeredPeriod:Math.floor((this.ca - 1399240800)/604800)
    },
    {
       count:Math.ceil((this.lla - this.ca)/86400)
    });
 },
 "reduce":function(k, vals) {
    return {
       'count':vals
    }; 
 },
 "query":{
    "ca":{
       "$gte":1399240800
    },
    "di":405,
    "cc":"1"
 },
 "out":{
    "inline":true
 }
});
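The map function's arithmetic can be checked outside MongoDB. The sketch below re-implements the key/value computation as a plain function, assuming `ca` and `lla` are Unix timestamps in seconds (604800 is the number of seconds in a week, 86400 in a day). The helper name `mapDocument` and the sample document are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper mirroring the map function above, without emit().
// Assumes ca and lla are Unix timestamps in seconds.
const EPOCH = 1399240800; // start timestamp used in the query
const WEEK = 604800;      // seconds per week
const DAY = 86400;        // seconds per day

function mapDocument(doc) {
  return {
    key: {
      country_code: doc.cc,
      partner: doc.di,
      // whole weeks elapsed between EPOCH and the document's ca timestamp
      registeredPeriod: Math.floor((doc.ca - EPOCH) / WEEK),
    },
    // days between ca and lla, rounded up
    value: { count: Math.ceil((doc.lla - doc.ca) / DAY) },
  };
}

// Hypothetical document: ca is 4 weeks + 100 s after EPOCH,
// lla is 10 s short of 15 days after ca.
const ca = EPOCH + 4 * WEEK + 100;
const sample = { cc: "1", di: 405, ca: ca, lla: ca + 15 * DAY - 10 };
console.log(JSON.stringify(mapDocument(sample)));
// {"key":{"country_code":"1","partner":405,"registeredPeriod":4},"value":{"count":15}}
```

Every emitted value therefore has the form `{ count: <number> }`, which is why `vals` in the reduce function is an array of such objects.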

I got a result like this:

{
"results" : [
    {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 0
        },
        "value" : {
            "count" : [
                {
                    "count" : 37
                },
                {
                    "count" : 38
                }
            ]
        }
    },
    {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 1
        },
        "value" : {
            "count" : 36
        }
    },
    {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 4
        },
        "value" : {
            "count" : [
                {
                    "count" : [
                        {
                            "count" : 16
                        },
                        {
                            "count" : 16
                        }
                    ]
                },
                {
                    "count" : 15
                }
            ]
        }
    }
],
"timeMillis" : 38,
"counts" : {
    "input" : 130,
    "emit" : 130,
    "reduce" : 5,
    "output" : 6
},
"ok" : 1
}

I really don't know why I got a multidimensional array as the second argument to my reduce function. I mean this part of the result:

        {
        "_id" : {
            "country_code" : "1",
            "distribution" : 405,
            "installationPeriod" : 4
        },
        "value" : {
            "count" : [
                {
                    "count" : [ // <= Why is this multidimensional?
                        {
                            "count" : 16
                        }

Why is this multidimensional? And why does the key of the embedded array match the one returned from the reduce function?

Recommended answer

The reason is that this is how mapReduce works. From the documentation:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
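That re-reduce behaviour is enough to explain the nested output. The following plain JavaScript sketch calls the question's debug reduce function twice, feeding the first result back in as one of the input values. The batching shown is one plausible sequence, assumed for illustration; MongoDB decides the actual batching. It reproduces the exact shape of the `installationPeriod: 4` result, and it also shows why the inner key is `count`: the inner object is an earlier return value of reduce itself.

```javascript
// The debug reduce function from the question: it returns the raw vals array.
function debugReduce(key, vals) {
  return { count: vals };
}

const key = { country_code: "1", distribution: 405, installationPeriod: 4 };

// Assumed call sequence: a first pass reduces two of the emitted values...
const partial = debugReduce(key, [{ count: 16 }, { count: 16 }]);
// ...then a later pass receives that earlier output alongside another emitted value.
const final = debugReduce(key, [partial, { count: 15 }]);

console.log(JSON.stringify(final));
// {"count":[{"count":[{"count":16},{"count":16}]},{"count":15}]}
```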

And another point:

the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operations is true:
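The MongoDB documentation states the requirement as two properties of reduce: it must be idempotent (re-reducing a single reduced value changes nothing) and associative (reducing a partial result together with further values gives the same answer as reducing everything at once). The sketch below checks both properties for two reducers: `sumReduce`, a hypothetical reducer that adds up the `count` fields and keeps the emitted `{ count: <number> }` shape, and the question's `debugReduce`. The helpers `sameJSON` and `checkReducer` are also hypothetical.

```javascript
// Hypothetical correct reducer: returns the same { count: <number> } shape that map emits.
function sumReduce(key, vals) {
  let total = 0;
  for (const v of vals) total += v.count;
  return { count: total };
}

// The question's debug reducer: returns { count: <array> }, a different shape.
function debugReduce(key, vals) {
  return { count: vals };
}

// Compare two results structurally via their JSON serialization.
function sameJSON(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

function checkReducer(reduce) {
  const k = "key";
  const A = { count: 16 }, B = { count: 16 }, C = { count: 15 };
  // Idempotence: re-reducing a single reduced value must not change it.
  const idempotent = sameJSON(reduce(k, [reduce(k, [A, B, C])]), reduce(k, [A, B, C]));
  // Associativity: reducing C with a partial result equals reducing all values at once.
  const associative = sameJSON(reduce(k, [C, reduce(k, [A, B])]), reduce(k, [C, A, B]));
  return { idempotent, associative };
}

console.log("sumReduce", checkReducer(sumReduce));     // both properties true
console.log("debugReduce", checkReducer(debugReduce)); // both properties false
```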

So even though you have not "changed the signature" in the sense that documentation refers to, reduce still only processes n items at a time in one pass, then another n items in the next pass. In the final stage, the array returned from one fragment is combined with the array returned from another fragment.

So what happened is that your reduce returns an array, but it is not "all" of the items you emitted for the key, just some of them. Then another reduce call on the same "key" processes more items. Finally, those two arrays (or possibly more) are sent to reduce again, in an attempt to actually "reduce" those items as intended.

That is the general concept, so it is no surprise that when you just return the array of input values, nested arrays are what you get back.
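If the goal really is to collect every emitted value for a key (for example, for debugging), one way to keep the shape stable across re-reduce passes is to emit a one-element array and have reduce concatenate the arrays. The sketch below, with hypothetical names `mapValue` and `collectReduce`, simulates this in plain JavaScript. The result stays flat no matter how the values are batched, although element order is not something to rely on in general. MongoDB does not call reduce for a key that has only one emitted value, which is why the `installationPeriod: 1` result in the question is a plain `{ "count" : 36 }`; with the array shape, such a key still ends up as `{ counts: [n] }`.

```javascript
// Hypothetical map-side value: wrap the number in an array so the emitted
// shape is { counts: [<number>] }, the same shape reduce returns.
function mapValue(days) {
  return { counts: [days] };
}

// Hypothetical reducer: concatenate the counts arrays of all input values.
// Because input and output share one shape, re-reducing stays flat.
function collectReduce(key, vals) {
  let counts = [];
  for (const v of vals) counts = counts.concat(v.counts);
  return { counts: counts };
}

const k = "installationPeriod 4";
const emitted = [16, 16, 15].map(mapValue);

// Reduce all values at once...
const oneShot = collectReduce(k, emitted);
// ...or in two passes, feeding the first result back in as MongoDB may do.
const twoPass = collectReduce(k, [collectReduce(k, emitted.slice(0, 2)), emitted[2]]);

console.log(JSON.stringify(oneShot)); // {"counts":[16,16,15]}
console.log(JSON.stringify(twoPass)); // {"counts":[16,16,15]}
```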

Short version: mapReduce processes the output "keys" in chunks, not all at once. Better to learn that now before it becomes a problem for you later.
