MapReduce function in MongoDB - Grouping documents by ID

Question

I'm trying to learn the MapReduce function in MongoDB. Instead of using aggregation, I want to group the documents in a collection by a key I define myself, using MapReduce.

My collection, Cool:

/* 1 */ { "_id" : ObjectId("55d5e7287e41390ea7e83a55"), "id" : "a", "cool" : "a1" }
/* 2 */ { "_id" : ObjectId("55d5e7287e41390ea7e83a56"), "id" : "a", "cool" : "a2" }
/* 3 */ { "_id" : ObjectId("55d5e7287e41390ea7e83a57"), "id" : "b", "cool" : "b1" }
/* 4 */ { "_id" : ObjectId("55d5e7287e41390ea7e83a58"), "id" : "b", "cool" : "b2" }
/* 5 */ { "_id" : ObjectId("55d5e7287e41390ea7e83a59"), "id" : "c", "cool" : "c1" }
/* 6 */ { "_id" : ObjectId("55d5e7287e41390ea7e83a5a"), "id" : "d", "cool" : "d1" }

Here is my MapReduce function:

db.Cool.mapReduce(
    function(){emit(this.id, this.cool)},
    function(key, values){
        var res = [];
        values.forEach(function(v){
            res.push(v);
            });
        return {cools: res};
        },
    {out: "MapReduce"}     
)

I want a result like this:

/* 1 */ { "_id" : "a", "value" : { "cools" : [ "a1", "a2" ] } }

But the output collection contains:

/* 1 */ { "_id" : "a", "value" : { "cools" : [ "a1", "a2" ] } }
/* 2 */ { "_id" : "b", "value" : { "cools" : [ "b1", "b2" ] } }
/* 3 */ { "_id" : "c", "value" : "c1" }
/* 4 */ { "_id" : "d", "value" : "d1" }

The question is: why is there a difference between the result for "id":"a" (there is more than one document with "id":"a") and the result for "id":"c" (there is only one document with "id":"c")?

Thanks for any suggestions, and sorry for my bad English.

Recommended answer

In your learning you might have missed the core manual page on mapReduce. It contains one vital piece of information that you either missed or have not yet read:

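MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.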

And then, a little later on:

The type of the return object must be identical to the type of the value emitted by the map function.

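So what that basically means is that the reducer is not guaranteed to see all the values for a key in a single call. It must therefore accept as input the same shape it returns as output, since that output can be fed back into the reducer again. It also answers your question: MongoDB does not call the reduce function at all for a key that has only a single value, so for "c" and "d" the emitted string is stored as is, while "a" and "b" go through your reducer and come out wrapped in { cools: [...] }.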
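To see the effect without a database, here is a minimal plain-JavaScript sketch. It assumes the documented rule that MongoDB does not call reduce for a key with only a single value. The names simulateMapReduce, emitted and docs are hypothetical helpers for illustration, not MongoDB APIs, and the real engine may also split the values for one key across several reduce calls, which this sketch does not model.

```javascript
// Collects whatever the map function emits (stand-in for MongoDB's global emit).
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// Hypothetical helper: groups emitted values by key and reduces them,
// skipping the reducer for keys that have exactly one value.
function simulateMapReduce(docs, map, reduce) {
    emitted = [];
    docs.forEach(function (doc) { map.call(doc); });   // "this" is the document

    var groups = {};
    emitted.forEach(function (e) {
        (groups[e.key] = groups[e.key] || []).push(e.value);
    });

    return Object.keys(groups).sort().map(function (key) {
        var values = groups[key];
        // Single value: reduce is never called, the emitted value is stored as is.
        var value = values.length === 1 ? values[0] : reduce(key, values);
        return { _id: key, value: value };
    });
}

var docs = [
    { id: "a", cool: "a1" }, { id: "a", cool: "a2" },
    { id: "b", cool: "b1" }, { id: "b", cool: "b2" },
    { id: "c", cool: "c1" }, { id: "d", cool: "d1" }
];

// The map and reduce functions from the question, unchanged.
var original = simulateMapReduce(
    docs,
    function () { emit(this.id, this.cool); },
    function (key, values) {
        var res = [];
        values.forEach(function (v) { res.push(v); });
        return { cools: res };
    }
);

console.log(JSON.stringify(original));
```

Under these assumptions the sketch reproduces the mixed output from the question: "a" and "b" are wrapped in { cools: [...] }, while "c" and "d" keep the bare string.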

For the same reason, the mapper needs to emit exactly the shape the reducer returns, since that is also the reducer's input. So you don't actually change the data structure in the reducer at all; you just reduce it:

db.Cool.mapReduce(
    function(){emit(this.id, { "cools": [this.cool] })},
    function(key, values){
        var res = [];
        values.forEach(function(cool){
            cool.cools.forEach(function(v) {
                res.push(v);
            });
        });
        return {cools: res};
    },
    {out: "MapReduce"}     
)

Now the mapper emits each value wrapped in an array, the reducer takes that same shape as input and returns it as output, and the expected results come back.
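A quick way to check that a reducer is safe to be called again on its own output is to compare reducing everything at once with reducing in two steps. The sketch below is plain JavaScript, runnable outside MongoDB. The names reduceCools and reduceOriginal are hypothetical labels for the corrected reducer and the reducer from the question, and the value "a3" is made up for the test.

```javascript
// The corrected reducer: input and output share the shape { cools: [...] }.
function reduceCools(key, values) {
    var res = [];
    values.forEach(function (cool) {
        cool.cools.forEach(function (v) { res.push(v); });
    });
    return { cools: res };
}

// The reducer from the question: takes strings, returns an object.
function reduceOriginal(key, values) {
    var res = [];
    values.forEach(function (v) { res.push(v); });
    return { cools: res };
}

var a = { cools: ["a1"] }, b = { cools: ["a2"] }, c = { cools: ["a3"] };

// Corrected reducer: one pass and two passes give the same result.
var allAtOnce = reduceCools("a", [a, b, c]);
var inTwoSteps = reduceCools("a", [reduceCools("a", [a, b]), c]);

// Original reducer: feeding its output back in nests the earlier result.
var nested = reduceOriginal("a", [reduceOriginal("a", ["a1", "a2"]), "a3"]);

console.log(JSON.stringify(allAtOnce));   // {"cools":["a1","a2","a3"]}
console.log(JSON.stringify(inTwoSteps));  // {"cools":["a1","a2","a3"]}
console.log(JSON.stringify(nested));      // {"cools":[{"cools":["a1","a2"]},"a3"]}
```

The last line shows what can go wrong when the reducer's output type differs from the emitted type: an earlier partial result ends up nested inside the array instead of being merged.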

The next thing to learn is that in most cases mapReduce is not really what you want to use, and that you should be using the aggregation framework instead.

As opposed to mapReduce, the aggregation framework uses natively coded operators and does not need JavaScript interpretation to run. That largely means it is faster, and often a lot simpler to construct.

Here is the same thing with .aggregate():

db.Cool.aggregate([
    { "$group": {
        "_id": "$id",
        "cools": { "$push": "$cool" }
    }}
])

Same thing, less coding and a lot faster.
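For readers new to $group, the following plain-JavaScript sketch shows roughly what grouping on "$id" with a $push accumulator computes. It is only an illustration of the semantics, not how the server implements it. The helper groupAndPush and the variable coolDocs are hypothetical, and the sketch returns groups in first-seen order, whereas $group itself makes no ordering promise.

```javascript
// Hypothetical helper: one output document per distinct key,
// with every matching value appended to an array (like $push).
function groupAndPush(docs, keyField, valueField) {
    var byKey = new Map();
    docs.forEach(function (doc) {
        var key = doc[keyField];
        if (!byKey.has(key)) {
            byKey.set(key, []);
        }
        byKey.get(key).push(doc[valueField]);
    });
    // Each Map entry is [key, values]; shape it like the $group output.
    return Array.from(byKey, function (entry) {
        return { _id: entry[0], cools: entry[1] };
    });
}

var coolDocs = [
    { id: "a", cool: "a1" }, { id: "a", cool: "a2" },
    { id: "b", cool: "b1" }, { id: "b", cool: "b2" },
    { id: "c", cool: "c1" }, { id: "d", cool: "d1" }
];

var grouped = groupAndPush(coolDocs, "id", "cool");
console.log(JSON.stringify(grouped));
```

Note that every group gets an array, including the single-document keys "c" and "d", so the shape is uniform without any extra work.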

To write the result to another collection, use $out:

db.Cool.aggregate([
    { "$group": {
        "_id": "$id",
        "cools": { "$push": "$cool" }
    }},
    { "$out": "reduced" }
])

For the record, here is the mapReduce output:

{ "_id" : "a", "value" : { "cools" : [ "a1", "a2" ] } }
{ "_id" : "b", "value" : { "cools" : [ "b1", "b2" ] } }
{ "_id" : "c", "value" : { "cools" : [ "c1" ] } }
{ "_id" : "d", "value" : { "cools" : [ "d1" ] } }

And the aggregate output. Apart from the mandatory _id and value structure of mapReduce output, the only difference is that the keys come back in reverse order, since $group does not guarantee an order (though it is generally observed as a reverse stack):

{ "_id" : "d", "cools" : [ "d1" ] }
{ "_id" : "c", "cools" : [ "c1" ] }
{ "_id" : "b", "cools" : [ "b1", "b2" ] }
{ "_id" : "a", "cools" : [ "a1", "a2" ] }
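If a predictable order matters, one option is to append a $sort stage after $group. The sketch below only builds the pipeline as a plain array; the variable name sortedPipeline is an assumption for illustration, and in the mongo shell it would be passed to db.Cool.aggregate().

```javascript
// $group as above, followed by an ascending sort on the group key.
var sortedPipeline = [
    { "$group": {
        "_id": "$id",
        "cools": { "$push": "$cool" }
    }},
    { "$sort": { "_id": 1 } }
];

// In the mongo shell: db.Cool.aggregate(sortedPipeline)
console.log(JSON.stringify(sortedPipeline));
```

This sorts the groups by key. The order of the values inside each cools array still depends on the order in which the documents reach $group.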
