Merging two collections in MongoDB

Problem description

I've been trying to use MapReduce in MongoDB to do what I think is a simple procedure. I don't know if this is the right approach, or if I should even be using MapReduce. I googled the keywords I could think of and checked the parts of the docs where I expected to have the most success, but found nothing. Maybe I'm overthinking this?

I have two collections: details and gpas.

details is made up of a whole bunch of documents (3+ million). The studentid element can appear up to twice, once for each year, like the following:

{ "_id" : ObjectId("4d49b7yah5b6d8372v640100"), "classes" : [1,17,19,21], "studentid" : "12345a", "year" : 1}
{ "_id" : ObjectId("4d76b7oij7s2d8372v640100"), "classes" : [2,12,19,22], "studentid" : "98765a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8372v640100"), "classes" : [32,91,101,217], "studentid" : "12345a", "year" : 2}
{ "_id" : ObjectId("4d76b7rty7s2d8372v640100"), "classes" : [1,11,18,22], "studentid" : "24680a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8856v640100"), "classes" : [32,99,110,215], "studentid" : "98765a", "year" : 2}
...

gpas has documents with the same studentids as details, with only one entry per studentid, like this:

{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "overall" : 76, "subscore": 2}
...

In the end I want to have a collection with one row for each student in this format:

{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "classes_1": [1,17,19,21], "classes_2": [32,91,101,217], "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "classes_1": [2,12,19,22], "classes_2": [32,99,110,215], "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "classes_1": [1,11,18,22], "classes_2": [], "overall" : 76, "subscore": 2}
...

The way I was going to do this was by running MapReduce like this:

var mapDetails = function() {
    emit(this.studentid, {studentid: this.studentid, classes: this.classes, year: this.year, overall: 0, subscore: 0});
};

var mapGpas = function() {
    emit(this.studentid, {studentid: this.studentid, classes: [], year: 0, overall: this.overall, subscore: this.subscore});
};

var reduce = function(key, values) {
    var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};

    values.forEach(function(value) {
        if (value.year == 0) {
            outs.overall = value.overall;
            outs.subscore = value.subscore;
        }
        else {
            if (value.year == 1) {
                outs.classes_1 = value.classes;
            }
            if (value.year == 2) {
                outs.classes_2 = value.classes;
            }

            outs.studentid = value.studentid;
        }
    });

    return outs;

};

res = db.details.mapReduce(mapDetails, reduce, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, reduce, {out: {reduce: 'joined'}})

But when I run it, this is my resulting collection:

{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 85, "subscore" : 5 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }

I'm missing the classes arrays.

Also, as an aside, how do I access the elements inside the resulting MapReduce value element? Does MapReduce always output to value, or to whatever other name you give it?

Recommended answer

This is similar to a question that was asked on the mongodb-user Google Group:
https://groups.google.com/group/mongodb-user/browse_thread/thread/60a8b683e2626ada?pli=1

The answer references an online tutorial which looks similar to your example: http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/

For more information on MapReduce in MongoDB, please see the documentation: http://www.mongodb.org/display/DOCS/MapReduce

Additionally, there is a useful step-by-step walkthrough of how a MapReduce operation works in the "Extras" Section of the MongoDB Cookbook article titled, "Finding Max And Min Values with Versioned Documents": http://cookbook.mongodb.org/patterns/finding_max_and_min/

Forgive me if you have already read some of the referenced documents. I have included them for the benefit of other users who may be reading this post and are new to using MapReduce in MongoDB.

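It is important that the documents emitted by the 'emit' statements in the Map functions have the same structure as the document returned by the Reduce function. MongoDB may call Reduce again with its own earlier output as one of the input values, and with out: {reduce: 'joined'} it reduces each new result together with the document already stored under that key, so Reduce must accept its own output as input. Conversely, if the Map function emits only one document for a key, the Reduce function might not be run at all, and that document goes into the output collection as-is; if its shape differs from Reduce's output, your output collection will contain mismatched documents.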
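
To see why the shapes must match, here is a minimal sketch in plain JavaScript (runnable in Node, outside MongoDB) that calls the question's original reduce function the way the second mapReduce call with out: {reduce: 'joined'} would for one student: first on the two details values, then on that stored result together with the gpas value. The variable names are illustrative, and the sketch models only this one key, not the server's full behavior.

```javascript
// The question's original reduce function (same logic, reformatted).
function originalReduce(key, values) {
  var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0 };
  values.forEach(function (value) {
    if (value.year == 0) {
      outs.overall = value.overall;
      outs.subscore = value.subscore;
    } else {
      if (value.year == 1) outs.classes_1 = value.classes;
      if (value.year == 2) outs.classes_2 = value.classes;
      outs.studentid = value.studentid;
    }
  });
  return outs;
}

// Values the question's mapDetails and mapGpas emit for student "12345a".
const fromDetails = [
  { studentid: "12345a", classes: [1, 17, 19, 21], year: 1, overall: 0, subscore: 0 },
  { studentid: "12345a", classes: [32, 91, 101, 217], year: 2, overall: 0, subscore: 0 },
];
const fromGpas = { studentid: "12345a", classes: [], year: 0, overall: 97, subscore: 1 };

// First call (details): reduce combines the two year documents.
const stored = originalReduce("12345a", fromDetails);

// Second call (gpas, out: {reduce: 'joined'}): the stored document is reduced
// together with the new value. It has neither "year" nor "classes", so it hits
// the else branch, matches neither year 1 nor 2, and its class arrays are lost.
const merged = originalReduce("12345a", [stored, fromGpas]);

console.log(stored.classes_1); // [ 1, 17, 19, 21 ]
console.log(merged.classes_1); // []
```

The merged document ends up with empty classes_1 and classes_2 but the correct overall and subscore, which is the same symptom shown for "12345a" in the question. Swapping the order of the two values gives the same result.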

I have slightly modified your map statements to emit documents in the format of your desired output, with two separate "classes" arrays.
I have also reworked your reduce statement to add new classes to the classes_1 and classes_2 arrays, only if they do not already exist.

var mapDetails = function(){
    var output = {studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0}
    if (this.year == 1) {
        output.classes_1 = this.classes;
    }
    if (this.year == 2) {
        output.classes_2 = this.classes;
    }
    emit(this.studentid, output);
};

var mapGpas = function() {
    emit(this.studentid, {studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore});
};

var r = function(key, values) {
    // Same fields as the map output, minus "year"; reduce may receive its own output again.
    var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};

    values.forEach(function(v){
        outs.studentid = v.studentid;
        // Add each class only once ("cls" because "class" is a reserved word in JavaScript).
        v.classes_1.forEach(function(cls){if(outs.classes_1.indexOf(cls)==-1){outs.classes_1.push(cls)}});
        v.classes_2.forEach(function(cls){if(outs.classes_2.indexOf(cls)==-1){outs.classes_2.push(cls)}});

        // Only values emitted by mapGpas have year 0 and carry the GPA fields.
        if (v.year == 0) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
};

res = db.details.mapReduce(mapDetails, r, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, r, {out: {reduce: 'joined'}})

Running the two MapReduce operations results in the following collection, which matches your desired format:

> db.joined.find()
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ 1, 17, 19, 21 ], "classes_2" : [ 32, 91, 101, 217 ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ 1, 11, 18, 22 ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ 2, 12, 19, 22 ], "classes_2" : [ 32, 99, 110, 215 ], "overall" : 85, "subscore" : 5 } }
>
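
The result above can also be checked without a server. The sketch below models the two mapReduce calls in plain JavaScript using the answer's map and reduce functions. The runMapReduce helper is hypothetical and reduces the server's behavior to three simplifying rules: emitted values are grouped by key, reduce is skipped for a key with a single value, and with out: {reduce: ...} any document already stored for a key is reduced together with the new result.

```javascript
// Simplified in-memory model of db.<coll>.mapReduce(map, reduce, {out: {reduce: ...}}).
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

function runMapReduce(docs, map, reduce, outColl) {
  emitted = [];
  docs.forEach(function (doc) { map.call(doc); }); // map sees each document as `this`
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  for (const [key, values] of groups) {
    // Reduce is skipped when a key has only one emitted value.
    let result = values.length === 1 ? values[0] : reduce(key, values);
    // out: {reduce: ...} merges with any document already stored for the key.
    if (outColl.has(key)) result = reduce(key, [outColl.get(key), result]);
    outColl.set(key, result);
  }
}

// The answer's map and reduce functions.
const mapDetails = function () {
  const output = { studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0 };
  if (this.year == 1) output.classes_1 = this.classes;
  if (this.year == 2) output.classes_2 = this.classes;
  emit(this.studentid, output);
};

const mapGpas = function () {
  emit(this.studentid, { studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore });
};

const r = function (key, values) {
  const outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0 };
  values.forEach(function (v) {
    outs.studentid = v.studentid;
    v.classes_1.forEach(function (cls) { if (outs.classes_1.indexOf(cls) == -1) outs.classes_1.push(cls); });
    v.classes_2.forEach(function (cls) { if (outs.classes_2.indexOf(cls) == -1) outs.classes_2.push(cls); });
    if (v.year == 0) {
      outs.overall = v.overall;
      outs.subscore = v.subscore;
    }
  });
  return outs;
};

// Sample documents from the question (without _id).
const details = [
  { studentid: "12345a", classes: [1, 17, 19, 21], year: 1 },
  { studentid: "98765a", classes: [2, 12, 19, 22], year: 1 },
  { studentid: "12345a", classes: [32, 91, 101, 217], year: 2 },
  { studentid: "24680a", classes: [1, 11, 18, 22], year: 1 },
  { studentid: "98765a", classes: [32, 99, 110, 215], year: 2 },
];
const gpas = [
  { studentid: "12345a", overall: 97, subscore: 1 },
  { studentid: "98765a", overall: 85, subscore: 5 },
  { studentid: "24680a", overall: 76, subscore: 2 },
];

const joined = new Map(); // stands in for the "joined" collection
runMapReduce(details, mapDetails, r, joined);
runMapReduce(gpas, mapGpas, r, joined);
```

Each entry in joined corresponds to the value field of one document in the listing above. Note that "24680a" has only one details document, so after the first call its stored value is the raw map output; it still merges correctly because mapDetails already emits classes_1 and classes_2.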

MapReduce always outputs documents in the form {_id: "id", value: "value"}, so the reduced result is always stored under the field name value. There is more information on working with sub-documents in the document titled "Dot Notation (Reaching into Objects)": http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
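
To illustrate how a dotted path reaches into the value sub-document, here is a small plain-JavaScript sketch. getByPath is a hypothetical helper that walks nested objects only; MongoDB's dot notation also reaches into arrays, which this sketch does not model.

```javascript
// Hypothetical helper: follow a dotted path such as "value.overall" through
// nested objects. In the mongo shell the same path is written directly in a
// query, e.g. db.joined.find({ "value.overall": { $gt: 80 } }), or read from a
// returned document as doc.value.overall. Array traversal is not modeled here.
function getByPath(doc, path) {
  return path.split(".").reduce(function (obj, field) {
    return obj == null ? undefined : obj[field];
  }, doc);
}

// One document from the joined collection shown above.
const doc = {
  _id: "12345a",
  value: { studentid: "12345a", classes_1: [1, 17, 19, 21], classes_2: [32, 91, 101, 217], overall: 97, subscore: 1 },
};

console.log(getByPath(doc, "value.overall"));       // 97
console.log(getByPath(doc, "value.classes_2"));     // [ 32, 91, 101, 217 ]
console.log(getByPath(doc, "value.missing.field")); // undefined
```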

If you would like the output of MapReduce to appear in a different format, you will have to do that programmatically in your application.
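
As a sketch of that application-side step, the following plain JavaScript lifts the fields of value to the top level of each output document, giving roughly the per-student format the question asked for. The function name flattenMapReduceDoc and the choice to keep the studentid key as _id are assumptions; the question's desired output reused the gpas ObjectId instead.

```javascript
// Copy the fields of the MapReduce "value" sub-document to the top level.
// Keeping the studentid key as _id is an assumption (see the note above).
function flattenMapReduceDoc(doc) {
  return Object.assign({ _id: doc._id }, doc.value);
}

// The three documents from db.joined.find() shown above.
const joinedDocs = [
  { _id: "12345a", value: { studentid: "12345a", classes_1: [1, 17, 19, 21], classes_2: [32, 91, 101, 217], overall: 97, subscore: 1 } },
  { _id: "24680a", value: { studentid: "24680a", classes_1: [1, 11, 18, 22], classes_2: [], overall: 76, subscore: 2 } },
  { _id: "98765a", value: { studentid: "98765a", classes_1: [2, 12, 19, 22], classes_2: [32, 99, 110, 215], overall: 85, subscore: 5 } },
];

const students = joinedDocs.map(flattenMapReduceDoc);

// In the mongo shell the same function could be applied to every output
// document and written to a new collection ("students" is a hypothetical name):
//   db.joined.find().forEach(function (d) { db.students.insertOne(flattenMapReduceDoc(d)); });
```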

Hopefully this will improve your understanding of MapReduce, and get you one step closer to producing your desired output collection. Good Luck!
