Merging two collections in MongoDB

Question

I've been trying to use MapReduce in MongoDB to do what I think is a simple procedure. I don't know if this is the right approach, or if I should even be using MapReduce. I googled the keywords I could think of and went to the parts of the docs where I expected the most success, but found nothing. Maybe I'm overthinking this?

I have two collections: details and gpas.

details is made up of a large number of documents (3+ million). A given studentid can appear twice, once for each year, like the following:

{ "_id" : ObjectId("4d49b7yah5b6d8372v640100"), "classes" : [1,17,19,21], "studentid" : "12345a", "year" : 1}
{ "_id" : ObjectId("4d76b7oij7s2d8372v640100"), "classes" : [2,12,19,22], "studentid" : "98765a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8372v640100"), "classes" : [32,91,101,217], "studentid" : "12345a", "year" : 2}
{ "_id" : ObjectId("4d76b7rty7s2d8372v640100"), "classes" : [1,11,18,22], "studentid" : "24680a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8856v640100"), "classes" : [32,99,110,215], "studentid" : "98765a", "year" : 2}
...

gpas has entries for the same studentids that appear in details, with only one entry per studentid, like this:

{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "overall" : 76, "subscore": 2}
...

In the end I want a collection with one document for each student, in this format:

{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "classes_1": [1,17,19,21], "classes_2": [32,91,101,217], "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "classes_1": [2,12,19,22], "classes_2": [32,99,110,215], "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "classes_1": [1,11,18,22], "classes_2": [], "overall" : 76, "subscore": 2}
...

The way I was going to do this was by running MapReduce like this:

var mapDetails = function() {
    emit(this.studentid, {studentid: this.studentid, classes: this.classes, year: this.year, overall: 0, subscore: 0});
};

var mapGpas = function() {
    emit(this.studentid, {studentid: this.studentid, classes: [], year: 0, overall: this.overall, subscore: this.subscore});
};

var reduce = function(key, values) {
    var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};

    values.forEach(function(value) {
        if (value.year == 0) {
            outs.overall = value.overall;
            outs.subscore = value.subscore;
        }
        else {
            if (value.year == 1) {
                outs.classes_1 = value.classes;
            }
            if (value.year == 2) {
                outs.classes_2 = value.classes;
            }

            outs.studentid = value.studentid;
        }
    });

    return outs;

};

res = db.details.mapReduce(mapDetails, reduce, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, reduce, {out: {reduce: 'joined'}})

But when I run it, this is my resulting collection:

{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 85, "subscore" : 5 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }

The classes arrays are missing.

Also, as an aside, how do I access the elements inside the resulting MapReduce value element? Does MapReduce always output to value, or to whatever else you name it?

Recommended answer

This is similar to a question that was asked on the MongoDB-users Google Group:
https://groups.google.com/group/mongodb-user/browse_thread/thread/60a8b683e2626ada?pli=1

The answer there references an online tutorial that looks similar to your example: http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/

For more information on MapReduce in MongoDB, please see the documentation: http://www.mongodb.org/display/DOCS/MapReduce

Additionally, there is a useful step-by-step walkthrough of how a MapReduce operation works in the "Extras" section of the MongoDB Cookbook article titled "Finding Max And Min Values with Versioned Documents": http://cookbook.mongodb.org/patterns/finding_max_and_min/

Forgive me if you have already read some of the referenced documents. I have included them for the benefit of other users who may be reading this post and are new to using MapReduce in MongoDB.

It is important that the values passed to 'emit' in the Map functions have the same structure as the value returned by the Reduce function. If the Map function emits only one value for a key, the Reduce function might not be run for that key at all, and your output collection will then contain documents with mismatched structures.
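The effect described above can be reproduced without a server. The following is a minimal sketch in plain JavaScript (Node.js), assuming a much simplified model of mapReduce with {out: {reduce: ...}}: the helper runMapReduce is hypothetical and not part of MongoDB, the map functions take the document and emit as arguments instead of using this, and the sample data is a trimmed copy of the question's. The map and reduce logic is otherwise the question's.

```javascript
// Simplified in-memory model of mapReduce with {out: {reduce: ...}}.
// Hypothetical helper for illustration only, not MongoDB's implementation.
function runMapReduce(docs, mapFn, reduceFn, out) {
    const groups = new Map();
    const emit = (key, value) => {
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push(value);
    };
    docs.forEach((doc) => mapFn(doc, emit));
    for (const [key, values] of groups) {
        // A key with a single emitted value bypasses reduce entirely.
        let result = values.length === 1 ? values[0] : reduceFn(key, values);
        // The "reduce" output action reduces the new result with any stored value.
        if (out.has(key)) result = reduceFn(key, [out.get(key), result]);
        out.set(key, result);
    }
    return out;
}

// Map and reduce logic from the question.
const mapDetails = (doc, emit) => emit(doc.studentid, {
    studentid: doc.studentid, classes: doc.classes, year: doc.year, overall: 0, subscore: 0,
});
const mapGpas = (doc, emit) => emit(doc.studentid, {
    studentid: doc.studentid, classes: [], year: 0, overall: doc.overall, subscore: doc.subscore,
});
function reduceFromQuestion(key, values) {
    const outs = {studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};
    values.forEach((value) => {
        if (value.year == 0) {
            outs.overall = value.overall;
            outs.subscore = value.subscore;
        } else {
            if (value.year == 1) outs.classes_1 = value.classes;
            if (value.year == 2) outs.classes_2 = value.classes;
            outs.studentid = value.studentid;
        }
    });
    return outs;
}

// Trimmed sample data (assumption: a subset of the question's documents).
const details = [
    {classes: [1, 17, 19, 21], studentid: "12345a", year: 1},
    {classes: [32, 91, 101, 217], studentid: "12345a", year: 2},
    {classes: [1, 11, 18, 22], studentid: "24680a", year: 1},
];
const gpas = [{studentid: "12345a", overall: 97, subscore: 1}];

const joined = new Map();
runMapReduce(details, mapDetails, reduceFromQuestion, joined);
const single = joined.get("24680a");    // stored as emitted, never reduced
const firstPass = joined.get("12345a"); // reduced: has classes_1 and classes_2, no year

console.log("classes" in single, "classes_1" in single); // true false
console.log(firstPass.classes_1);                         // [ 1, 17, 19, 21 ]

runMapReduce(gpas, mapGpas, reduceFromQuestion, joined);
console.log(joined.get("12345a").classes_1);              // []
```

In this model the first pass stores a reduced value that has classes_1 and classes_2 but no year or classes field. When the second pass feeds that value back into the same reducer, neither year branch matches, so the arrays fall back to their empty defaults while the scores are filled in.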

I have slightly modified your map functions to emit documents in the format of your desired output, with two separate "classes" arrays.
I have also reworked your reduce function so that it adds classes to the classes_1 and classes_2 arrays only if they are not already present.

// Emit each details document in the shape of the final output,
// placing its classes in the array that matches its year.
var mapDetails = function() {
    var output = {studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0};
    if (this.year == 1) {
        output.classes_1 = this.classes;
    }
    if (this.year == 2) {
        output.classes_2 = this.classes;
    }
    emit(this.studentid, output);
};

// Emit each gpas document in the same shape; year 0 marks it as a score record.
var mapGpas = function() {
    emit(this.studentid, {studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore});
};

// Merge all values for one studentid into a single document.
var r = function(key, values) {
    var outs = {studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};

    values.forEach(function(v) {
        outs.studentid = v.studentid;
        // Add each class only if it is not already in the target array.
        v.classes_1.forEach(function(cls) { if (outs.classes_1.indexOf(cls) == -1) { outs.classes_1.push(cls); } });
        v.classes_2.forEach(function(cls) { if (outs.classes_2.indexOf(cls) == -1) { outs.classes_2.push(cls); } });

        // Only score records (year 0) carry overall and subscore.
        if (v.year == 0) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
};

// Both jobs write to the same collection; the "reduce" action merges with existing entries.
res = db.details.mapReduce(mapDetails, r, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, r, {out: {reduce: 'joined'}})

Running the two MapReduce operations results in the following collection, which matches your desired format:

> db.joined.find()
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ 1, 17, 19, 21 ], "classes_2" : [ 32, 91, 101, 217 ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ 1, 11, 18, 22 ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ 2, 12, 19, 22 ], "classes_2" : [ 32, 99, 110, 215 ], "overall" : 85, "subscore" : 5 } }
>
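The MongoDB documentation asks that a reduce function give the same result when it is applied to its own output, i.e. reduce(key, [C, reduce(key, [A, B])]) should equal reduce(key, [C, A, B]). The sketch below checks that property in plain JavaScript (Node.js) for the reducer above, and for a hypothetical variant named rSafe that is not part of the original answer. The variant assumes the map functions emit null instead of 0 for missing scores and omit the year field. In this check the original reducer fails the property, because a reduced value no longer has year: 0. It still yields the result shown above when the two jobs are run once in the order given, but re-running a job against the same output collection would likely reset the scores.

```javascript
// Shared helper for this sketch. Inside MongoDB it would have to be defined
// within the reduce function itself or passed in through the `scope` option.
function addUnique(target, items) {
    items.forEach((item) => { if (target.indexOf(item) === -1) target.push(item); });
}

// Reducer from the answer: scores are copied only from values with year == 0.
function r(key, values) {
    const outs = {studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};
    values.forEach((v) => {
        outs.studentid = v.studentid;
        addUnique(outs.classes_1, v.classes_1);
        addUnique(outs.classes_2, v.classes_2);
        if (v.year == 0) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
}

// Hypothetical variant: scores are copied from any value that has them,
// so a value that was already reduced keeps its scores when reduced again.
function rSafe(key, values) {
    const outs = {studentid: "0", classes_1: [], classes_2: [], overall: null, subscore: null};
    values.forEach((v) => {
        outs.studentid = v.studentid;
        addUnique(outs.classes_1, v.classes_1);
        addUnique(outs.classes_2, v.classes_2);
        if (v.overall != null) {
            outs.overall = v.overall;
            outs.subscore = v.subscore;
        }
    });
    return outs;
}

const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
const id = "12345a";

// Values as the answer's map functions would emit them for one student.
const year1 = {studentid: id, classes_1: [1, 17, 19, 21], classes_2: [], year: 1, overall: 0, subscore: 0};
const year2 = {studentid: id, classes_1: [], classes_2: [32, 91, 101, 217], year: 2, overall: 0, subscore: 0};
const gpa = {studentid: id, classes_1: [], classes_2: [], year: 0, overall: 97, subscore: 1};

// Values as the assumed variant map functions would emit them (null scores, no year).
const year1Safe = {studentid: id, classes_1: [1, 17, 19, 21], classes_2: [], overall: null, subscore: null};
const year2Safe = {studentid: id, classes_1: [], classes_2: [32, 91, 101, 217], overall: null, subscore: null};
const gpaSafe = {studentid: id, classes_1: [], classes_2: [], overall: 97, subscore: 1};

const originalHolds = same(r(id, [year1, year2, gpa]), r(id, [year1, r(id, [year2, gpa])]));
const variantHolds = same(
    rSafe(id, [year1Safe, year2Safe, gpaSafe]),
    rSafe(id, [year1Safe, rSafe(id, [year2Safe, gpaSafe])])
);

console.log(originalHolds); // false
console.log(variantHolds);  // true
```

One trade-off of the variant is that a student with no gpas entry ends up with null scores rather than 0, which may or may not suit the application.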

MapReduce always outputs documents in the form {_id: "id", value: "value"}. There is more information on working with sub-documents in the document titled "Dot Notation (Reaching into Objects)": http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29

If you would like the output of MapReduce to appear in a different format, you will have to do that programmatically in your application.
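As a rough illustration of that application-side step, the sketch below lifts the fields of value to the top level in plain JavaScript (Node.js). The function name flatten and the in-memory sample array are assumptions for the example. In a real application the documents would come from a query on the joined collection, where nested fields can also be addressed with dot notation, for example a filter on "value.overall".

```javascript
// Sample documents shaped like the `joined` collection shown above.
const joinedDocs = [
    {_id: "12345a", value: {studentid: "12345a", classes_1: [1, 17, 19, 21], classes_2: [32, 91, 101, 217], overall: 97, subscore: 1}},
    {_id: "24680a", value: {studentid: "24680a", classes_1: [1, 11, 18, 22], classes_2: [], overall: 76, subscore: 2}},
];

// Hypothetical helper: copy every field of `value` next to `_id`.
function flatten(doc) {
    return Object.assign({_id: doc._id}, doc.value);
}

const students = joinedDocs.map(flatten);

console.log(students[0].overall);      // 97
console.log(Object.keys(students[1])); // _id, studentid, classes_1, classes_2, overall, subscore
```

Note that _id here is the studentid string that MapReduce used as the key, not the ObjectId from the gpas collection shown in the desired output.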

Hopefully this will improve your understanding of MapReduce and get you one step closer to producing your desired output collection. Good luck!
