Grouping documents in pairs using mongo aggregation

Question

I have some items,

[ a, b, c, d ]

and I want to group them into overlapping pairs, such as

[ [ a, b ], [ b, c ], [ c, d ] ]
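
Once the items are in memory on the client, the pairing itself is easy to express. Below is a minimal plain JavaScript sketch, assuming the items fit in an array. `pairwise` is a hypothetical helper name, not a MongoDB API:

```javascript
// Hypothetical helper: build overlapping pairs [items[i], items[i + 1]].
function pairwise(items) {
  const pairs = [];
  for (let i = 0; i + 1 < items.length; i++) {
    pairs.push([items[i], items[i + 1]]);
  }
  return pairs;
}

// Usage with the items from the question.
const pairs = pairwise(["a", "b", "c", "d"]);
// pairs is [["a", "b"], ["b", "c"], ["c", "d"]]
```

An array with n items yields n - 1 pairs, and an empty or single-item array yields none.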

This will be used to calculate the differences between consecutive items in the original collection, but that part is solved using several techniques, such as the one in this question.

I know that this is possible with map reduce, but I want to know if it's possible with aggregation.

Here is an example.

The collection of items; each item is an actual document.

[
    { val: 1 },
    { val: 3 },
    { val: 6 },
    { val: 10 },
]

The grouped version:

[
    [ { val: 1 }, { val: 3 } ], 
    [ { val: 3 }, { val: 6 } ],
    [ { val: 6 }, { val: 10 } ]
]

The resulting collection (or aggregation result):

[
    { diff: 2 },
    { diff: 3 },
    { diff: 4 }
]
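
Each result document is simply the second value of a pair minus the first. The following client-side sketch in plain JavaScript assumes the documents are already in the intended order. `diffsFromDocs` is a hypothetical name:

```javascript
// Hypothetical helper: difference between each document's `val`
// and the `val` of the document immediately before it.
function diffsFromDocs(docs) {
  const out = [];
  for (let i = 1; i < docs.length; i++) {
    out.push({ diff: docs[i].val - docs[i - 1].val });
  }
  return out;
}

const questionDocs = [{ val: 1 }, { val: 3 }, { val: 6 }, { val: 10 }];
const diffResult = diffsFromDocs(questionDocs);
// diffResult is [{ diff: 2 }, { diff: 3 }, { diff: 4 }]
```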

Answer

This is something that just cannot be done with the aggregation framework, and the only current MongoDB method available for this type of operation is mapReduce.

The reason is that the aggregation framework has no way of referring to any document in the pipeline other than the current one. This also applies to "grouping" pipeline stages: even though documents are grouped on a "key", you can't really deal with the individual documents in the way you want to.

MapReduce, on the other hand, has one feature that allows you to do what you want here, and it's not even "directly" related to aggregation: the ability to have "globally scoped variables" across all stages. Having a "variable" that basically "stores the last document" is all you need to achieve your result.
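
To show why a single carried-over variable is enough, here is a rough plain JavaScript model of the idea. It is not MongoDB code. `runMap` is a hypothetical stand-in for the server calling the map function once per document, with `this` bound to that document and one `scope` object shared across all calls. It assumes the documents arrive in a fixed order, and the numeric `_id` values stand in for ObjectIds:

```javascript
// Rough model (not the MongoDB implementation): call `mapFn` once per
// document with `this` bound to the document, share one `scope` object
// across all calls, and collect everything passed to `emit`.
function runMap(docs, mapFn, scope) {
  const emitted = [];
  const emit = (key, value) => emitted.push({ _id: key, value: value });
  for (const doc of docs) {
    mapFn.call(doc, emit, scope);
  }
  return emitted;
}

// Same logic as the map function in this answer, with the "global"
// lastVal kept on the shared scope object.
function mapDiff(emit, scope) {
  if (scope.lastVal != null) emit(this._id, this.val - scope.lastVal);
  scope.lastVal = this.val;
}

const sampleDocs = [
  { _id: 1, val: 1 },
  { _id: 2, val: 3 },
  { _id: 3, val: 6 },
  { _id: 4, val: 10 },
];
const emitted = runMap(sampleDocs, mapDiff, { lastVal: null });
// emitted is [{ _id: 2, value: 2 }, { _id: 3, value: 3 }, { _id: 4, value: 4 }]
```

The first document only primes `lastVal`. That is why four inputs produce three emitted values.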

So it's quite simple code, and there is in fact no "reduction" required:

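db.collection.mapReduce(
    function () {
      // Skip the first document; after that, emit the difference from the previous one
      if (lastVal != null)
        emit( this._id, this.val - lastVal );
      lastVal = this.val;   // remember this value for the next document
    },
    function() {}, // reducer is never called: each _id is emitted only once
    {
        "scope": { "lastVal": null },
        "sort": { "_id": 1 },      // process input in _id order (the sort key must be indexed)
        "out": { "inline": 1 }
    }
)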

Which gives you a result much like this:

{
    "results" : [
            {
                    "_id" : ObjectId("54a425a99b8bcd6f73e2d662"),
                    "value" : 2
            },
            {
                    "_id" : ObjectId("54a425a99b8bcd6f73e2d663"),
                    "value" : 3
            },
            {
                    "_id" : ObjectId("54a425a99b8bcd6f73e2d664"),
                    "value" : 4
            }
    ],
    "timeMillis" : 3,
    "counts" : {
            "input" : 4,
            "emit" : 3,
            "reduce" : 0,
            "output" : 3
    },
    "ok" : 1
}

This just picks "something unique" (the document's _id) as the emitted key rather than anything specific, because all it really does is compute the difference between the values of consecutive documents.
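
If you want the exact `{ diff: ... }` shape from the question, you can reshape the inline output on the client. This sketch assumes the `results` array shown above, with the ObjectIds replaced by plain strings so it runs outside the shell. `toDiffDocs` is a hypothetical helper:

```javascript
// Hypothetical helper: turn mapReduce inline output into { diff } documents.
function toDiffDocs(mrOutput) {
  return mrOutput.results.map((r) => ({ diff: r.value }));
}

// Shape of the inline output above; the _id strings stand in for ObjectIds.
const mrOutput = {
  results: [
    { _id: "54a425a99b8bcd6f73e2d662", value: 2 },
    { _id: "54a425a99b8bcd6f73e2d663", value: 3 },
    { _id: "54a425a99b8bcd6f73e2d664", value: 4 },
  ],
  ok: 1,
};
const diffDocs = toDiffDocs(mrOutput);
// diffDocs is [{ diff: 2 }, { diff: 3 }, { diff: 4 }]
```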

Global variables are usually the solution to these kinds of "pairing" aggregations, or to producing "running totals". Right now the aggregation framework has no access to global variables, even though they might well be a nice thing to have. The mapReduce framework has them, so it is probably fair to say that they should be available to the aggregation framework as well.
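
The same carried-variable pattern also covers running totals: you keep the sum so far instead of the last value. The sketch below models this in plain JavaScript rather than running it on a server, and assumes the documents are visited in order. `runningTotals` is a hypothetical name. In a real mapReduce call, the sum would presumably live in `scope`, the way `lastVal` does in the answer:

```javascript
// Visit documents in order, carrying one variable (the sum so far)
// from each document to the next, as a scoped mapReduce variable would.
function runningTotals(docs) {
  let total = 0;
  return docs.map((doc) => {
    total += doc.val;
    return { _id: doc._id, total: total };
  });
}

const totals = runningTotals([
  { _id: 1, val: 1 },
  { _id: 2, val: 3 },
  { _id: 3, val: 6 },
  { _id: 4, val: 10 },
]);
// totals.map((t) => t.total) is [1, 4, 10, 20]
```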

Right now they are not, though, so stick with mapReduce.