MongoDB Aggregate Sum Each Key on a Subdocument

Question

I have multiple documents with this schema; each document covers one product for one day:

{
    _id:{},
    app_id:'DHJFK67JDSJjdasj909',
    date:'2014-08-07',
    event_count:32423,
    event_count_per_type: {
        0:322,
        10:4234,
        20:653,
        30:7562
    }
}

I would like to get the sum of each event type for a particular date range.

This is the output I am looking for, where each event type has been summed across all the documents. The keys of event_count_per_type can be anything, so I need something that can loop through each of them rather than my having to name every key explicitly.

{
    app_id:'DHJFK67JDSJjdasj909',
    event_count:324236456,
    event_count_per_type: {
        0:34234222,
        10:242354,
        20:456476,
        30:56756
    }
}

I have tried several queries; this is the best I have so far, but the sub-document values are not summed:

db.events.aggregate(
{
    $match: {app_id:'DHJFK67JDSJjdasj909'}
},
{
    $group: {
        _id: {
            app_id:'$app_id',
        },
        event_count: {$sum:'$event_count'},
        event_count_per_type: {$sum:'$event_count_per_type'}
    }
},
{
    $project: {
        _id:0,
        app_id:'$_id.app_id',
        event_count:1,
        event_count_per_type:1
    }
}
)

The output I am seeing is a value of 0 for the event_count_per_type key, instead of an object. I could modify the schema so the keys are on the top level of the document, but that would still mean I need an entry in the group statement for each key, which I cannot write because I do not know what the key names will be.

Any help would be appreciated. I am willing to change my schema if need be, and also to try mapReduce (although from the documentation it seems the performance is bad).

Recommended answer

As stated, processing documents like this is not possible with the aggregation framework unless you actually supply all of the keys, such as:

db.events.aggregate([
   { "$group": {
       "_id": "$app_id",
       "event_count": { "$sum": "$event_count" },
       "0": { "$sum": "$event_count_per_type.0" },
       "10": { "$sum": "$event_count_per_type.10" },
       "20": { "$sum": "$event_count_per_type.20" },
       "30": { "$sum": "$event_count_per_type.30" }
   }}
])

But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB: to access elements notated in this "sub-document" form you need to specify the "exact path" to the element in order to do anything with it.
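If the key names can be collected ahead of time, the explicit $group stage does not have to be typed by hand. The following is only a sketch under that assumption: buildGroupStage is a hypothetical helper, not part of MongoDB, and the key list is assumed to be known already.

```javascript
// Hypothetical helper: builds a $group stage from a known list of key names.
function buildGroupStage(keys) {
    const stage = {
        "_id": "$app_id",
        "event_count": { "$sum": "$event_count" }
    };
    keys.forEach(function (k) {
        // Every key still gets its own accumulator with an exact field path.
        stage[k] = { "$sum": "$event_count_per_type." + k };
    });
    return { "$group": stage };
}

// Assumed key list; in practice it would have to be gathered beforehand.
const groupStage = buildGroupStage(["0", "10", "20", "30"]);

// In the shell the stage would be passed on as:
// db.events.aggregate([ groupStage ])
```

This does not remove the limitation; it only moves the listing of keys into client code, so any key that is missing from the list is silently left out of the result.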

The aggregation framework and general queries have no concept of "traversal", which means they cannot process "each key" of a document. Doing that requires a language construct that these interfaces do not provide.

Generally speaking though, using a "key name" as a data point, where its name actually represents a "value", is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:

{
    "app_id": "DHJFK67JDSJjdasj909",
    "date": ISODate("2014-08-07T00:00:00.000Z"),
    "event_count": 32423,
    "events": [
        { "type": 0,  "value": 322  },
        { "type": 10, "value": 4234 },
        { "type": 20, "value": 653  },
        { "type": 30, "value": 7562 }
    ]
}
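Existing documents could be moved to this shape with a small conversion function. The sketch below is illustrative only: toEventsArray is a hypothetical helper, and it assumes the date string is in YYYY-MM-DD form and that numeric-looking keys should become numbers (other keys are kept as strings).

```javascript
// Hypothetical helper: converts the keyed sub-document form into the array form.
function toEventsArray(doc) {
    const perType = doc.event_count_per_type || {};
    return {
        app_id: doc.app_id,
        // Assumes a "YYYY-MM-DD" string; stored as midnight UTC.
        date: new Date(doc.date + "T00:00:00.000Z"),
        event_count: doc.event_count,
        events: Object.keys(perType).map(function (k) {
            const n = Number(k);
            // Keep non-numeric key names as strings.
            return { type: Number.isNaN(n) ? k : n, value: perType[k] };
        })
    };
}

const converted = toEventsArray({
    app_id: "DHJFK67JDSJjdasj909",
    date: "2014-08-07",
    event_count: 32423,
    event_count_per_type: { 0: 322, 10: 4234, 20: 653, 30: 7562 }
});
```

The function is plain JavaScript, so it could be run over each document in a shell script or in application code before writing the converted documents back.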

Note also that the "date" is now a proper date object rather than a string, which is good practice as well. Data in this form is easy to process with the aggregation framework:

db.events.aggregate([
    { "$unwind": "$events" },
    { "$group": {
        "_id": { 
            "app_id": "$app_id",
            "type": "$events.type"
        },
        "event_count": { "$sum": "$event_count" },
        // after $unwind the per-type count lives at events.value
        "value": { "$sum": "$events.value" }
    }},
    { "$group": {
        "_id": "$_id.app_id",
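        // caution: each document's event_count was repeated once per array element
        // by $unwind, so this total is inflated by the number of types per document
        "event_count": { "$sum": "$event_count" },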
        "events": { "$push": { "type": "$_id.type", "value": "$value" } }
    }}
]) 

That shows a two-stage grouping. The first stage gets the totals per "type" without specifying each "key", since you no longer have to; the second returns a single document per "app_id" with the results in an array, as they were originally stored. This data form is generally much more flexible for looking at certain "types", or even at "values" within a certain range.

Where you cannot change the structure, your only option is mapReduce. This allows you to "code" the traversal of the keys, but since it requires JavaScript interpretation and execution it is not as fast as the aggregation framework:

db.events.mapReduce(
    function() {
        emit(
            this.app_id,
            {
                "event_count": this.event_count,
                "event_count_per_type": this.event_count_per_type
            }
        );
    },
    function(key,values) {

        var reduced = { "event_count": 0, "event_count_per_type": {} };

        values.forEach(function(value) {
            for ( var k in value.event_count_per_type ) {
                if ( !reduced.event_count_per_type.hasOwnProperty(k) )
                    reduced.event_count_per_type[k] = 0;
                // add this key's count to the running total for the key
                reduced.event_count_per_type[k] += value.event_count_per_type[k];
            }
            reduced.event_count += value.event_count;
        });

        // the reduce function must return the combined value
        return reduced;
    },
    {
        "out": { "inline": 1 }
    }
)

That will essentially traverse and combine the "keys" and sum up the values for each one found.
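MongoDB may call the reduce function more than once for the same key and feed earlier output back in as one of the values, so the reducer has to return the same shape it receives. The summing logic can be checked outside the database with a standalone sketch such as the one below; reduceCounts and the sample values are illustrative only, not taken from the collection above.

```javascript
// Standalone version of the reduce logic, for checking outside MongoDB.
function reduceCounts(key, values) {
    const reduced = { event_count: 0, event_count_per_type: {} };
    values.forEach(function (value) {
        Object.keys(value.event_count_per_type).forEach(function (k) {
            // Start missing keys at 0, then add this value's count.
            reduced.event_count_per_type[k] =
                (reduced.event_count_per_type[k] || 0) + value.event_count_per_type[k];
        });
        reduced.event_count += value.event_count;
    });
    return reduced;
}

// Made-up sample values with different key sets.
const a = { event_count: 10, event_count_per_type: { 0: 4, 10: 6 } };
const b = { event_count: 7,  event_count_per_type: { 10: 2, 20: 5 } };
const c = { event_count: 3,  event_count_per_type: { 0: 3 } };

// Reducing everything at once and reducing in two steps should agree.
const allAtOnce = reduceCounts("app", [a, b, c]);
const inTwoSteps = reduceCounts("app", [reduceCounts("app", [a, b]), c]);
```

Because the result of one call is a valid input to the next, partial results can be combined in any grouping without changing the totals.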

So you have two options:


  1. Change the structure and use standard queries and aggregation.

  2. Stay with the structure and require JavaScript processing and mapReduce.

It depends on your actual needs, but in most cases restructuring yields benefits.
