MongoDB Aggregate Sum Each Key on a Subdocument


Problem description

I have multiple documents with this schema, one document per product per day:

{
    _id:{},
    app_id:'DHJFK67JDSJjdasj909',
    date:'2014-08-07',
    event_count:32423,
    event_count_per_type: {
        0:322,
        10:4234,
        20:653,
        30:7562
    }
}

I would like to get the sum of each event_type for a particular date_range.
This is the output I am looking for, where each event type has been summed across all the documents. The keys of event_count_per_type can be anything, so I need something that can loop through each of them rather than having to name them explicitly.

{
    app_id:'DHJFK67JDSJjdasj909',
    event_count:324236456,
    event_count_per_type: {
        0:34234222,
        10:242354,
        20:456476,
        30:56756
    }
}

I have tried several queries. This is the best I have so far, but the sub-document values are not summed:

db.events.aggregate(
{
    $match: {app_id:'DHJFK67JDSJjdasj909'}
},
{
    $group: {
        _id: {
            app_id:'$app_id',
        },
        event_count: {$sum:'$event_count'},
        event_count_per_type: {$sum:'$event_count_per_type'}
    }
},
{
    $project: {
        _id:0,
        app_id:'$_id.app_id',
        event_count:1,
        event_count_per_type:1
    }
}
)

The output I am seeing has a value of 0 for the event_count_per_type key instead of an object. I could modify the schema so the keys are at the top level of the document, but I would still need an entry in the group statement for each key, which I cannot write because I do not know what the key names will be.

Any help would be appreciated. I am willing to change my schema if need be, and also to try mapReduce (although from the documentation it seems the performance is bad).

Solution

As stated, processing documents like this is not possible with the aggregation framework unless you actually supply all of the keys, such as:

db.events.aggregate([
   { "$group": {
       "_id": "$app_id",
       "event_count": { "$sum": "$event_count" },
       "0": { "$sum": "$event_count_per_type.0" },
       "10": { "$sum": "$event_count_per_type.10" },
       "20": { "$sum": "$event_count_per_type.20" },
       "30": { "$sum": "$event_count_per_type.30" }
   }}
])

But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB: to access elements notated in this "sub-document" form, you need to specify the "exact path" to the element in order to do anything with it.

The aggregation framework and general queries have no concept of "traversal", which means they cannot process "each key" of a document. That requires a language construct, which these interfaces do not provide.
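The client language can still write those exact paths for you when the key names are known ahead of time. The following is only a sketch under that assumption: `buildGroupStage` is a hypothetical helper, not part of MongoDB, and the key list is supplied by hand. It is plain JavaScript, so it can be tried in Node.js before the result is passed to `db.events.aggregate([ stage ])`.

```javascript
// Hypothetical helper: builds a $group stage from a list of key names that
// the caller already knows (for example, collected in an earlier pass).
function buildGroupStage(keys) {
    var group = {
        "_id": "$app_id",
        "event_count": { "$sum": "$event_count" }
    };
    keys.forEach(function(key) {
        // Each key still needs its exact path; the loop only writes it out.
        group[key] = { "$sum": "$event_count_per_type." + key };
    });
    return { "$group": group };
}

// Assumed key list, taken from the sample document in the question.
var stage = buildGroupStage(["0", "10", "20", "30"]);
console.log(JSON.stringify(stage, null, 2));
```

This does not remove the limitation. Every key is still named explicitly in the pipeline, just not typed by hand.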

Generally speaking though, using a "key name" as a data point, where its name actually represents a "value", is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:

{
    "app_id": "DHJFK67JDSJjdasj909",
    "date": ISODate("2014-08-07T00:00:00.000Z"),
    "event_count": 32423,
    "events": [
        { "type": 0,  "value": 322  },
        { "type": 10, "value": 4234 },
        { "type": 20, "value": 653  },
        { "type": 30, "value": 7562 }
    ]
}

Note also that the "date" is now a proper date object rather than a string, which is good practice. Data in this form is easy to process with the aggregation framework:

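db.events.aggregate([
    // One document per array element
    { "$unwind": "$events" },
    // Total per app_id and type
    { "$group": {
        "_id": {
            "app_id": "$app_id",
            "type": "$events.type"
        },
        "event_count": { "$sum": "$event_count" },
        "value": { "$sum": "$events.value" }
    }},
    // Back to one document per app_id
    { "$group": {
        "_id": "$_id.app_id",
        // event_count was repeated for every unwound element, so summing it
        // again here would count it once per type. $max keeps the true total,
        // assuming at least one type occurs in every document.
        "event_count": { "$max": "$event_count" },
        "events": { "$push": { "type": "$_id.type", "value": "$value" } }
    }}
])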

That shows a two-stage grouping. The first stage gets the totals per "type" without specifying each "key", since you no longer have to. The second returns a single document per "app_id" with the results in an array, as they were originally stored. This data form is generally much more flexible for looking at certain "types" or even the "values" within a certain range.
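If existing documents need to be moved to the array form, the reshaping itself is small. This is a minimal sketch in plain JavaScript, not a full migration: `toArrayForm` is a hypothetical helper, and it assumes the keys are numeric type codes and that the date string is a `YYYY-MM-DD` value meant as UTC midnight.

```javascript
// Hypothetical helper: reshapes one document from the keyed sub-document
// form into the array form. Field names follow the examples in the text.
function toArrayForm(doc) {
    var events = Object.keys(doc.event_count_per_type).map(function(key) {
        return {
            "type": Number(key),   // assumes the keys are numeric type codes
            "value": doc.event_count_per_type[key]
        };
    });
    return {
        "app_id": doc.app_id,
        // Assumes a YYYY-MM-DD string; stored as a Date at UTC midnight
        "date": new Date(doc.date + "T00:00:00.000Z"),
        "event_count": doc.event_count,
        "events": events
    };
}

// The sample document from the question
var converted = toArrayForm({
    app_id: "DHJFK67JDSJjdasj909",
    date: "2014-08-07",
    event_count: 32423,
    event_count_per_type: { 0: 322, 10: 4234, 20: 653, 30: 7562 }
});
console.log(JSON.stringify(converted, null, 2));
```

In practice the function would be applied to each document read from the collection and the result written back. How to do that depends on the driver and collection size, and is left out here.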

Where you cannot change the structure, your only option is mapReduce. This allows you to "code" the traversal of the keys, but since it requires JavaScript interpretation and execution, it is not as fast as the aggregation framework:

db.events.mapReduce(
    function() {
        emit(
            this.app_id,
            {
                "event_count": this.event_count,
                "event_count_per_type": this.event_count_per_type
            }
        );
    },
    function(key,values) {

        var reduced = { "event_count": 0, "event_count_per_type": {} };

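        values.forEach(function(value) {
            for ( var k in value.event_count_per_type ) {
                // Start each newly seen key at zero
                if ( !reduced.event_count_per_type.hasOwnProperty(k) )
                    reduced.event_count_per_type[k] = 0;
                // Add this document's count for the key
                reduced.event_count_per_type[k] += value.event_count_per_type[k];
            }
            reduced.event_count += value.event_count;
        });

        // Same shape as the emitted value, so it can be reduced again
        return reduced;
    },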
    {
        "out": { "inline": 1 }
    }
)

That will essentially traverse and combine the "keys" and sum up the values for each one found.
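The reduce logic can be checked outside MongoDB, since it is ordinary JavaScript. The sketch below copies it into a standalone function (`reduceEvents`, a name chosen here) and runs it on made-up sample values. It also reduces in two steps, because MongoDB may call reduce again on partial results for the same key, so the output must have the same shape as the emitted values.

```javascript
// The same reduce logic as a standalone function, so it runs in Node.js.
function reduceEvents(key, values) {
    var reduced = { "event_count": 0, "event_count_per_type": {} };
    values.forEach(function(value) {
        for (var k in value.event_count_per_type) {
            if (!reduced.event_count_per_type.hasOwnProperty(k))
                reduced.event_count_per_type[k] = 0;
            reduced.event_count_per_type[k] += value.event_count_per_type[k];
        }
        reduced.event_count += value.event_count;
    });
    return reduced;
}

// Made-up sample values, shaped like what the map function emits.
var a = { event_count: 10, event_count_per_type: { 0: 4, 10: 6 } };
var b = { event_count: 7,  event_count_per_type: { 10: 2, 20: 5 } };
var c = { event_count: 3,  event_count_per_type: { 0: 3 } };

// Reduce everything in one call
var allAtOnce = reduceEvents("app", [a, b, c]);
// Reduce a partial result again; this should give the same totals
var inTwoSteps = reduceEvents("app", [reduceEvents("app", [a, b]), c]);

console.log(JSON.stringify(allAtOnce));
console.log(JSON.stringify(inTwoSteps));
```

Both calls should print the same totals. Keys that are missing from some documents are simply added where they occur.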

So your options are:

  1. Change the structure and work with standard queries and aggregation.
  2. Stay with the structure and require JavaScript processing and mapReduce.

It depends on your actual needs, but in most cases restructuring yields benefits.
