Group and count using aggregation framework

Problem description

I'm trying to group and count the following structure:

[{
    "_id" : ObjectId("5479c4793815a1f417f537a0"),
    "status" : "canceled",
    "date" : ISODate("2014-11-29T00:00:00.000Z"),
    "offset" : 30,
    "devices" : [ 
        {
            "name" : "Mouse",
            "cost" : 150,
        }, 
        {
            "name" : "Keyboard",
            "cost" : 200,
        }
    ],
},
{
    "_id" : ObjectId("5479c4793815a1f417d557a0"),
    "status" : "done",
    "date" : ISODate("2014-10-20T00:00:00.000Z"),
    "offset" : 30,
    "devices" : [ 
        {
            "name" : "LCD",
            "cost" : 150,
        }, 
        {
            "name" : "Keyboard",
            "cost" : 200,
        }
    ],
},
{
    "_id" : ObjectId("5479c4793815a1f417f117a0"),
    "status" : "done",
    "date" : ISODate("2014-12-29T00:00:00.000Z"),
    "offset" : 30,
    "devices" : [ 
        {
            "name" : "Headphones",
            "cost" : 150,
        }, 
        {
            "name" : "LCD",
            "cost" : 200,
        }
    ],
}]

I need to group and count to get something like this:

 "result" : [ 
        {
            "_id" : {
                "status" : "canceled"
            },
            "count" : 1
        }, 
        {
            "_id" : {
                "status" : "done"
            },
            "count" : 2
        },
    totaldevicecost: 730,

    ],
    "ok" : 1
}

My problem is calculating the sum of "cost" in the "devices" subarray. How can I do that?

Solution

It seems you got a start on this but got lost on some of the other concepts. There are some basic truths when working with arrays in documents, but let's start where you left off:

db.sample.aggregate([
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 }
    }}
])

So that just uses the $group pipeline stage to gather up your documents on the distinct values of the "status" field. It also produces a "count" field, which counts the occurrences of each grouping key by passing a value of 1 to the $sum operator for every document found. This puts you at a point much like you describe:

{ "_id" : "done", "count" : 2 }
{ "_id" : "canceled", "count" : 1 }

That's the first stage of this and easy enough to understand, but now you need to know how to get values out of an array. Once you understand the "dot notation" concept properly, you might be tempted to do something like this:

db.sample.aggregate([
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$devices.cost" }
    }}
])

But what you will find is that the "total" will in fact be 0 for each of those results:

{ "_id" : "done", "count" : 2, "total" : 0 }
{ "_id" : "canceled", "count" : 1, "total" : 0 }

Why? Well, MongoDB aggregation operations like this do not actually traverse array elements when grouping. In order to do that, the aggregation framework has a pipeline stage called $unwind. The name is relatively self-explanatory. An embedded array in MongoDB is much like a "one-to-many" association between linked data sources. So what $unwind does is produce exactly that sort of "join" result, where each resulting "document" holds one element of the array along with a duplicate of the parent's information.
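To make that "join" concrete, here is a minimal sketch in plain JavaScript of the shape $unwind produces. It is only an illustration, not how MongoDB implements the stage: the `unwind` helper is hypothetical, the sample documents are trimmed to the relevant fields with simplified `_id` values, and it assumes the array field is present on every document.

```javascript
// Plain JavaScript illustration of the output shape of $unwind.
// Sample documents trimmed to the fields that matter; _id values simplified.
const docs = [
  { _id: 1, status: "canceled", devices: [{ name: "Mouse", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { _id: 2, status: "done", devices: [{ name: "LCD", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { _id: 3, status: "done", devices: [{ name: "Headphones", cost: 150 }, { name: "LCD", cost: 200 }] },
];

// Hypothetical helper: emit one copy of the parent per array element,
// with the array field replaced by that single element.
// Assumes doc[field] is always an array.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

const unwound = unwind(docs, "devices");
console.log(unwound.length); // 6: three parents, two devices each
console.log(unwound[0]);     // { _id: 1, status: 'canceled', devices: { name: 'Mouse', cost: 150 } }
```

Each parent now appears once per device, which is what lets a later $group reach "devices.cost" as a plain number.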

So in order to act on array elements you need to use $unwind first. This should logically lead you to code like this:

db.sample.aggregate([
    { "$unwind": "$devices" },
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$devices.cost" }
    }}
])

And then the result:

{ "_id" : "done", "count" : 4, "total" : 700 }
{ "_id" : "canceled", "count" : 2, "total" : 350 }

But that isn't quite right, is it? Remember what you just learned about $unwind and how it does a de-normalized join with the parent information? The parent is now duplicated for every array member, and since each document has two array members, each one is counted twice. So while the "total" field is correct, the "count" is twice what it should be in each case.

A bit more care needs to be taken, so instead of doing this in a single $group stage, it is done in two:

db.sample.aggregate([
    { "$unwind": "$devices" },
    { "$group": {
        "_id": "$_id",
        "status": { "$first": "$status" },
        "total": { "$sum": "$devices.cost" }
    }},
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$total" }
    }}
])

Which now gets the result with correct totals in it:

{ "_id" : "canceled", "count" : 1, "total" : 350 }
{ "_id" : "done", "count" : 2, "total" : 700 }
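If it helps to see why the two grouping passes fix the count, the same logic can be traced in plain JavaScript. This is a sketch of the idea only, run on trimmed in-memory copies of the sample documents with simplified `_id` values; the variable names are made up for the illustration and none of this is MongoDB code.

```javascript
// Trimmed copies of the three sample documents; _id values simplified.
const sampleDocs = [
  { _id: 1, status: "canceled", devices: [{ name: "Mouse", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { _id: 2, status: "done", devices: [{ name: "LCD", cost: 150 }, { name: "Keyboard", cost: 200 }] },
  { _id: 3, status: "done", devices: [{ name: "Headphones", cost: 150 }, { name: "LCD", cost: 200 }] },
];

// Equivalent of $unwind: one row per device, parent fields repeated.
const rows = sampleDocs.flatMap((doc) =>
  doc.devices.map((device) => ({ _id: doc._id, status: doc.status, devices: device }))
);

// First grouping: by document _id, keeping the first status and summing cost.
const perDocument = new Map();
for (const row of rows) {
  const entry = perDocument.get(row._id) || { status: row.status, total: 0 };
  entry.total += row.devices.cost;
  perDocument.set(row._id, entry);
}

// Second grouping: by status. Each document now contributes exactly one row,
// so adding 1 per row counts documents rather than devices.
const perStatus = new Map();
for (const { status, total } of perDocument.values()) {
  const entry = perStatus.get(status) || { count: 0, total: 0 };
  entry.count += 1;
  entry.total += total;
  perStatus.set(status, entry);
}

console.log(Object.fromEntries(perStatus));
// { canceled: { count: 1, total: 350 }, done: { count: 2, total: 700 } }
```

The first pass collapses the duplicated parents back to one row per document, which is the step the single-$group attempt was missing.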

Now the numbers are right, but it is still not exactly what you are asking for. I think you should stop there, as the sort of result you are expecting is really not suited to a single result document from aggregation alone. You are looking for the total to be "inside" the result. It really doesn't belong there, but on small data it is okay:

db.sample.aggregate([
    { "$unwind": "$devices" },
    { "$group": {
        "_id": "$_id",
        "status": { "$first": "$status" },
        "total": { "$sum": "$devices.cost" }
    }},
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$total" }
    }},
    { "$group": {
        "_id": null,
        "data": { "$push": { "count": "$count", "total": "$total" } },
        "totalCost": { "$sum": "$total" }
    }}
])

And a final result form:

{
    "_id" : null,
    "data" : [
            {
                    "count" : 1,
                    "total" : 350
            },
            {
                    "count" : 2,
                    "total" : 700
            }
    ],
    "totalCost" : 1050
}

But, "Do Not Do That". MongoDB limits any single BSON document, including one returned as a response, to 16MB. On small results you can do this kind of convenience wrapping, but in the larger scheme of things you want the results in the earlier form, and then either run a separate query or live with iterating the whole result set in order to get the total from all documents.

You do appear to be using a MongoDB version earlier than 2.6, or copying output from a RoboMongo shell that does not support the latest version's features. From MongoDB 2.6, though, the results of aggregation can be a "cursor" rather than a single BSON array. So the overall response can be much larger than 16MB, but only when you are not compacting the results into a single document, as shown in the last example.

This is especially true where you are "paging" the results, with hundreds to thousands of result lines, but you just want a "total" to return in an API response when you are only returning a "page" of 25 results at a time.
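As a rough sketch of that paging case in plain JavaScript: the per-group rows stay in their flat form, the total is computed by iterating all of them, and only one page goes into the response. The `pageWithTotal` helper, the generated rows and the response shape are all hypothetical; in a real application the rows would come from the aggregation cursor rather than an in-memory array.

```javascript
// Hypothetical per-group rows standing in for a large aggregation result.
const allRows = Array.from({ length: 60 }, (_, i) => ({
  _id: `status-${i}`,
  count: 1,
  total: 10 * (i + 1),
}));

// Hypothetical helper: return one page of rows plus the total over every row.
// pageNumber is 1-based.
function pageWithTotal(rows, pageNumber, pageSize = 25) {
  const totalCost = rows.reduce((sum, row) => sum + row.total, 0);
  const start = (pageNumber - 1) * pageSize;
  return { data: rows.slice(start, start + pageSize), totalCost };
}

const firstPage = pageWithTotal(allRows, 1);
console.log(firstPage.data.length, firstPage.totalCost); // 25 18300

const lastPage = pageWithTotal(allRows, 3);
console.log(lastPage.data.length, lastPage.totalCost);   // 10 18300
```

The total is the same on every page because it is derived from the full result set, while the payload itself never grows beyond one page.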

Anyhow, that should give you a reasonable guide on how to get the type of results you are expecting from your common document form. Remember $unwind in order to process arrays, and generally $group multiple times in order to get totals at different grouping levels from your document and collection groupings.
