Group and count using aggregation framework

Question

I'm trying to group and count documents with the following structure:

[{
    "_id" : ObjectId("5479c4793815a1f417f537a0"),
    "status" : "canceled",
    "date" : ISODate("2014-11-29T00:00:00.000Z"),
    "offset" : 30,
    "devices" : [ 
        {
            "name" : "Mouse",
            "cost" : 150,
        }, 
        {
            "name" : "Keyboard",
            "cost" : 200,
        }
    ],
},
{
    "_id" : ObjectId("5479c4793815a1f417d557a0"),
    "status" : "done",
    "date" : ISODate("2014-10-20T00:00:00.000Z"),
    "offset" : 30,
    "devices" : [ 
        {
            "name" : "LCD",
            "cost" : 150,
        }, 
        {
            "name" : "Keyboard",
            "cost" : 200,
        }
    ],
}
,
{
    "_id" : ObjectId("5479c4793815a1f417f117a0"),
    "status" : "done",
    "date" : ISODate("2014-12-29T00:00:00.000Z"),
    "offset" : 30,
    "devices" : [ 
        {
            "name" : "Headphones",
            "cost" : 150,
        }, 
        {
            "name" : "LCD",
            "cost" : 200,
        }
    ],
}]

I need to group and count to get something like this:

 "result" : [ 
        {
            "_id" : {
                "status" : "canceled"
            },
            "count" : 1
        }, 
        {
            "_id" : {
                "status" : "done"
            },
            "count" : 2
        },
    totaldevicecost: 730,

    ],
    "ok" : 1
}

My problem is calculating the sum of cost in the "devices" subarray. How can I do that?

Answer

It looks like you made a start on this but got lost on some of the other concepts. There are some basic truths when working with arrays in documents, but let's start where you left off:

db.sample.aggregate([
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 }
    }}
])

That just uses the $group pipeline stage to gather your documents by the distinct values of the "status" field, and produces another field, "count", which counts the occurrences of each grouping key by passing a value of 1 to the $sum operator for every document found. This puts you at a point much like the one you describe:

{ "_id" : "done", "count" : 2 }
{ "_id" : "canceled", "count" : 1 }

That's the first stage and it is easy enough to understand, but now you need to know how to get values out of an array. Once you understand the "dot notation" concept properly, you might be tempted to do something like this:

db.sample.aggregate([
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$devices.cost" }
    }}
])

But what you will find is that "total" is in fact 0 for each of those results:

{ "_id" : "done", "count" : 2, "total" : 0 }
{ "_id" : "canceled", "count" : 1, "total" : 0 }

Why? MongoDB aggregation operations like this do not actually traverse array elements when grouping. To do that, the aggregation framework has a pipeline stage called $unwind. The name is relatively self-explanatory. An embedded array in MongoDB is much like a "one-to-many" association between linked data sources. What $unwind produces is exactly that sort of "join" result: one output document per array element, each carrying a copy of the parent document's fields.
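To make that "join" behaviour concrete, here is a minimal plain JavaScript sketch of the idea. It is not MongoDB's implementation: `unwind` is a hypothetical helper written only for illustration, and it assumes the field is an array (documents where it is missing or not an array are simply skipped).

```javascript
// Plain JavaScript illustration of the $unwind idea; not MongoDB's own code.
// `unwind` is a hypothetical helper name used only for this sketch.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    // Assumption: only array fields are handled; anything else yields no rows.
    const items = Array.isArray(doc[field]) ? doc[field] : [];
    for (const item of items) {
      // Copy the parent fields, then replace the array with a single element.
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

const order = {
  status: "canceled",
  devices: [
    { name: "Mouse", cost: 150 },
    { name: "Keyboard", cost: 200 }
  ]
};

// One input document with two devices becomes two output documents,
// each repeating the parent's "status".
const unwound = unwind([order], "devices");
console.log(unwound);
```

Each output row has "devices" as a single embedded document rather than an array, which is why "$devices.cost" becomes a plain number that $sum can add up.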

So in order to act on array elements you need to use $unwind first. This should logically lead you to code like this:

db.sample.aggregate([
    { "$unwind": "$devices" },
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$devices.cost" }
    }}
])

And the result:

{ "_id" : "done", "count" : 4, "total" : 700 }
{ "_id" : "canceled", "count" : 2, "total" : 350 }

But that isn't quite right, is it? Remember what you just learned about $unwind and how it does a de-normalized join with the parent information? The parent information is now duplicated for every array member, and every document here has two array members. So while the "total" field is correct, the "count" is twice what it should be in each case.

A bit more care needs to be taken, so instead of doing this in a single $group stage, do it in two: the first collapses the unwound rows back to one per original document, and the second groups those by status:

db.sample.aggregate([
    { "$unwind": "$devices" },
    { "$group": {
        "_id": "$_id",
        "status": { "$first": "$status" },
        "total": { "$sum": "$devices.cost" }
    }},
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$total" }
    }}
])

This now gives a result with the correct counts and totals:

{ "_id" : "canceled", "count" : 1, "total" : 350 }
{ "_id" : "done", "count" : 2, "total" : 700 }
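As a sanity check on those numbers, the same two-stage logic can be sketched in plain JavaScript over in-memory data. This is only an illustration of the arithmetic, not how MongoDB executes the pipeline; the `_id` values "a", "b" and "c" are placeholders, and only the status and cost values are taken from the sample documents in the question.

```javascript
// In-memory check of the two-stage grouping. The _id values are placeholders;
// statuses and costs mirror the sample documents in the question.
const docs = [
  { _id: "a", status: "canceled", devices: [{ cost: 150 }, { cost: 200 }] },
  { _id: "b", status: "done", devices: [{ cost: 150 }, { cost: 200 }] },
  { _id: "c", status: "done", devices: [{ cost: 150 }, { cost: 200 }] }
];

// Stage 1: one total per original document
// (plays the role of $unwind followed by $group on "_id").
const perDocument = docs.map((doc) => ({
  status: doc.status,
  total: doc.devices.reduce((sum, device) => sum + device.cost, 0)
}));

// Stage 2: one row per status (plays the role of the second $group).
const byStatus = {};
for (const { status, total } of perDocument) {
  if (!byStatus[status]) byStatus[status] = { count: 0, total: 0 };
  byStatus[status].count += 1; // counts documents, not array elements
  byStatus[status].total += total;
}

console.log(byStatus);
```

Because the count is incremented once per document in the second stage, it is no longer inflated by the number of array elements.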

Now the numbers are right, but it is still not exactly what you are asking for. I think you should stop there, because the sort of result you are expecting is not really suited to a single result from aggregation alone. You want the overall total to be "inside" the result. It really doesn't belong there, but on small data it is okay:

db.sample.aggregate([
    { "$unwind": "$devices" },
    { "$group": {
        "_id": "$_id",
        "status": { "$first": "$status" },
        "total": { "$sum": "$devices.cost" }
    }},
    { "$group": {
        "_id": "$status",
        "count": { "$sum": 1 },
        "total": { "$sum": "$total" }
    }},
    { "$group": {
        "_id": null,
        "data": { "$push": { "count": "$count", "total": "$total" } },
        "totalCost": { "$sum": "$total" }
    }}
])

And the final result form:

{
    "_id" : null,
    "data" : [
            {
                    "count" : 1,
                    "total" : 350
            },
            {
                    "count" : 2,
                    "total" : 700
            }
    ],
    "totalCost" : 1050
}

But, "Do Not Do That". MongoDB limits a single BSON document to 16MB, and that limit applies to a result packed into one document like this. On small results you can do this kind of convenience wrapping, but in the larger scheme of things you want the results in the earlier form, and then either run a separate query or live with iterating the whole result set to get the total across all documents.
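The "iterate the results" option amounts to a simple client-side accumulation. The sketch below uses a hard-coded array in the shape of the earlier two-stage output so that it runs on its own; in the shell you would iterate the rows returned by aggregate() instead. The variable names `rows` and `totalCost` are chosen here for illustration.

```javascript
// Rows in the shape produced by the earlier two-stage pipeline.
// Hard-coded here so the sketch runs without a database.
const rows = [
  { _id: "canceled", count: 1, total: 350 },
  { _id: "done", count: 2, total: 700 }
];

// Accumulate the grand total on the client while iterating the rows,
// instead of packing everything into one server-side document.
let totalCost = 0;
for (const row of rows) {
  totalCost += row.total;
}

console.log(totalCost); // 1050
```

This keeps each result row as its own document, so the size of the overall response is not tied to the size of any single document.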

You appear to be using a MongoDB version earlier than 2.6, or copying output from a RoboMongo shell that does not support the latest version's features. From MongoDB 2.6 onward, though, the results of aggregation can be returned as a "cursor" rather than a single BSON array. So the overall response can be much larger than 16MB, but only when you are not compacting the results into a single document, as the last example does.

This is especially true when you are "paging" the results, with hundreds to thousands of result lines, but only want a "total" to return in an API response while returning a "page" of 25 results at a time.

Anyhow, that should give you a reasonable guide to getting the type of results you expect from your common document form. Remember to use $unwind in order to process arrays, and generally to $group multiple times in order to get totals at different grouping levels from your document and collection groupings.
