MongoDB multiple aggregations in a single operation


Question

I have an item collection with the following documents.

{ "item" : "i1", "category" : "c1", "brand" : "b1" }  
{ "item" : "i2", "category" : "c2", "brand" : "b1" }  
{ "item" : "i3", "category" : "c1", "brand" : "b2" }  
{ "item" : "i4", "category" : "c2", "brand" : "b1" }  
{ "item" : "i5", "category" : "c1", "brand" : "b2" }  

I want separate aggregation results: a count by category and a count by brand. Note that this is not a count by the (category, brand) pair.

I am able to do this with map-reduce using the following code.

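// Emit one key per category and one per brand, each with a count of 1.
var map = function () {
    emit({ type: "category", category: this.category }, 1);
    emit({ type: "brand", brand: this.brand }, 1);
};
// Sum the emitted 1s for each key.
var reduce = function (key, values) {
    return Array.sum(values);
};
db.item.mapReduce(map, reduce, { out: { inline: 1 } });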

The result is:

{
        "results" : [
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b2"
                        },
                        "value" : 2
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c2"
                        },
                        "value" : 2
                }
        ],
        "timeMillis" : 21,
        "counts" : {
                "input" : 5,
                "emit" : 10,
                "reduce" : 4,
                "output" : 4
        },
        "ok" : 1,
}
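
To make the composite-key trick above easier to follow, here is a minimal plain-JavaScript sketch that mimics the emit and reduce steps outside MongoDB. The helper `simulateMapReduce` is hypothetical and written only for illustration; unlike the mongo shell, it passes `emit` to the map function as an argument, and it replaces the shell's `Array.sum` with a plain `reduce` call.

```javascript
// Sample documents copied from the question.
var items = [
    { item: "i1", category: "c1", brand: "b1" },
    { item: "i2", category: "c2", brand: "b1" },
    { item: "i3", category: "c1", brand: "b2" },
    { item: "i4", category: "c2", brand: "b1" },
    { item: "i5", category: "c1", brand: "b2" }
];

// Hypothetical helper (not a MongoDB API): collects every emitted value
// under its serialized key, then reduces each group to a single value.
function simulateMapReduce(docs, mapFn, reduceFn) {
    var groups = {};  // serialized key -> { key, values }
    var order = [];   // keys in first-seen order
    function emit(key, value) {
        var id = JSON.stringify(key);
        if (!Object.prototype.hasOwnProperty.call(groups, id)) {
            groups[id] = { key: key, values: [] };
            order.push(id);
        }
        groups[id].values.push(value);
    }
    // Call the map function with `this` bound to the document, as MongoDB does.
    docs.forEach(function (doc) { mapFn.call(doc, emit); });
    return order.map(function (id) {
        var group = groups[id];
        return { _id: group.key, value: reduceFn(group.key, group.values) };
    });
}

var results = simulateMapReduce(
    items,
    function (emit) {
        // Two emits per document: one keyed by category, one keyed by brand.
        emit({ type: "category", category: this.category }, 1);
        emit({ type: "brand", brand: this.brand }, 1);
    },
    function (key, values) {
        return values.reduce(function (a, b) { return a + b; }, 0);
    }
);

console.log(JSON.stringify(results, null, 2));
```

Because the `type` field is part of each key, category keys and brand keys never collide, which is why one pass yields two independent sets of counts.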

I can get the same results by running two separate aggregation commands, as below.

db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})

Is there any way to do the same with a single aggregation command using the aggregation framework?

I have simplified my case here; in practice I need this grouping on fields in an array of subdocuments. Assume the above is the structure after I do an $unwind.

It is a real-time query (someone is waiting for the response), though on a smaller dataset, so execution time is important.

I am using MongoDB 2.4.

Recommended answer

Over a large data set I would say that your current mapReduce approach is the best one, because the aggregation technique for this does not work well with large data: it first collects every value into arrays inside a single document, which is bound by the 16 MB BSON document size limit. But over a reasonably small data set it might be just what you need:

db.item.aggregate([
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }},
    { "$project": {
        "_id": {
            "categories": "$categories",
            "brands": "$brands"
        },
        "categories": 1
    }},
    { "$unwind": "$categories" },
    { "$group": {
        "_id": {
            "brands": "$_id.brands",
            "category": "$categories"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.brands",
        "categories": { "$push": {
            "category": "$_id.category",
            "count": "$count"
        }}
    }},
    { "$project": {
        "_id": "$categories",
        "brands": "$_id"
    }},
    { "$unwind": "$brands" },
    { "$group": {
        "_id": {
            "categories": "$_id",
            "brand": "$brands"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": null,
        "categories": { "$first": "$_id.categories" },
        "brands": { "$push": {
            "brand": "$_id.brand",
            "count": "$count"
        }}
    }}
])

This is not quite the same as the mapReduce output, and you could add a few more stages to change the output format, but it should be usable:

{
    "_id" : null,
    "categories" : [
            {
                    "category" : "c2",
                    "count" : 2
            },
            {
                    "category" : "c1",
                    "count" : 3
            }
    ],
    "brands" : [
            {
                    "brand" : "b2",
                    "count" : 2
            },
            {
                    "brand" : "b1",
                    "count" : 3
            }
    ]
}
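
If the mapReduce-style shape is needed, one option besides adding more pipeline stages is to reshape the single result document on the client. The sketch below is plain JavaScript and only an illustration: `toMapReduceShape` is a hypothetical helper, and the input is the result document shown above, copied by hand.

```javascript
// Result document copied from the aggregation output above.
var aggregated = {
    _id: null,
    categories: [
        { category: "c2", count: 2 },
        { category: "c1", count: 3 }
    ],
    brands: [
        { brand: "b2", count: 2 },
        { brand: "b1", count: 3 }
    ]
};

// Hypothetical helper: flatten the two arrays into { _id, value } entries
// that use the same composite keys as the mapReduce version.
function toMapReduceShape(doc) {
    var brands = doc.brands.map(function (b) {
        return { _id: { type: "brand", brand: b.brand }, value: b.count };
    });
    var categories = doc.categories.map(function (c) {
        return { _id: { type: "category", category: c.category }, value: c.count };
    });
    // Brands first, then categories; entries keep the input array order
    // and are not sorted by key.
    return brands.concat(categories);
}

var reshaped = toMapReduceShape(aggregated);
console.log(JSON.stringify(reshaped, null, 2));
```

Doing this on the client keeps the pipeline shorter, at the cost of one extra pass over a small result in application code.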

As you can see, this involves a fair bit of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline. Again, this will not do well for large data, but for something like "items in an order" it would probably do nicely.

Of course, as you say, you have simplified somewhat, so the first grouping key of null will either be something else, or the input for that null case will be narrowed down by an earlier $match stage, which is probably what you want to do.
