Limit and sort each group in MongoDB using aggregation

Question

How can I sort and limit each group in MongoDB?

Consider the following data:

Country:USA,name:xyz,rating:10,id:x
Country:USA,name:xyz,rating:10,id:y
Country:USA,name:xyz,rating:10,id:z
Country:USA,name:abc,rating:5,id:x
Country:India,name:xyz,rating:5,id:x
Country:India,name:xyz,rating:5,id:y
Country:India,name:abc,rating:10,id:z
Country:India,name:abc,rating:10,id:x

Now say I want to group by country, sort by rating, and limit each group to 2 results.

So the answer should be:

Country:USA
name:xyz,rating:10,id:x
name:xyz,rating:10,id:y
Country:India
name:abc,rating:10,id:x
name:abc,rating:10,id:z
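
For reference, the intended semantics can be pinned down with a small in-memory sketch in plain JavaScript. It does not touch MongoDB at all, and the helper name `topNPerGroup` is made up for illustration. Ties in rating keep their input order, so which of the tied documents end up in the top 2 is arbitrary.

```javascript
// Sample documents from the question.
const docs = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "xyz", rating: 10, id: "z" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "y" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" },
];

// Hypothetical helper: group by `groupKey`, sort each group by `sortKey`
// in descending order, and keep at most `n` documents per group.
function topNPerGroup(items, groupKey, sortKey, n) {
  const groups = new Map();
  for (const item of items) {
    const key = item[groupKey];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  const result = {};
  for (const [key, members] of groups) {
    result[key] = members
      .slice() // copy so the input array is not reordered
      .sort((a, b) => b[sortKey] - a[sortKey])
      .slice(0, n); // groups with fewer than n members return what they have
  }
  return result;
}

const top2 = topNPerGroup(docs, "Country", "rating", 2);
console.log(JSON.stringify(top2, null, 2));
```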

I want to accomplish this using the aggregation framework only.

I tried including a sort on rating in the aggregate, but the query simply returns no results after processing.

Recommended answer

Your best option here is to run separate queries for each "Country" (ideally in parallel) and return the combined results. The queries are quite simple: each one sorts on the rating value and returns the top 2 documents, so they execute quickly even though several queries are needed to obtain the complete result.

The aggregation framework is not a good fit for this, now or in the near future. The problem is that no operator "limits" the result of a grouping in any way. To do this, you basically need to $push all content into an array and then extract the "top n" values from it.

The operations currently needed to do that are pretty horrible, and the core problem is that on most real data sources the grouped results are likely to exceed the BSON limit of 16MB per document.
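
A rough back-of-the-envelope calculation gives a feel for that limit. The sketch below assumes a made-up average of 200 bytes per pushed item (real sizes depend entirely on your data) and ignores the overhead of the array and the other fields in the grouped document, so the figure is an optimistic upper bound.

```javascript
// 16 MB BSON document limit, expressed in bytes.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// ASSUMPTION: hypothetical average BSON size of one pushed item.
const assumedItemBytes = 200;

// Upper bound on how many items of a given size fit into one grouped document.
function maxItemsPerGroup(itemBytes) {
  return Math.floor(MAX_BSON_BYTES / itemBytes);
}

console.log(maxItemsPerGroup(assumedItemBytes)); // 83886
```

Under that assumption, any single country with more than roughly 84,000 documents would already fail the $push stage.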

There is also a complexity that grows with n, because of how you currently have to do it. But just to demonstrate with 2 items:

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items, keeping first result
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        },
        "first": { 
            "$first": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},

    // Unwind the array
    { "$unwind": "$results" },

    // Remove the seen result from the array
    { "$redact": {
        "$cond": {
            "if": { "$eq": [ "$results.id", "$first.id" ] },
            "then": "$$PRUNE",
            "else": "$$KEEP"
        }
    }},

    // Group to return the second result which is now first on stack
    { "$group": {
        "_id": "$_id",
        "first": { "$first": "$first" },
        "second": { 
            "$first": {
                "name": "$results.name", 
                "rating": "$results.rating",
                "id": "$results.id"
            }
        }
    }},

    // Optionally put these in an array format
    { "$project": {
        "results": { 
            "$map": {
                "input": ["A","B"],
                "as": "el",
                "in": {
                    "$cond": {
                        "if": { "$eq": [ "$$el", "A" ] },
                        "then": "$first",
                        "else": "$second"
                    }
                }
            }
        }
    }}
])

That gets the result, but it's not a great approach, and it gets a lot more complex with the extra iterations needed for higher limits, or where some groupings may have fewer than n results to return.

The current development series (3.1.x) as of writing has a $slice operator that makes this a bit simpler, but it still has the same "size" pitfall:

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},

    // Keep only the first 2 items of each sorted array
    { "$project": {
        "results": { "$slice": [ "$results", 2 ] }
    }}
])

But basically, until the aggregation framework has some way to "limit" the number of items produced by $push, or a similar grouping "limit" operator, it is not really the optimal solution for this type of problem.
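
For readers on much newer servers: MongoDB 5.2 added a `$topN` group accumulator, which is the kind of grouping "limit" operator described above. The following is only a sketch under that assumption. It is not available in the 3.1.x series this answer was written against, and `db.collection` stands in for your own collection.

```javascript
// ASSUMPTION: server is MongoDB 5.2 or later, where $topN exists.
const pipeline = [
  // Group by country, keeping only the 2 highest-rated items per group
  { "$group": {
    "_id": "$Country",
    "results": {
      "$topN": {
        "n": 2,
        "sortBy": { "rating": -1 },
        "output": { "name": "$name", "rating": "$rating", "id": "$id" }
      }
    }
  }}
];

// In the shell: db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Because the accumulator only retains n items per group, it avoids building the full array that causes the size problem with $push.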

Simple queries like this:

db.collection.find({ "Country": "USA" }).sort({ "rating": -1 }).limit(2)

Run for each distinct country, ideally in parallel (through an event loop or threads) and with the results combined, these queries are the most optimal approach right now. They fetch only what is needed, which is the big problem the aggregation framework cannot yet handle in this kind of grouping.
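
As a minimal sketch of that approach in Node.js: the function below assumes `collection` behaves like a Collection from the official Node.js driver, that is, `distinct()` and `find().sort().limit().toArray()` return promises. The function name `topRatedPerCountry` is made up for illustration, and connection handling is left out.

```javascript
// Sketch: one small query per distinct country, all issued concurrently.
// ASSUMPTION: `collection` exposes distinct() and find().sort().limit().toArray()
// in the style of the Node.js MongoDB driver.
async function topRatedPerCountry(collection, n) {
  // First find out which groups exist
  const countries = await collection.distinct("Country");

  // Then fetch only the top n documents for each group, in parallel
  return Promise.all(
    countries.map(async (country) => ({
      Country: country,
      results: await collection
        .find({ Country: country })
        .sort({ rating: -1 })
        .limit(n)
        .toArray(),
    }))
  );
}
```

Each query returns at most n documents, so no large array is ever built on the server. An index on Country and rating would be the natural companion here, though whether it helps depends on your data.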

So look instead for support in your chosen language for producing these "combined query results" in the most optimal way, as it will be far less complex and much more performant than throwing this at the aggregation framework.
