Limit and sort each group in MongoDB using aggregation

Question

How can I sort and limit the results of each group in MongoDB?

Consider the following data:

Country:USA,name:xyz,rating:10,id:x
Country:USA,name:xyz,rating:10,id:y
Country:USA,name:xyz,rating:10,id:z
Country:USA,name:abc,rating:5,id:x
Country:India,name:xyz,rating:5,id:x
Country:India,name:xyz,rating:5,id:y
Country:India,name:abc,rating:10,id:z
Country:India,name:abc,rating:10,id:x

Now say I want to group by country, sort by rating, and limit each group to 2 results.

So the answer would be:

Country:USA
name:xyz,rating:10,id:x
name:xyz,rating:10,id:y
Country:India
name:abc,rating:10,id:x
name:abc,rating:10,id:z

I want to accomplish this using the aggregation framework only.

I tried including a sort on rating in the aggregation, but the query simply returns no results after processing.

Recommended answer

Your best option here is to run separate queries for each "Country" (ideally in parallel) and return the combined results. The queries are quite simple: each one sorts on the rating value and returns the top 2 documents, so it executes quickly even if you need several queries to obtain the complete result.

The aggregation framework is not a good fit for this, now or even in the near future. The problem is that there is no operator that "limits" the results of a grouping in any way. So in order to do this, you basically need to $push all content into an array and extract the "top n" values from that.
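As a rough illustration of why this is wasteful, here is a plain-JavaScript model of "sort, push everything into a per-group array, then keep the first n". It runs in memory on the sample data from the question and does not talk to MongoDB; the function name `topNPerGroupInMemory` and the `pushed` counter are made up for this sketch.

```javascript
// Sample documents from the question.
const docs = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "xyz", rating: 10, id: "z" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "y" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" },
];

// Hypothetical helper: models "$sort, $group with $push, then trim to n".
function topNPerGroupInMemory(documents, n) {
  // Similar in spirit to { "$sort": { "Country": 1, "rating": -1 } }.
  const sorted = [...documents].sort(
    (a, b) => a.Country.localeCompare(b.Country) || b.rating - a.rating
  );

  // Similar in spirit to $group + $push: every item of a group is collected.
  const groups = new Map();
  for (const { Country, ...item } of sorted) {
    if (!groups.has(Country)) groups.set(Country, []);
    groups.get(Country).push(item);
  }

  // Only after the full array exists can it be trimmed to the top n.
  return [...groups].map(([country, results]) => ({
    _id: country,
    pushed: results.length, // how many items were materialized for the group
    results: results.slice(0, n),
  }));
}

const grouped = topNPerGroupInMemory(docs, 2);
console.log(JSON.stringify(grouped, null, 2));
```

On the sample data each group materializes 4 items in order to keep 2. On a real collection the pushed array grows with the size of the group, not with n.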

The operations currently needed to do that are pretty horrible, and the core problem is that the grouped results are likely to exceed the BSON limit of 16MB per document on most real data sources.

The complexity also grows with n because of how you would have to do it right now. But just to demonstrate with 2 items:

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items, keeping first result
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        },
        "first": { 
            "$first": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},

    // Unwind the array
    { "$unwind": "$results" },

    // Remove the seen result from the array
    { "$redact": {
        "$cond": {
            "if": { "$eq": [ "$results.id", "$first.id" ] },
            "then": "$$PRUNE",
            "else": "$$KEEP"
        }
    }},

    // Group to return the second result which is now first on stack
    { "$group": {
        "_id": "$_id",
        "first": { "$first": "$first" },
        "second": { 
            "$first": {
                "name": "$results.name", 
                "rating": "$results.rating",
                "id": "$results.id"
            }
        }
    }},

    // Optionally put these in an array format
    { "$project": {
        "results": { 
            "$map": {
                "input": ["A","B"],
                "as": "el",
                "in": {
                    "$cond": {
                        "if": { "$eq": [ "$$el", "A" ] },
                        "then": "$first",
                        "else": "$second"
                    }
                }
            }
        }
    }}
])

That gets the result, but it's not a great approach, and it gets a lot more complex with the iterations needed for higher limits, or where some groupings may have fewer than n results to return.
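To make the "fewer than n results" case concrete, here is a plain-JavaScript model of the stages in the pipeline above, run in memory rather than against MongoDB. The function name `topTwoViaRedact` is made up for this sketch, and the "Canada" document is a hypothetical addition to the question's data, included only to show a group with a single item.

```javascript
// Sample documents from the question, plus one hypothetical single-item group.
const docs = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "xyz", rating: 10, id: "z" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "y" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" },
  { Country: "Canada", name: "abc", rating: 7, id: "x" }, // hypothetical
];

// Hypothetical helper: models sort, group, unwind, redact, and regroup.
function topTwoViaRedact(documents) {
  // Sort by country ascending, then rating descending.
  const sorted = [...documents].sort(
    (a, b) => a.Country.localeCompare(b.Country) || b.rating - a.rating
  );

  // First $group: push every item and remember the first one per country.
  const groups = new Map();
  for (const { Country, ...item } of sorted) {
    if (!groups.has(Country)) groups.set(Country, { first: item, results: [] });
    groups.get(Country).results.push(item);
  }

  const output = [];
  for (const [country, { first, results }] of groups) {
    // $unwind + $redact: drop every unwound item whose id equals first.id.
    const remaining = results.filter((item) => item.id !== first.id);
    // Second $group: with no documents left, the group emits nothing at all.
    if (remaining.length === 0) continue;
    output.push({ _id: country, first, second: remaining[0] });
  }
  return output;
}

const traced = topTwoViaRedact(docs);
console.log(JSON.stringify(traced, null, 2));
```

In this model the single-item "Canada" group disappears from the output entirely, because its only item is pruned as the already-seen first result. Note also that the pruning step matches on id alone, so it assumes ids are unique within a group; in the sample data the USA item abc/x is pruned along with xyz/x.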

The current development series (3.1.x) as of writing has a $slice operator that makes this a bit simpler, but it still has the same "size" pitfall:

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},
    // Keep only the first two items of each group's array
    { "$project": {
        "results": { "$slice": [ "$results", 2 ] }
    }}
])

But basically, until the aggregation framework has some way to "limit" the number of items produced by $push, or a similar grouping "limit" operator, it is not really the optimal solution for this type of problem.

Simple queries like this:

db.collection.find({ "Country": "USA" }).sort({ "rating": -1 }).limit(2)

Running a query like this for each distinct country, ideally in parallel (via an event loop or threads) and combining the results, is the most efficient approach right now. Each query fetches only what is needed, which is the big problem the aggregation framework cannot yet handle in this kind of grouping.
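A minimal sketch of the "one query per country, run in parallel, then combine" idea, using Promise.all on an event loop. To stay self-contained it uses in-memory stand-ins instead of a database connection: `distinctCountries`, `findTopByCountry` and `topNPerCountry` are hypothetical names invented for this example, not part of any driver.

```javascript
// Sample documents from the question, standing in for the collection.
const docs = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "xyz", rating: 10, id: "z" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "y" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" },
];

// Hypothetical stand-in for a "distinct Country values" query.
async function distinctCountries() {
  return [...new Set(docs.map((d) => d.Country))];
}

// Hypothetical stand-in for find({ Country }).sort({ rating: -1 }).limit(n).
async function findTopByCountry(country, n) {
  return docs
    .filter((d) => d.Country === country) // filter() copies, so docs is untouched
    .sort((a, b) => b.rating - a.rating)
    .slice(0, n);
}

// Issue one small query per country concurrently and combine the results.
async function topNPerCountry(n) {
  const countries = await distinctCountries();
  return Promise.all(
    countries.map(async (country) => ({
      _id: country,
      results: await findTopByCountry(country, n),
    }))
  );
}

topNPerCountry(2).then((combined) => {
  console.log(JSON.stringify(combined, null, 2));
});
```

Against a real deployment, and assuming the official Node.js driver, the two stand-ins would presumably map to `collection.distinct("Country")` and `collection.find({ Country: country }).sort({ rating: -1 }).limit(n).toArray()`. Each query then returns at most n documents, so nothing larger than the final result is ever built.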

So look instead for support in your chosen language for producing these "combined query results" in the most efficient way, as it will be far less complex and much more performant than throwing this at the aggregation framework.
