Limit and sort each group in MongoDB using aggregation

Question

How can I sort and limit each group in MongoDB?

Consider the following data:

Country:USA,name:xyz,rating:10,id:x
Country:USA,name:xyz,rating:10,id:y
Country:USA,name:xyz,rating:10,id:z
Country:USA,name:abc,rating:5,id:x
Country:India,name:xyz,rating:5,id:x
Country:India,name:xyz,rating:5,id:y
Country:India,name:abc,rating:10,id:z
Country:India,name:abc,rating:10,id:x

Now say I want to group by country, sort by rating, and limit each group to 2 results.

So the expected result would be:

Country:USA
name:xyz,rating:10,id:x
name:xyz,rating:10,id:y
Country:India
name:abc,rating:10,id:x
name:abc,rating:10,id:z
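
To pin down what is being asked, here is a minimal in-memory sketch in plain JavaScript; it does not talk to MongoDB. The helper name `topNPerGroup` is hypothetical, and breaking rating ties by `id` in ascending order is an assumption chosen so the output matches the listing above, since the question does not state a tie-breaking rule.

```javascript
// In-memory illustration only; the sample rows are copied from the question.
const docs = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "xyz", rating: 10, id: "z" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "y" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" }
];

// Hypothetical helper: group by `groupKey`, sort each group by `sortKey`
// descending, and keep at most `n` items per group.
// Assumption: ties on `sortKey` are broken by `id` ascending.
function topNPerGroup(items, groupKey, sortKey, n) {
  const groups = new Map();
  for (const item of items) {
    const key = item[groupKey];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  const result = {};
  for (const [key, members] of groups) {
    members.sort((a, b) => b[sortKey] - a[sortKey] || a.id.localeCompare(b.id));
    // slice() simply returns fewer items when a group has less than n members
    result[key] = members.slice(0, n);
  }
  return result;
}

const top2 = topNPerGroup(docs, "Country", "rating", 2);
console.log(JSON.stringify(top2, null, 2));
```

A group with fewer than n members just comes back shorter, which is the behavior the per-group limit is expected to have.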

I want to accomplish this using the aggregation framework only.

I tried including a sort on rating in the aggregation, but the query simply returns no results after processing.

Recommended answer

Your best option here is to run separate queries for each "Country" (ideally in parallel) and return the combined results. The queries are quite simple: each one just returns the top 2 values after sorting on the rating value, and each will execute quickly even if you need several queries to obtain the complete result.

The aggregation framework is not a good fit for this, now or in the near future. The problem is that there is no operator that "limits" the results of a grouping in any way. So in order to do this, you basically need to $push all content into an array and then extract the "top n" values from it.

The operations currently needed to do that are pretty horrible, and the core problem is that on most real data sources the results are likely to exceed the BSON limit of 16MB per document.

There is also complexity that grows with n, because of how you have to do it right now. But just to demonstrate with 2 items:

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items, keeping first result
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        },
        "first": { 
            "$first": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},

    // Unwind the array (the field path needs the "$" prefix)
    { "$unwind": "$results" },

    // Remove the seen result from the array
    { "$redact": {
        "$cond": {
            "if": { "$eq": [ "$results.id", "$first.id" ] },
            "then": "$$PRUNE",
            "else": "$$KEEP"
        }
    }},

    // Group to return the second result which is now first on stack
    { "$group": {
        "_id": "$_id",
        "first": { "$first": "$first" },
        "second": { 
            "$first": {
                "name": "$results.name", 
                "rating": "$results.rating",
                "id": "$results.id"
            }
        }
    }},

    // Optionally put these in an array format
    { "$project": {
        "results": { 
            "$map": {
                "input": ["A","B"],
                "as": "el",
                "in": {
                    "$cond": {
                        "if": { "$eq": [ "$$el", "A" ] },
                        "then": "$first",
                        "else": "$second"
                    }
                }
            }
        }
    }}
])

That gets the result, but it's not a great approach. It gets a lot more complex, with more iterations, for higher limits, or where some groupings may have fewer than n results to return.

The current development series (3.1.x) as of writing has a $slice operator that makes this a bit simpler, but it still has the same "size" pitfall:

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name",
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},

    // Keep only the first 2 items of each group's array
    { "$project": {
        "results": { "$slice": [ "$results", 2 ] }
    }}
])
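
The $push plus $slice pipeline shown above can be generated for any group field, sort field, and limit. The following is only a sketch: `buildTopNPipeline` and its parameters are hypothetical names, not part of MongoDB, and the code just builds the stage documents as plain JavaScript objects. It still pushes every document of a group into one array before slicing, so the same 16MB "size" pitfall applies.

```javascript
// Hypothetical helper (not a MongoDB API): builds the $sort / $group / $project
// stages for "top n per group" using $push and $slice.
function buildTopNPipeline(groupField, sortField, n, keepFields) {
  // Map each kept field name to its field path, e.g. name -> "$name"
  const pushed = {};
  for (const field of keepFields) {
    pushed[field] = "$" + field;
  }
  return [
    // Sort by group first, then by the ranking field, highest first
    { $sort: { [groupField]: 1, [sortField]: -1 } },
    // Collect every document of the group into one array (size pitfall applies)
    { $group: { _id: "$" + groupField, results: { $push: pushed } } },
    // Keep only the first n entries of that array
    { $project: { results: { $slice: ["$results", n] } } }
  ];
}

const pipeline = buildTopNPipeline("Country", "rating", 2, ["name", "rating", "id"]);
// In the shell this would be passed as: db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```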

But basically, until the aggregation framework has some way to "limit" the number of items produced by $push, or a similar grouping "limit" operator, it is not really the optimal solution for this type of problem.

A simple query like this:

db.collection.find({ "Country": "USA" }).sort({ "rating": -1 }).limit(2)

Run for each distinct country, ideally in parallel via an event loop or threads, and combined into one result, such queries are the most optimal approach right now. They only fetch what is needed, which is the big problem the aggregation framework cannot yet handle in such grouping.
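
As a rough sketch of the "one query per country, in parallel, then combine" idea in JavaScript: the helper `topNPerGroupByQuery` and its two callback parameters are hypothetical names introduced here for illustration. The callbacks are backed by an in-memory array so the example runs without a database; the comment shows how they might map onto the Node.js driver, assuming a `coll` collection handle.

```javascript
// Hypothetical helper: runs one "top n" query per group concurrently
// and combines the results into one array of { _id, results } rows.
// `listGroups` and `findTopForGroup` are caller-supplied async functions.
async function topNPerGroupByQuery(listGroups, findTopForGroup, n) {
  const groups = await listGroups();
  // map() starts every query immediately; Promise.all waits for all of them
  return Promise.all(
    groups.map(async (group) => ({
      _id: group,
      results: await findTopForGroup(group, n)
    }))
  );
}

// With the Node.js driver the callbacks might look like this (assumes `coll`):
//   () => coll.distinct("Country")
//   (country, n) => coll.find({ Country: country }).sort({ rating: -1 }).limit(n).toArray()

// Stand-in data source so the sketch runs without a database.
const sample = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" }
];
const listCountries = async () => [...new Set(sample.map((d) => d.Country))];
const findTop = async (country, n) =>
  sample
    .filter((d) => d.Country === country)
    .sort((a, b) => b.rating - a.rating)
    .slice(0, n);

const combined = topNPerGroupByQuery(listCountries, findTop, 2);
combined.then((rows) => console.log(JSON.stringify(rows, null, 2)));
```

Each query returns at most n documents, so no group is ever materialized as one large array.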

So look instead for support in your chosen language for producing these "combined query results" in the most optimal way, as it will be far less complex and much more performant than throwing this at the aggregation framework.
