如何与分组依据汇总并正确排序 [英] How to aggregate with group by and sort correctly

查看:80
本文介绍了如何与分组依据汇总并正确排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Mongodb. 考虑我的下一个文档:

I'm using Mongodb. Consider my next document:

{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 1, created: ISODate("2014-05-01..."), another_col : "f" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col : "a" }

我想做的是对uid进行简单的分组,并按降序对创建的内容进行排序,这样我就可以获得每个uid的第一行.

What I'm trying to do is a simple groupby on the uid and sorting the created by descending order so i could get the first row for each uid.

预期输出的示例

{ uid: 1, created: ISODate("2014-05-05..."), another_col: "y" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col: "a" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col: "w" }

我能得到的最好的是:

db.mycollection.aggregate( {$group: {_id: "$uid", rows: {$push: { "created" : "$created" }}}}, sort { // doesnt work well }  )

任何人都可以指导我进行分组依据和排序的正确组合吗? 它只是没有按我预期的那样工作. (注意:我检查了很多线程,但是找不到适合我情况的正确答案)

Anyone can guide me for the right combination of group by and sorting? It just doesn't work as I was expecting. (note: I have checked many threads, but I'm unable to find the correct answer for my case)

推荐答案

这里有一些需要理解的地方.

There are a few catches here to understand.

当您使用 $group 将按照发现边界的顺序对它们进行排序,而无需初始阶段或结束阶段

When you use $group the boundaries will be sorted in the order that they were discovered without either an initial or ending stage $sort operation. So if your documents were originally in an order like this:

{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },

然后仅使用 $group 没有 $sort 在管道的最后将返回以下结果:

Then just using $group without a $sort on the end on the pipeline would return you results like this:

{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },

这是一个概念,但实际上看起来您想要的结果要求您按所寻找的uid的排序顺序返回最后其他字段".在这种情况下,获得结果的方法实际上是先 $sort ,然后再使用

That is one concept, but it actually seems like what you are expecting in results requires returning the "last other fields" by a sorted order of the uid is what you are looking for. In that case the way to get your result is actually to $sort first and then make use of the $last operator:

db.mycollection.aggregate([

    // Sorts everything first by _id and created
    { "$sort": { "_id": 1, "created": 1 } },

    // Group with the $last results from each boundary
    { "$group": {
        "_id": "$uid",
        "created": { "$last": "$created" },
        "another_col": { "$last": "$created" }
    }}
])

或者本质上将排序应用于您想要的内容.

Or essentially apply the sort to what you want.

$last $max 是后者将为分组_id中的给定字段选择最高"值,而与当前按未排序顺序进行排序无关.另一方面, $last 将选择与最后一个"分组_id值相同的行"中出现的值.

The difference between $last and $max is that the latter will choose the "highest" value for the given field within the grouping _id, regardless of the current sorted on un-sorted order. On the other hand, $last will choose the value that occurs in the same "row" as the "last" grouping _id value.

如果您实际上正在寻找对数组值进行排序的方法,则此方法类似.保持数组成员处于创建"的顺序,您也将首先进行排序:

If you were actually looking to sort the values of an array then the approach is similar. Keeping the array members in "created" order you would also sort first:

db.mycollection.aggregate([

    // Sorts everything first by _id and created
    { "$sort": { "_id": 1, "created": 1 } },

    // Group with the $last results from each boundary
    { "$group": {
        "_id": "$uid",
        "row": {
            "$push": {
                "created": "$created",
                "another_col": "$another_col"
            }
        }
    }}
])

具有这些字段的文档将按照它们已经被排序的顺序添加到数组中.

And the documents with those fields will be added to the array with the order they were already sorted by.

这篇关于如何与分组依据汇总并正确排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆