How to aggregate with group by and sort correctly
Problem description
I'm using MongoDB. Consider the following documents:
{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 1, created: ISODate("2014-05-01..."), another_col : "f" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col : "a" }
What I'm trying to do is a simple group by on uid, sorting by created in descending order, so that I can get the first row for each uid.
Example of the expected output:
{ uid: 1, created: ISODate("2014-05-05..."), another_col: "y" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col: "a" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col: "w" }
The best I could come up with is:
db.mycollection.aggregate( {$group: {_id: "$uid", rows: {$push: { "created" : "$created" }}}}, sort { // doesnt work well } )
Can anyone guide me to the right combination of group by and sorting? It just doesn't work as I expected. (Note: I have checked many threads, but I was unable to find the correct answer for my case.)
Recommended answer
There are a few catches here to understand.
When you use $group without either an initial or an ending $sort stage, the groups come back in the order in which they were discovered. So if your documents were originally in an order like this:
{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
Then just using $group without a $sort at the end of the pipeline would return results like this:
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
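As a rough illustration of "discovery order", the grouping can be emulated in plain JavaScript. This is only a sketch, not MongoDB itself: the helper name `groupInDiscoveryOrder` is hypothetical, and the dates are simplified to ISO strings because the source elides the time part. Note also that MongoDB's documentation does not guarantee any output order for $group, so this mirrors the behaviour described above rather than a contract.

```javascript
// Plain-JavaScript emulation (assumption: dates simplified to ISO strings).
const unsortedDocs = [
  { uid: 1, created: "2014-05-02", another_col: "x" },
  { uid: 1, created: "2014-05-05", another_col: "y" },
  { uid: 3, created: "2014-05-05", another_col: "w" },
  { uid: 2, created: "2014-05-10", another_col: "z" },
];

// Hypothetical helper: a Map keeps keys in the order they were first
// inserted, which mirrors groups appearing in the order they are discovered.
function groupInDiscoveryOrder(input) {
  const groups = new Map();
  for (const doc of input) {
    groups.set(doc.uid, doc); // a later document replaces the earlier one
  }
  return [...groups.values()];
}

const discovered = groupInDiscoveryOrder(unsortedDocs);
console.log(discovered.map((d) => d.uid)); // [ 1, 3, 2 ]
```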
That is one concept, but it seems that what you actually expect is the "last" values of the other fields for each uid, with the results ordered by uid. In that case the way to get your result is to $sort first and then make use of the $last operator:
db.mycollection.aggregate([
    // Sort everything first by uid and then by created
    { "$sort": { "uid": 1, "created": 1 } },
    // Group, taking the $last values from each uid boundary
    { "$group": {
        "_id": "$uid",
        "created": { "$last": "$created" },
        "another_col": { "$last": "$another_col" }
    }}
])
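The sort-then-$last idea can be checked against the question's sample data with a small plain-JavaScript emulation. This is a sketch under assumptions, not the MongoDB implementation: `lastPerUid` and `compareStrings` are hypothetical helpers, and the dates are simplified to ISO strings since the source elides the time part.

```javascript
// Sample data from the question (assumption: dates simplified to ISO strings).
const questionDocs = [
  { uid: 1, created: "2014-05-02", another_col: "x" },
  { uid: 1, created: "2014-05-05", another_col: "y" },
  { uid: 2, created: "2014-05-10", another_col: "z" },
  { uid: 3, created: "2014-05-05", another_col: "w" },
  { uid: 1, created: "2014-05-01", another_col: "f" },
  { uid: 2, created: "2014-05-22", another_col: "a" },
];

// Hypothetical helper: plain string comparison, enough for ISO date strings.
const compareStrings = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical helper: sort by uid, then created, then keep the last
// document seen for each uid (the role $last plays after a $sort).
function lastPerUid(input) {
  const sorted = [...input].sort(
    (a, b) => a.uid - b.uid || compareStrings(a.created, b.created)
  );
  const groups = new Map();
  for (const doc of sorted) {
    groups.set(doc.uid, doc);
  }
  return [...groups.values()];
}

const latest = lastPerUid(questionDocs);
console.log(latest);
```

With the sample data, each uid ends up with its most recent created value and the another_col value from that same document, which matches the expected output in the question.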
In other words, apply the sort to whichever fields define the order you want before grouping.
The difference between $last and $max is that the latter will choose the "highest" value for the given field within the grouping _id, regardless of whether the input is currently sorted or unsorted. On the other hand, $last will choose the value that occurs in the same "row" as the last document seen for that grouping _id value.
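A small plain-JavaScript sketch can make the difference concrete, using the two uid 2 documents from the question. It is only an emulation under assumptions: `maxOf` is a hypothetical helper, and the dates are simplified to ISO strings because the source elides the time part.

```javascript
// The two uid 2 documents (assumption: dates simplified to ISO strings).
const uid2Docs = [
  { uid: 2, created: "2014-05-10", another_col: "z" },
  { uid: 2, created: "2014-05-22", another_col: "a" },
];

// Hypothetical helper: highest value in a non-empty list.
const maxOf = (values) => values.reduce((m, v) => (v > m ? v : m));

// $max-like: each field is maximised independently, so the values
// can come from different documents.
const maxStyle = {
  created: maxOf(uid2Docs.map((d) => d.created)),
  another_col: maxOf(uid2Docs.map((d) => d.another_col)),
};

// $last-like: both fields come from the last document in sorted order.
const sortedByCreated = [...uid2Docs].sort((a, b) =>
  a.created < b.created ? -1 : a.created > b.created ? 1 : 0
);
const lastDoc = sortedByCreated[sortedByCreated.length - 1];
const lastStyle = { created: lastDoc.created, another_col: lastDoc.another_col };

console.log(maxStyle);  // { created: '2014-05-22', another_col: 'z' }
console.log(lastStyle); // { created: '2014-05-22', another_col: 'a' }
```

The $max-like result pairs the latest date with "z", a combination that exists in no single document, while the $last-like result keeps both values from the same row.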
If you were actually looking to sort the values of an array, the approach is similar. To keep the array members in "created" order, you would also sort first:
db.mycollection.aggregate([
    // Sort everything first by uid and then by created
    { "$sort": { "uid": 1, "created": 1 } },
    // Group by uid, pushing each document's fields onto an array
    { "$group": {
        "_id": "$uid",
        "row": {
            "$push": {
                "created": "$created",
                "another_col": "$another_col"
            }
        }
    }}
])
The documents with those fields will then be added to the array in the order in which they were already sorted.
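The same behaviour can be emulated in plain JavaScript to see what the resulting arrays look like for the question's sample data. Again this is a sketch under assumptions rather than MongoDB itself: `pushRowsPerUid` is a hypothetical helper, and the dates are simplified to ISO strings because the source elides the time part.

```javascript
// Sample data from the question (assumption: dates simplified to ISO strings).
const arrayDocs = [
  { uid: 1, created: "2014-05-02", another_col: "x" },
  { uid: 1, created: "2014-05-05", another_col: "y" },
  { uid: 2, created: "2014-05-10", another_col: "z" },
  { uid: 3, created: "2014-05-05", another_col: "w" },
  { uid: 1, created: "2014-05-01", another_col: "f" },
  { uid: 2, created: "2014-05-22", another_col: "a" },
];

// Hypothetical helper: sort by uid and created, then append each
// document's fields to its uid's array (the role of $push after $sort).
function pushRowsPerUid(input) {
  const sorted = [...input].sort(
    (a, b) =>
      a.uid - b.uid ||
      (a.created < b.created ? -1 : a.created > b.created ? 1 : 0)
  );
  const groups = new Map();
  for (const { uid, created, another_col } of sorted) {
    if (!groups.has(uid)) {
      groups.set(uid, { _id: uid, row: [] });
    }
    groups.get(uid).row.push({ created, another_col });
  }
  return [...groups.values()];
}

const grouped = pushRowsPerUid(arrayDocs);
console.log(JSON.stringify(grouped, null, 2));
```

For uid 1 the array members come out in created order ("f", then "x", then "y"), because the input was sorted before the rows were pushed.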