Max and group by in MongoDB
Question
We are migrating from SQL Server to MongoDB. I have a collection containing the fields TFN and Impressions, and I need to translate a SQL query to MongoDB but got stuck.
The scenario: I need to select the top 5 impressions from the collection, grouped by TFN.
Select Top 5 a.TFN, a.MaxImpression as MaxCount from (
Select TFN, Max(Impressions) MaxImpression
from tblData
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
Group by TFN
) a
This is the query in SQL Server. I need to achieve the same result in MongoDB. So far I have gone through MongoDB's aggregate and group functions, but I haven't been able to reproduce the SQL output.
Note: I just can't work out how to combine the Max clause with Group by in MongoDB.
Here is the implementation I have tried:
db.getCollection("_core.data").aggregate([
    {
        $match: {
            $and: [
                { "TFN": { $in: tfns } },
                { "TrendDate": { $gte: 20170421, $lte: 20170421 } }
            ]
        }
    },
    {
        $group: {
            _id: "Impressions",
            Impression: { $max: "$Impressions" }
        }
    }
])
Second attempt:
db.getCollection("_core.adwordsPull.static").group({
    key: { TFN: 1, Impressions: 1 },
    cond: {
        TFN: { $in: tfns },
        {
            "TrendDate": { $gte: 20170421, $lte: 20170421 }
        }
    },
    reduce: function( curr, result ) {
        result.total += curr.Impression;
    },
    initial: { total : 0 }
})
What is wrong with this approach, and how can I correct it?
Sample data
TFN Impression TrendDate
84251456 12 20170424
84251456 15 20170424
84251456 18 20170424
84251456 19 20170424
84251456 22 20170424
84251456 23 20170423
84251456 24 20170423
84251455 25 20170423
84251455 30 20170423
84251455 35 20170424
84251455 24 20170423
84251455 22 20170423
84251455 21 20170424
84251455 22 20170424
Expected output:
TFN MaxCount
84251456 22
84251455 35
Recommended answer
To achieve the desired result, start by breaking down the SQL query, beginning with the subquery:
Select *
from tblData
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
The equivalent MongoDB query follows:
db.getCollection("_core.data").aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    }
])
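As a minimal sketch of the inputs the $match stage above expects: the values below are hypothetical and assume TrendDate is stored as an integer in YYYYMMDD form, as the sample data in the question suggests. Fields listed side by side in $match are combined with an implicit AND, so the explicit $and from the first attempt is not required.

```javascript
// Hypothetical inputs; replace with your own TFN list and date range.
const tmpTFNList = [84251456, 84251455];
const startDate = 20170423; // assumes integer YYYYMMDD values, as in the sample data
const endDate = 20170424;

// Both conditions sit in one document, so they are ANDed implicitly.
const matchStage = {
  $match: {
    TFN: { $in: tmpTFNList },
    TrendDate: { $gte: startDate, $lte: endDate },
  },
};

// In the mongo shell this would be passed as:
// db.getCollection("_core.data").aggregate([matchStage])
console.log(JSON.stringify(matchStage, null, 2));
```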
The following SQL, which adds the GROUP BY, maps onto a $group stage:
Select TFN, Max(Impressions) MaxImpression
from tblData
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
Group by TFN
It translates to the following pipeline:
db.getCollection("_core.data").aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    {
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$max": "$Impression" }
        }
    }
])
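To see what the $group stage above computes, here is a plain-JavaScript stand-in (not MongoDB code; groupMax is a hypothetical helper) run over the sample rows from the question. It also shows why the first attempt returned a single document: _id: "Impressions" is a constant string rather than a field path, so every document lands in the same group.

```javascript
// Sample rows from the question: [TFN, Impression, TrendDate]
const rows = [
  [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
  [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
  [84251456, 24, 20170423], [84251455, 25, 20170423], [84251455, 30, 20170423],
  [84251455, 35, 20170424], [84251455, 24, 20170423], [84251455, 22, 20170423],
  [84251455, 21, 20170424], [84251455, 22, 20170424],
];
const docs = rows.map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Stand-in for { $group: { _id: <key>, MaxImpression: { $max: "$Impression" } } }.
// keyFn plays the role of the _id expression.
function groupMax(documents, keyFn) {
  const groups = new Map();
  for (const doc of documents) {
    const key = keyFn(doc);
    const current = groups.get(key);
    if (current === undefined || doc.Impression > current) {
      groups.set(key, doc.Impression);
    }
  }
  return [...groups].map(([_id, MaxImpression]) => ({ _id, MaxImpression }));
}

const byTfn = groupMax(docs, (doc) => doc.TFN);         // like _id: "$TFN"
const byConstant = groupMax(docs, () => "Impressions"); // like _id: "Impressions"

console.log(byTfn);
console.log(byConstant);
```

On the sample rows, grouping by TFN gives 24 for 84251456 and 35 for 84251455, while the constant key gives one group with 35. The 24 differs from the 22 in the expected output, which is what the UPDATE further down addresses. Note also that the question's text says Impressions while the sample data and this answer use Impression; the path given to $max has to match the stored field name. Separately, the group() helper from the second attempt was deprecated in MongoDB 3.4 and removed in 4.2, so the aggregation pipeline is the safer route.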
The top 5 query
Select Top 5 a.TFN, a.MaxImpression as MaxCount from (
Select TFN, Max(Impressions) MaxImpression
from tblData
Where TFN in (Select TFN From @tmpTFNList)
and TrendDate between @StartDate AND @EndDate
Group by TFN
) a
is made possible with the $limit stage, with the field selection done through a $project stage:
db.getCollection("_core.data").aggregate([
    { /* WHERE TFN in list AND TrendDate between DATES */
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    { /* GROUP BY TFN */
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$max": "$Impression" }
        }
    },
    { "$limit": 5 }, /* TOP 5 */
    { /* SELECT a.MaxImpression as MaxCount */
        "$project": {
            "TFN": "$_id",
            "_id": 0,
            "MaxCount": "$MaxImpression"
        }
    }
])
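One caveat: neither the SQL TOP 5 without an ORDER BY nor $limit without a preceding $sort guarantees which five groups come back. If the intent is the five TFNs with the highest maxima (an assumption, since the original query does not say), a $sort on the grouped field can go before $limit. A sketch of the stages that would follow $group:

```javascript
// Stages intended to follow the $group stage; MaxImpression is the
// field name produced by that $group stage.
const topFiveStages = [
  { $sort: { MaxImpression: -1 } }, // ORDER BY MaxImpression DESC (assumed intent)
  { $limit: 5 },                    // TOP 5
  { $project: { _id: 0, TFN: "$_id", MaxCount: "$MaxImpression" } },
];

console.log(JSON.stringify(topFiveStages, null, 2));
```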
UPDATE
To get the desired result from the sample data in the question's edit, add a $sort stage before the $group stage that sorts the documents by the TrendDate and Impression fields, both in descending order.
Then use the $first accumulator within the $group stage. Because the documents arrive in sorted order, the first document in each TFN group carries the highest Impression on that TFN's most recent TrendDate, which is what the expected output shows.
Consider running the revised aggregate operation as:
db.getCollection('collection').aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    { "$sort": { "TrendDate": -1, "Impression": -1 } },
    {
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$first": "$Impression" }
        }
    },
    { "$limit": 5 },
    {
        "$project": {
            "TFN": "$_id",
            "_id": 0,
            "MaxCount": "$MaxImpression"
        }
    }
])
Sample output
/* 1 */
{
    "TFN" : 84251456,
    "MaxCount" : 22
}
/* 2 */
{
    "TFN" : 84251455,
    "MaxCount" : 35
}
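As a sanity check, the $sort / $first logic can be replayed in plain JavaScript (a stand-in, not MongoDB code) over the sample rows from the question, assuming both dates fall inside the $match range:

```javascript
// Sample rows from the question: [TFN, Impression, TrendDate]
const rows = [
  [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
  [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
  [84251456, 24, 20170423], [84251455, 25, 20170423], [84251455, 30, 20170423],
  [84251455, 35, 20170424], [84251455, 24, 20170423], [84251455, 22, 20170423],
  [84251455, 21, 20170424], [84251455, 22, 20170424],
];
const docs = rows.map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Stand-in for { $sort: { TrendDate: -1, Impression: -1 } }
const sorted = [...docs].sort(
  (a, b) => b.TrendDate - a.TrendDate || b.Impression - a.Impression
);

// Stand-in for { $group: { _id: "$TFN", MaxImpression: { $first: "$Impression" } } }:
// keep only the first Impression seen for each TFN in sorted order.
const firstPerTfn = new Map();
for (const doc of sorted) {
  if (!firstPerTfn.has(doc.TFN)) {
    firstPerTfn.set(doc.TFN, doc.Impression);
  }
}

// Stand-in for the final $project that renames the fields.
const result = [...firstPerTfn].map(([TFN, MaxCount]) => ({ TFN, MaxCount }));
console.log(result);
```

This yields 22 for 84251456 and 35 for 84251455, matching the expected output. The value is the highest Impression on each TFN's most recent TrendDate, not the overall maximum (which would be 24 for 84251456).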