Max and group by in MongoDB


Question

First of all, we are migrating from SQL Server to MongoDB. I have a collection containing the fields TFN and Impressions. I need to translate a SQL query into a MongoDB query, but I am stuck.

The scenario: I need to select the top 5 impressions from the collection, grouped by TFN.

Select Top 5 a.TFN, a.MaxImpression as MaxCount from ( 
  Select TFN, Max(Impressions) MaxImpression 
  from tblData 
  Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
  Group by TFN 
  ) a

This is the query in SQL Server. I need to achieve the same result in MongoDB. So far I have gone through MongoDB's aggregate and group functions, but I have not been able to get the same output as the SQL query.

Note: I just cannot work out how to connect the Max clause with Group by in MongoDB.

Here is the implementation I have tried:

db.getCollection("_core.data").aggregate([
       { 
           $match: 
           {
               $and: [
                   {
                       "TFN": 
                       {
                           $in: tfns 

                       }

                   } ,
                   { 
                       "TrendDate": 
                       {
                           $gte : 20170421,
                           $lte: 20170421

                       }
                   }]
           }
        }, 
        {
            $group: 
            {
               _id:"Impressions", 
               Impression: {
                   $max : "$Impressions"
               }
            }  
        }
    ])

Second attempt:

db.getCollection("_core.adwordsPull.static").group({
    key: { TFN: 1,  Impressions: 1 },
    cond: { TFN:  {
                               $in: tfns 

                           },
                       { 
                           "TrendDate": 
                           {
                               $gte : 20170421,
                               $lte: 20170421

                           }
                       } },
    reduce: function( curr, result ) {

                result.total += curr.Impression;
             },
    initial: { total : 0 }
})

What is wrong with this approach, and how can I correct it?

Sample data

TFN Impression  TrendDate
84251456    12  20170424
84251456    15  20170424
84251456    18  20170424
84251456    19  20170424
84251456    22  20170424
84251456    23  20170423
84251456    24  20170423

84251455    25  20170423
84251455    30  20170423
84251455    35  20170424
84251455    24  20170423
84251455    22  20170423
84251455    21  20170424
84251455    22  20170424

Expected output:

TFN  MaxCount
84251456    22
84251455    35

Recommended answer

To achieve the desired result, start by breaking down the SQL query, beginning with the subquery:

Select *
from tblData 
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate

The equivalent mongo query follows:

db.getCollection("_core.data").aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    }
])


The $group equivalent of

Select TFN, Max(Impressions) MaxImpression 
from tblData 
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
Group by TFN 

follows:

db.getCollection("_core.data").aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    {
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$max": "$Impression" }
        }
    }
])
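To make the link between Max and Group by concrete, here is a minimal sketch in plain JavaScript (not MongoDB driver code) that emulates what the $group stage with $max computes over the sample data from the question. The helper name groupMax is hypothetical, and the field name Impression is assumed to match the sample data. It also illustrates why the first attempt in the question returned a single document: "_id": "Impressions" without a leading $ is a constant string rather than a field reference, so every document lands in the same group.

```javascript
// Plain-JavaScript emulation of $group with $max (assumption: for illustration only,
// this does not talk to MongoDB). Rows are copied from the question's sample data.
const docs = [
  [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
  [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
  [84251456, 24, 20170423],
  [84251455, 25, 20170423], [84251455, 30, 20170423], [84251455, 35, 20170424],
  [84251455, 24, 20170423], [84251455, 22, 20170423], [84251455, 21, 20170424],
  [84251455, 22, 20170424],
].map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Emulates { $group: { _id: <key>, MaxImpression: { $max: "$Impression" } } }.
// keyOf plays the role of the _id expression (hypothetical helper).
function groupMax(rows, keyOf) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    const current = groups.get(key);
    if (current === undefined || row.Impression > current) {
      groups.set(key, row.Impression);
    }
  }
  return [...groups].map(([_id, MaxImpression]) => ({ _id, MaxImpression }));
}

// "_id": "$TFN" is a field reference: one group per distinct TFN.
const byTfn = groupMax(docs, (row) => row.TFN);

// "_id": "Impressions" is a constant string: all documents fall into one group.
const byConstant = groupMax(docs, () => "Impressions");

console.log(byTfn);
console.log(byConstant);
```

Over the whole sample (no date filter), this grouping yields a maximum of 24 for TFN 84251456 rather than the 22 shown in the expected output; the UPDATE section covers that difference.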


The top 5 query

Select Top 5 a.TFN, a.MaxImpression as MaxCount from ( 
    Select TFN, Max(Impressions) MaxImpression 
    from tblData 
    Where TFN in (Select TFN From @tmpTFNList) 
        and TrendDate between @StartDate AND @EndDate
    Group by TFN 
) a

is made possible with the $limit operator, with the field selection done through the $project stage, as follows:

db.getCollection("_core.data").aggregate([
    { /* WHERE TFN in list AND TrendDate between DATES */
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    { /* GROUP BY TFN */
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$max": "$Impression" }
        }
    },
    { "$limit": 5 }, /* TOP 5 */
    { /* SELECT a.MaxImpression as MaxCount */
        "$project": {
            "TFN": "$_id",
            "_id": 0,
            "MaxCount": "$MaxImpression"
        }
    }
])
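Like Select Top 5 without an Order by, $limit without a preceding $sort returns whichever five groups happen to come first. If the intent is the five TFNs with the highest maximum (an assumption, since the original SQL does not say so), a { "$sort": { "MaxImpression": -1 } } stage would go just before the $limit. The sketch below emulates that ordering in plain JavaScript; the helper topN and the grouped values are hypothetical and made up purely for illustration.

```javascript
// Hypothetical output of the $group stage: one document per TFN.
// (Made-up values for illustration only; not taken from the question's data.)
const grouped = [
  { _id: 101, MaxImpression: 7 },
  { _id: 102, MaxImpression: 42 },
  { _id: 103, MaxImpression: 19 },
  { _id: 104, MaxImpression: 3 },
  { _id: 105, MaxImpression: 28 },
  { _id: 106, MaxImpression: 35 },
  { _id: 107, MaxImpression: 11 },
];

// Emulates { $sort: { MaxImpression: -1 } }, { $limit: n } and the final $project.
function topN(groups, n) {
  return [...groups] // copy so the input array is not reordered
    .sort((a, b) => b.MaxImpression - a.MaxImpression)
    .slice(0, n)
    .map((g) => ({ TFN: g._id, MaxCount: g.MaxImpression }));
}

const top5 = topN(grouped, 5);
console.log(top5);
```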


UPDATE

To get the desired result from the sample data added in the question's edit, you need an extra $sort stage before the $group, in which you sort the documents by the TrendDate and Impression fields, both in descending order.

You then use the $first accumulator operator within the $group stage to get the maximum impression, since the pipeline now holds an ordered stream of documents: the first document in each TFN group carries the highest Impression on the most recent TrendDate.

Consider running the revised aggregate operation as:

db.getCollection('collection').aggregate([
    { 
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    { "$sort": { "TrendDate": -1, "Impression": -1 } },
    {  
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$first": "$Impression" }
        }
    },
    { "$limit": 5 }, 
    {   
        "$project": {
            "TFN": "$_id",
            "_id": 0,
            "MaxCount": "$MaxImpression"
        }
    }
])

Sample output

/* 1 */
{
    "TFN" : 84251456,
    "MaxCount" : 22
}

/* 2 */
{
    "TFN" : 84251455,
    "MaxCount" : 35
}
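To see why sorting followed by $first gives 22 for TFN 84251456 even though 24 appears in the sample, the following plain-JavaScript sketch (not MongoDB driver code, and only an emulation under that assumption) replays the $sort and $group/$first steps over the question's sample data. The variable names are illustrative.

```javascript
// Rows copied from the question's sample data.
const docs = [
  [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
  [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
  [84251456, 24, 20170423],
  [84251455, 25, 20170423], [84251455, 30, 20170423], [84251455, 35, 20170424],
  [84251455, 24, 20170423], [84251455, 22, 20170423], [84251455, 21, 20170424],
  [84251455, 22, 20170424],
].map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Emulates { $sort: { TrendDate: -1, Impression: -1 } }:
// newest date first, then highest Impression first within a date.
const sorted = [...docs].sort(
  (a, b) => b.TrendDate - a.TrendDate || b.Impression - a.Impression
);

// Emulates { $group: { _id: "$TFN", MaxImpression: { $first: "$Impression" } } }:
// keep only the first Impression encountered for each TFN in the sorted stream.
const firstPerTfn = new Map();
for (const doc of sorted) {
  if (!firstPerTfn.has(doc.TFN)) {
    firstPerTfn.set(doc.TFN, doc.Impression);
  }
}

// Emulates the final $project that renames the fields.
const result = [...firstPerTfn].map(([TFN, MaxCount]) => ({ TFN, MaxCount }));
console.log(result);
```

The value 24 for TFN 84251456 belongs to the earlier date 20170423, so it sorts after every 20170424 document and is never the first one in its group.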
