Conditionally project in aggregation with array filtering without elemMatch?

Question

It turns out that $project does not support $elemMatch in the aggregation pipeline. MongoDB 3.2 introduced $filter and related operators, but they don't seem to solve my problem.

Let me explain what I'm trying to do. Suppose I have the following documents in the database.

db.test.insert(
{
  "ad_account_id": 150,
  "internal_id": 1,
  "daily": [{
    "timestamp": "2016-12-01",
    "impressions": 5
  }, {
    "timestamp": "2016-12-06",
    "impressions": 7
  }]
})

db.test.insert(
{
  "ad_account_id": 150,
  "internal_id": 2,
  "daily": [{
    "timestamp": "2016-12-03",
    "impressions": 6
  }] 
})

db.test.insert({
  "ad_account_id": 150,
  "internal_id": 3,
  "daily": []
})


db.test.insert({
  "ad_account_id": 16,
  "internal_id": 3,
  "daily": []
})

Now suppose a user queries for ad_account_id: 150 and filters by a date range from "2016-12-01" to "2016-12-02".

My aggregation query looks like this (sort, limit, etc. omitted):

db.getCollection('test').aggregate({
        "$match" : {
          "ad_account_id" : 150,
          "daily" : {
            "$elemMatch" : {
              "timestamp" : {
                "$lte" : "2016-12-02",
                "$gte" : "2016-12-01"
              }
            }
          }
        }
      },
      {
        "$unwind" : "$daily"
      },
      {
        "$match" : {
          "daily.timestamp" : {
            "$lte" : "2016-12-02",
            "$gte" : "2016-12-01"
          }
        }
      },
      {
        "$group" : {
          "impressions" : {
            "$sum" : "$daily.impressions"
          },
          "ad_account_id" : {
            "$first" : "$ad_account_id"
          },
          "_id" : "$internal_id"
        }
      },
      {
        "$project" : {
          "impressions" : 1,
          "ga_transactions" : 1,
          "ad_account_id" : 1
        }
      }
);

Current result

{ "_id" : 1, "impressions" : 5, "ad_account_id" : 150 }

In our local development, it initially seemed okay. The query was fast even with a million documents, and we were happy.

But we soon realized that our use case requires a row for every matching document, even when none of its daily data falls between the start date and the end date. Impressions and similar metrics can be reported as 0 in that case, but the row has to be shown.

So the desired result is:

 { "_id" : 1, "impressions" : 5, "ad_account_id" : 150 }
 { "_id" : 2, "impressions" : 0, "ad_account_id" : 150 }
 { "_id" : 3, "impressions" : 0, "ad_account_id" : 150 }
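
To pin down the semantics being asked for, here is a minimal sketch in plain JavaScript rather than a MongoDB query. The helper name summarizeImpressions is hypothetical, and the sketch assumes that internal_id is unique within an ad account, so each document yields exactly one row. It keeps every document for the account and sums only the daily entries inside the date range, which gives 0 when nothing matches.

```javascript
// Plain-JavaScript model of the desired output (not a MongoDB query).
// summarizeImpressions is a hypothetical helper used only for illustration.
// Assumption: internal_id is unique within an ad account.
function summarizeImpressions(docs, adAccountId, startDate, endDate) {
  return docs
    .filter((doc) => doc.ad_account_id === adAccountId)
    .map((doc) => ({
      _id: doc.internal_id,
      // ISO "YYYY-MM-DD" strings sort lexicographically in date order,
      // so plain string comparison is enough here.
      impressions: doc.daily
        .filter((day) => day.timestamp >= startDate && day.timestamp <= endDate)
        .reduce((total, day) => total + (day.impressions || 0), 0),
      ad_account_id: doc.ad_account_id,
    }));
}

// The four sample documents from the question.
const sampleDocs = [
  {
    ad_account_id: 150,
    internal_id: 1,
    daily: [
      { timestamp: "2016-12-01", impressions: 5 },
      { timestamp: "2016-12-06", impressions: 7 },
    ],
  },
  {
    ad_account_id: 150,
    internal_id: 2,
    daily: [{ timestamp: "2016-12-03", impressions: 6 }],
  },
  { ad_account_id: 150, internal_id: 3, daily: [] },
  { ad_account_id: 16, internal_id: 3, daily: [] },
];

const rows = summarizeImpressions(sampleDocs, 150, "2016-12-01", "2016-12-02");
console.log(rows);
```

Documents with an empty daily array and documents whose entries all fall outside the range both end up with impressions: 0, which is the behaviour the first pipeline loses by matching on the date range up front.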

I have been struggling with this for the last few hours, as I can't seem to get it in a single mongo query. My idea was to limit the $match to just the ad account id and then do a $project: if a document has no data within the date range, I would add a placeholder entry to daily with the start date as its timestamp, something like this.

{
  "ad_account_id": 150,
  "internal_id": 3,
  "daily": [{"timestamp": "2016-12-01"}]
}

Unfortunately I can't get this to work, because $elemMatch is not available within $project. The newer operators such as $filter don't seem to solve my problem either.

I also tried a set union, and I think that is almost there as well, but it gave me the error "FieldPath '2016-12-01' doesn't start with $".

What do you think is the best way to do this?

Recommended answer

Okay, I spent literally hours on this and had a eureka moment. It turns out I wasn't too far from the solution.

db.getCollection('test').aggregate(
    {
        "$match" : {
          "ad_account_id" : 150
        }
      },
      { "$project": {
        "ad_account_id": 1,
        "internal_id": 1,
        "daily": {
            "$setUnion": [
                { "$map": {
                    "input": "$daily",
                    "as": "day",
                    "in": {
                        "$cond": [
                            { "$and": [
                                { "$gte": [ "$$day.timestamp", "2016-12-01" ] },
                                { "$lte": [ "$$day.timestamp", "2016-12-02" ] }
                            ]},
                            "$$day",
                            false
                        ]
                    }
                }},
                [{"$literal": {"timestamp": "2016-12-01" } }]
            ]
        }
      }},
      {
        "$unwind" : "$daily"
      },
      {
        "$group" : {
          "impressions" : {
            "$sum" : "$daily.impressions"
          },
          "ad_account_id" : {
            "$first" : "$ad_account_id"
          },
          "_id" : "$internal_id"
        }
      },
      {
        "$project" : {
          "impressions" : 1,
          "ad_account_id" : 1
        }
      }
);

For people looking at this for ideas: to see what the pipeline is doing, I added "daily_mod": { "$addToSet": "$daily" } to the last $group stage and "daily_mod": 1 to the last $project stage.

This really helps in understanding what happened. The output is:

{ "_id" : 3, "impressions" : 0, "ad_account_id" : 150, "daily_mod" : [ { "timestamp" : "2016-12-01" } ] }
{ "_id" : 2, "impressions" : 0, "ad_account_id" : 150, "daily_mod" : [ false, { "timestamp" : "2016-12-01" } ] }
{ "_id" : 1, "impressions" : 5, "ad_account_id" : 150, "daily_mod" : [ { "timestamp" : "2016-12-01", "impressions" : 5 }, false, { "timestamp" : "2016-12-01" } ] }

If someone can give me a better answer in terms of performance, I will happily mark it as the accepted answer.
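
On the performance question, one possible direction is to skip the $unwind and $group stages and compute the sum per document with $filter, $map and $sum inside $project (all available in $project from MongoDB 3.2). The following is only a sketch under assumptions: it has not been benchmarked, it assumes internal_id is unique per document within an ad account so that no grouping is needed, and it reports the row key as internal_id instead of _id. Unlike $setUnion, $filter does not collapse duplicate array entries.

```javascript
// Sketch of a $filter-based pipeline (MongoDB 3.2+), not benchmarked.
// Assumption: internal_id is unique per document within an ad account,
// so the $unwind/$group round trip can be dropped.
const startDate = "2016-12-01";
const endDate = "2016-12-02";

const pipeline = [
  { $match: { ad_account_id: 150 } },
  {
    $project: {
      _id: 0,
      internal_id: 1,
      ad_account_id: 1,
      // $sum over an array expression adds up its numeric elements and
      // returns 0 for an empty array, so out-of-range documents still appear.
      impressions: {
        $sum: {
          $map: {
            // Keep only the daily entries inside the date range.
            input: {
              $filter: {
                input: "$daily",
                as: "day",
                cond: {
                  $and: [
                    { $gte: ["$$day.timestamp", startDate] },
                    { $lte: ["$$day.timestamp", endDate] },
                  ],
                },
              },
            },
            as: "day",
            in: "$$day.impressions",
          },
        },
      },
    },
  },
];

// In the mongo shell this would be run as:
// db.getCollection('test').aggregate(pipeline)
```

If internal_id can repeat within an ad account, a final $group on internal_id that sums impressions would still be needed, but it would then run over one small document per input document instead of one per unwound array entry.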
