从 $lookup 以及整个文档中获取数组中元素的过滤计数 [英] Get filtered count of elements in array from $lookup along with the whole document

查看:10
本文介绍了从 $lookup 以及整个文档中获取数组中元素的过滤计数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 MongoDB 中有这个查询:

I have this query in MongoDB:

db.emailGroup.aggregate([
    {
        "$lookup": 
        {
            "from": "link",
            "localField": "_id",
            "foreignField": "emailGroupId",
            "as": "link"
        },
    },
    {
        "$unwind": "$link"
    },
    {
        "$match": {
             'link.originalLink': ""
        }
    },
    {
        "$group" : {
            _id: '$_id',
            link: {
                $push: '$link'
            }
        }
    },
    {
        "$project": { 
            "size": { 
                "$sum": { 
                    "$map": { 
                        "input": "$link", 
                        "as": "l", 
                        "in": { 
                            "$size": {
                                "$ifNull": [
                                    "$$l.linkHistory", []
                                ]
                            }
                        } 
                    } 
                } 
            }
        }
    }
])

EmailGroup 有 partId 字段.我使用 $lookup 来加入"其他集合以求和她的字段.我需要按 partId 字段分组并为 partId 组求和自定义字段大小".这可能吗?额外问题:如何在查询结果中添加 emailGroup 字段?

EmailGroup has partId field. I use $lookup to "join" other collection for sum her fields. I need group by partId field and sum custom field "size" for partId groups. Is this possible? Extra question: How can I add emailGroup fields to query result?

示例文件:

电子邮件组:

{
    "_id" : ObjectId("594a6c47f51e075db713ccb6"),
    "partId" : "f56c7c71eb14a20e6129a667872f9c4f",
}

链接:

{
    "_id" : ObjectId("594b96d6f51e075db67c44c9"),
    "originalLink" : "",
    "emailGroupId" : ObjectId("594a6c47f51e075db713ccb6"),
    "linkHistory" : [ 
        {
            "_id" : ObjectId("594b96f5f51e075db713ccdf"),
        }, 
        {
            "_id" : ObjectId("594b971bf51e075db67c44ca"),
        }
    ]
}

推荐答案

注解给那些正在寻找的人 - Foreign Count

比最初回答更好的是实际使用 的新形式$lookup 来自 MongoDB 3.6.这实际上可以进行计数".在子管道"内表达式而不是返回一个数组";用于后续过滤和计数,甚至使用 $unwind

db.emailGroup.aggregate([
  { "$lookup": {
    "from": "link",
    "let": { "id": "$_id" },
    "pipeline": [
      { "$match": {
        "originalLink": "",
        "$expr": { "$eq": [ "$$id", "$_id" ] }
      }},
      { "$count": "count" }
    ],
    "as": "linkCount"    
  }},
  { "$addFields": {
    "linkCount": { "$sum": "$linkCount.count" }
  }}
])

不是最初的问题所要求的,而是下面以现在最优化的形式回答的一部分,当然是 $lookup 减少到匹配计数";only 而不是所有匹配的文档".

Not what the original question was asking for but part of the below answer in the now most optimal form, as of course the result of $lookup is reduced to the "matched count" only instead of "all matched documents".

正确的做法是将 "linkCount" 添加到 $group 阶段以及 $first 在父文档的任何附加字段上以获得单数"形成之前"的状态$unwind$lookup<结果的数组/a>:

The correct way to do this would be to add the "linkCount" to the $group stage as well as a $first on any additional fields of the parent document in order to get the "singular" form as was the state "before" the $unwind was processed on the array that was the result of $lookup:

所有细节

db.emailGroup.aggregate([
  { "$lookup": {
    "from": "link",
    "localField": "_id",
    "foreignField": "emailGroupId",
    "as": "link"    
  }},
  { "$unwind": "$link" },
  { "$match": { "link.originalLink": "" } },
  { "$group": {
    "_id": "$_id",
    "partId": { "$first": "$partId" },
    "link": { "$push": "$link" },
    "linkCount": {
      "$sum": {
        "$size": {
          "$ifNull": [ "$link.linkHistory", [] ]
        } 
      }   
    }
  }}
])

生产:

{
    "_id" : ObjectId("594a6c47f51e075db713ccb6"),
    "partId" : "f56c7c71eb14a20e6129a667872f9c4f",
    "link" : [ 
        {
            "_id" : ObjectId("594b96d6f51e075db67c44c9"),
            "originalLink" : "",
            "emailGroupId" : ObjectId("594a6c47f51e075db713ccb6"),
            "linkHistory" : [ 
                {
                    "_id" : ObjectId("594b96f5f51e075db713ccdf")
                }, 
                {
                    "_id" : ObjectId("594b971bf51e075db67c44ca")
                }
            ]
        }
    ],
    "linkCount" : 2
}

按部分 ID 分组

db.emailGroup.aggregate([
  { "$lookup": {
    "from": "link",
    "localField": "_id",
    "foreignField": "emailGroupId",
    "as": "link"    
  }},
  { "$unwind": "$link" },
  { "$match": { "link.originalLink": "" } },
  { "$group": {
    "_id": "$partId",
    "linkCount": {
      "$sum": {
        "$size": {
          "$ifNull": [ "$link.linkHistory", [] ]
        } 
      }   
    }
  }}
])

生产

{
    "_id" : "f56c7c71eb14a20e6129a667872f9c4f",
    "linkCount" : 2
}

使用 $unwind 这样做的原因 然后是 $match 是因为 MongoDB 在按该顺序发出时实际处理管道的方式.这就是 $lookup 如操作中的 explain" 输出所示:

    {
        "$lookup" : {
            "from" : "link",
            "as" : "link",
            "localField" : "_id",
            "foreignField" : "emailGroupId",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "originalLink" : {
                    "$eq" : ""
                }
            }
        }
    }, 
    {
        "$group" : {

我将使用 $group 在该输出中演示其他两个流水线阶段消失".这是因为它们已经被卷起来"了.进入 $lookup 管道阶段如图所示.这实际上是 MongoDB 如何处理加入"结果可能超出 BSON 限制的可能性.$lookup 的结果父文档的数组.

I'm leaving the part with $group in that output to demonstrate that the other two pipeline stages "disappear". This is because they have been "rolled-up" into the $lookup pipeline stage as shown. This is in fact how MongoDB deals with the possibility that the BSON Limit can be exceeded by the result of "joining" results of $lookup into an array of the parent document.

您可以交替编写这样的操作:

You can alternately write the operation like this:

所有细节

db.emailGroup.aggregate([
  { "$lookup": {
    "from": "link",
    "localField": "_id",
    "foreignField": "emailGroupId",
    "as": "link"    
  }},
  { "$addFields": {
    "link": {
      "$filter": {
        "input": "$link",
        "as": "l",
        "cond": { "$eq": [ "$$l.originalLink", "" ] }    
      }
    },
    "linkCount": {
      "$sum": {
        "$map": {
          "input": {
            "$filter": {
              "input": "$link",
              "as": "l",
              "cond": { "$eq": [ "$$l.originalLink", "" ] }
            }
          },
          "as": "l",
          "in": { "$size": { "$ifNull": [ "$$l.linkHistory", [] ] } }
        }     
      }
    }    
  }}
])

按部件 ID 分组

db.emailGroup.aggregate([
  { "$lookup": {
    "from": "link",
    "localField": "_id",
    "foreignField": "emailGroupId",
    "as": "link"    
  }},
  { "$addFields": {
    "link": {
      "$filter": {
        "input": "$link",
        "as": "l",
        "cond": { "$eq": [ "$$l.originalLink", "" ] }    
      }
    },
    "linkCount": {
      "$sum": {
        "$map": {
          "input": {
            "$filter": {
              "input": "$link",
              "as": "l",
              "cond": { "$eq": [ "$$l.originalLink", "" ] }
            }
          },
          "as": "l",
          "in": { "$size": { "$ifNull": [ "$$l.linkHistory", [] ] } }
        }     
      }
    }    
  }},
  { "$unwind": "$link" },
  { "$group": {
    "_id": "$partId",
    "linkCount": { "$sum": "$linkCount" } 
  }}
])

具有相同输出但不同"的输出从第一个查询开始, $filter 这里是在之后"应用的.$lookup所有结果/code> 被返回到父文档的新数组中.

Which has the same output but "differs" from the first query in that the $filter here is applied "after" ALL results of the $lookup are returned into the new array of the parent document.

所以在性能方面,实际上更有效的是第一种方式,并且可以在过滤之前"移植到可能的大型结果集.否则会打破 16MB BSON 限制.

So in performance terms, it is actually more effective to do it the first way as well as being portable to possible large result sets "before filtering" which would otherwise break the 16MB BSON Limit.

注意:如果您没有 $addFields 可用(随 MongoDB 3.4 添加)然后使用 $project 并指定ALL";您希望返回的字段.

Note: If you do not have $addFields available (added with MongoDB 3.4) then use $project and specify "ALL" of the fields you wish to return.


作为对那些感兴趣的人的旁注,在 MongoDB 的未来版本(大概 3.6 及更高版本)中,您可以使用 $replaceRoot 而不是 $addFields 使用新的 $mergeObjects 管道运算符.这样做的好处是作为块",我们可以通过 内容声明为变量/operator/aggregation/let/" rel="nofollow noreferrer">$let,这意味着你不需要写同样的 $filter 两次":


As a side-note for those who are interested, in future releases of MongoDB ( presumably 3.6 and up ) you can use $replaceRoot instead of an $addFields with usage of the new $mergeObjects pipeline operator. The advantage of this is as a "block", we can declare the "filtered" content as a variable via $let, which means you do not need to write the same $filter "twice":

db.emailGroup.aggregate([
  { "$lookup": {
    "from": "link",
    "localField": "_id",
    "foreignField": "emailGroupId",
    "as": "link"    
  }},
  { "$replaceRoot": {
    "newRoot": {
      "$mergeObjects": [
        "$$ROOT",
        { "$let": {
          "vars": {
            "filtered": {
              "$filter": {
                "input": "$link",
                "as": "l",
                "cond": { "$eq": [ "$$l.originalLink", "" ] }    
              }
            }
          },
          "in": {
            "link": "$$filtered",
            "linkCount": {
              "$sum": {
                "$map": {
                  "input": "$$filtered.linkHistory",
                  "as": "lh",
                  "in": { "$size": { "$ifNull": [ "$$lh", [] ] } } 
                }   
              } 
            }  
          }
        }}
      ]
    }
  }}
])

尽管如此,进行这种过滤"的最佳方式是$lookup 操作是 "仍然"此时使用 $unwind 然后是 $match 模式,直到您可以向 $lookup 直接.

Nonetheless the best way to do such "filtered" $lookup operations is "still" at this time using the $unwind then $match pattern, until such time as you can provide query arguments to $lookup directly.

注意:通过直接";这里我不是指不相关的";$lookup 的形式应该也在 MongoDB 3.6 版本中,因为这确实会发布另一个管道".执行父集合中的每个文档.因此,该功能仍然无法取代目前的最有效".仅检索匹配项的方式.

Note: By "directly" here I do not mean the "non-correlated" form of $lookup which should also be in the MongoDB 3.6 release, as this would indeed issue another "pipeline" execution for each document in the parent collection. So that feature still does not replace the present "best effective" way of retrieving only the matched items.

这篇关于从 $lookup 以及整个文档中获取数组中元素的过滤计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆