$lookup 后的聚合过滤器 [英] Aggregation filter after $lookup

查看:28
本文介绍了$lookup 后的聚合过滤器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何在 $lookup 之后添加过滤器或者是否有其他方法可以做到这一点?

How can I add a filter after an $lookup or is there any other method to do this?

我的数据收集测试是:

{ "_id" : ObjectId("570557d4094a4514fc1291d6"), "id" : 100, "value" : "0", "contain" : [ ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d7"), "id" : 110, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d8"), "id" : 120, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d9"), "id" : 121, "value" : "2", "contain" : [ 100, 120 ] }

我选择 id 100 并聚合子项:

I select id 100 and aggregate the childs:

db.test.aggregate([ {
  $match : {
    id: 100
  }
}, {
  $lookup : {
    from : "test",
    localField : "id",
    foreignField : "contain",
    as : "childs"
  }
}]);

我回来了:

{  
  "_id":ObjectId("570557d4094a4514fc1291d6"),
  "id":100,
  "value":"0",
  "contain":[ ],
  "childs":[ {  
      "_id":ObjectId("570557d4094a4514fc1291d7"),
      "id":110,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d8"),
      "id":120,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d9"),
      "id":121,
      "value":"2",
      "contain":[ 100, 120 ]
    }
  ]
}

但我只想要与值:1"匹配的孩子

But I want only childs that match with "value: 1"

最后我期待这个结果:

{  
  "_id":ObjectId("570557d4094a4514fc1291d6"),
  "id":100,
  "value":"0",
  "contain":[ ],
  "childs":[ {  
      "_id":ObjectId("570557d4094a4514fc1291d7"),
      "id":110,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d8"),
      "id":120,
      "value":"1",
      "contain":[ 100 ]
    }
  ]
}

推荐答案

这里的问题实际上是关于不同的东西,不需要 $lookup.但是对于任何纯粹从$lookup 后过滤"的标题来到这里的人来说,那么这些是适合您的技巧:

The question here is actually about something different and does not need $lookup at all. But for anyone arriving here purely from the title of "filtering after $lookup" then these are the techniques for you:

db.test.aggregate([
    { "$match": { "id": 100 } },
    { "$lookup": {
      "from": "test",
      "let": { "id": "$id" },
      "pipeline": [
        { "$match": {
          "value": "1",
          "$expr": { "$in": [ "$$id", "$contain" ] }
        }}
      ],
      "as": "childs"
    }}
])

之前 - $lookup + $unwind + $match 合并

db.test.aggregate([
    { "$match": { "id": 100 } },
    { "$lookup": {
        "from": "test",
        "localField": "id",
        "foreignField": "contain",
        "as": "childs"
    }},
    { "$unwind": "$childs" },
    { "$match": { "childs.value": "1" } },
    { "$group": {
        "_id": "$_id",
        "id": { "$first": "$id" },
        "value": { "$first": "$value" },
        "contain": { "$first": "$contain" },
        "childs": { "$push": "$childs" }
     }}
])

如果您质疑为什么要$unwind 而不是使用 $filter 在数组上,然后读取 Aggregate $lookup 匹配管道中文档的总大小超过最大文档大小详细说明为什么这通常是必要的并且更优化.

If you question why would you $unwind as opposed to using $filter on the array, then read Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size for all the detail on why this is generally necessary and far more optimal.

对于 MongoDB 3.6 及更高版本的版本,则更具表现力的子管道"通常是您想要过滤"的内容在任何东西返回到数组之前的外部集合的结果.

For releases of MongoDB 3.6 and onwards, then the more expressive "sub-pipeline" is generally what you want to "filter" the results of the foreign collection before anything gets returned into the array at all.

回到答案,尽管它实际上描述了为什么提出的问题需要不加入";根本....

Back to the answer though which actually describes why the question asked needs "no join" at all....

使用 $lookup 之类的这不是最有效"的在这里做你想做的事.但稍后会详细介绍.

Using $lookup like this is not the most "efficient" way to do what you want here. But more on this later.

作为一个基本概念,只需使用 $filter 在结果数组上:

As a basic concept, just use $filter on the resulting array:

db.test.aggregate([ 
    { "$match": { "id": 100 } }, 
    { "$lookup": {
        "from": "test",
        "localField": "id",
        "foreignField": "contain",
        "as": "childs"
    }},
    { "$project": {
        "id": 1,
        "value": 1,
        "contain": 1,
        "childs": {
           "$filter": {
               "input": "$childs",
               "as": "child",
               "cond": { "$eq": [ "$$child.value", "1" ] }
           }
        }
    }}
]);

或者使用$redact相反:

Or use $redact instead:

db.test.aggregate([ 
    { "$match": { "id": 100 } }, 
    { "$lookup": {
        "from": "test",
        "localField": "id",
        "foreignField": "contain",
        "as": "childs"
    }},
    { "$redact": {
        "$cond": {
           "if": {
              "$or": [
                { "$eq": [ "$value", "0" ] },
                { "$eq": [ "$value", "1" ] }
              ]
           },
           "then": "$$DESCEND",
           "else": "$$PRUNE"
        }
    }}
]);

两者都得到相同的结果:

Both get the same result:

{  
  "_id":ObjectId("570557d4094a4514fc1291d6"),
  "id":100,
  "value":"0",
  "contain":[ ],
  "childs":[ {  
      "_id":ObjectId("570557d4094a4514fc1291d7"),
      "id":110,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d8"),
      "id":120,
      "value":"1",
      "contain":[ 100 ]
    }
  ]
}

底线是$lookup 本身不能还"查询仅选择某些数据.所以所有的过滤"需要在 $lookup 之后发生

Bottom line is that $lookup itself cannot "yet" query to only select certain data. So all "filtering" needs to happen after the $lookup

但实际上对于这种类型的自连接"你最好不要使用 $lookup 完全避免额外读取和哈希合并"的开销;完全.只需获取相关项目和 $group 代替:

But really for this type of "self join" you are better off not using $lookup at all and avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group instead:

db.test.aggregate([
  { "$match": { 
    "$or": [
      { "id": 100 },
      { "contain.0": 100, "value": "1" }
    ]
  }},
  { "$group": {
    "_id": {
      "$cond": {
        "if": { "$eq": [ "$value", "0" ] },
        "then": "$id",
        "else": { "$arrayElemAt": [ "$contain", 0 ] }
      }
    },
    "value": { "$first": { "$literal": "0"} },
    "childs": {
      "$push": {
        "$cond": {
          "if": { "$ne": [ "$value", "0" ] },
          "then": "$$ROOT",
          "else": null
        }
      }
    }
  }},
  { "$project": {
    "value": 1,
    "childs": {
      "$filter": {
        "input": "$childs",
        "as": "child",
        "cond": { "$ne": [ "$$child", null ] }
      }
    }
  }}
])

结果只是有点不同,因为我故意删除了无关的字段.如果您真的想添加它们,请自行添加:

Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:

{
  "_id" : 100,
  "value" : "0",
  "childs" : [
    {
      "_id" : ObjectId("570557d4094a4514fc1291d7"),
      "id" : 110,
      "value" : "1",
      "contain" : [ 100 ]
    },
    {
      "_id" : ObjectId("570557d4094a4514fc1291d8"),
      "id" : 120,
      "value" : "1",
      "contain" : [ 100 ]
    }
  ]
}

所以这里唯一真正的问题是过滤";来自数组的任何 null 结果,当当前文档是 parent 在处理项目到 $push.

So the only real issue here is "filtering" any null result from the array, created when the current document was the parent in processing items to $push.

您在这里似乎还缺少的是,您要查找的结果不需要聚合或子查询";根本.您已经得出结论或可能在别处找到的结构是设计"的.这样你就可以得到一个节点"而这一切都是孩子"在单个查询请求中.

What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have concluded or possibly found elsewhere is "designed" so that you can get a "node" and all of it's "children" in a single query request.

这意味着只是查询"是真正需要的,并且数据收集(这是所有正在发生的事情,因为没有真正减少"内容)只是迭代游标结果的函数:

That means just the "query" is all that is really needed, and the data collection ( which is all that is happening since no content is really being "reduced" ) is just a function of iterating the cursor result:

var result = {};

db.test.find({
  "$or": [
    { "id": 100 },
    { "contain.0": 100, "value": "1" }
  ]
}).sort({ "contain.0": 1 }).forEach(function(doc) {
  if ( doc.id == 100 ) {
    result = doc;
    result.childs = []
  } else {
    result.childs.push(doc)
  }
})

printjson(result);

这完全一样:

{
  "_id" : ObjectId("570557d4094a4514fc1291d6"),
  "id" : 100,
  "value" : "0",
  "contain" : [ ],
  "childs" : [
    {
      "_id" : ObjectId("570557d4094a4514fc1291d7"),
      "id" : 110,
      "value" : "1",
      "contain" : [
              100
      ]
    },
    {
      "_id" : ObjectId("570557d4094a4514fc1291d8"),
      "id" : 120,
      "value" : "1",
      "contain" : [
              100
      ]
    }
  ]
}

并证明您在这里真正需要做的就是发出单一"查询以选择父级和子级.返回的数据是一样的,你在服务器或客户端所做的只是按摩"转换为另一种收集格式.

And serves as proof that all you really need to do here is issue the "single" query to select both the parent and children. The returned data is just the same, and all you are doing on either server or client is "massaging" into another collected format.

这是您可以赶上"的情况之一思考你如何在关系"中做事数据库,并且没有意识到由于数据的存储方式已经改变",您不再需要使用相同的方法.

This is one of those cases where you can get "caught up" in thinking of how you did things in a "relational" database, and not realize that since the way the data is stored has "changed", you no longer need to use the same approach.

这正是文档示例的重点 带有子引用的模型树结构" 在它的结构中,可以轻松地在一个查询中选择父级和子级.

That is exactly what the point of the documentation example "Model Tree Structures with Child References" in it's structure, where it makes it easy to select parents and children within one query.

这篇关于$lookup 后的聚合过滤器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆