$ lookup后的聚合过滤器 [英] Aggregation filter after $lookup

查看:149
本文介绍了$ lookup后的聚合过滤器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何在$ lookup之后添加过滤器,还是有其他方法可以做到这一点?

How can I add a filter after an $lookup or is there any other method to do this?

我的数据收集测试是:

{ "_id" : ObjectId("570557d4094a4514fc1291d6"), "id" : 100, "value" : "0", "contain" : [ ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d7"), "id" : 110, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d8"), "id" : 120, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d9"), "id" : 121, "value" : "2", "contain" : [ 100, 120 ] }

我选择id 100汇总孩子:

I select id 100 and aggregate the childs:

db.test.aggregate([ {
  $match : {
    id: 100
  }
}, {
  $lookup : {
    from : "test",
    localField : "id",
    foreignField : "contain",
    as : "childs"
  }
}]);

我回来了:

{  
  "_id":ObjectId("570557d4094a4514fc1291d6"),
  "id":100,
  "value":"0",
  "contain":[ ],
  "childs":[ {  
      "_id":ObjectId("570557d4094a4514fc1291d7"),
      "id":110,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d8"),
      "id":120,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d9"),
      "id":121,
      "value":"2",
      "contain":[ 100, 120 ]
    }
  ]
}

但我只想要与value:1匹配的孩子

But I want only childs that match with "value: 1"

最后我期待这个结果:

{  
  "_id":ObjectId("570557d4094a4514fc1291d6"),
  "id":100,
  "value":"0",
  "contain":[ ],
  "childs":[ {  
      "_id":ObjectId("570557d4094a4514fc1291d7"),
      "id":110,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d8"),
      "id":120,
      "value":"1",
      "contain":[ 100 ]
    }
  ]
}


推荐答案

这里的问题实际上是关于不同的东西,不需要 $ lookup 。但是对于那些纯粹来自在$ lookup之后过滤的标题来到这里的人来说,这些是适合你的技术:

The question here is actually about something different and does not need $lookup at all. But for anyone arriving here purely from the title of "filtering after $lookup" then these are the techniques for you:

db.test.aggregate([
    { "$match": { "id": 100 } },
    { "$lookup": {
      "from": "test",
      "let": { "id": "$id" },
      "pipeline": [
        { "$match": {
          "value": "1",
          "$expr": { "$in": [ "$$id", "$contain" ] }
        }}
      ],
      "as": "childs"
    }}
])



早期 - $ lookup + $ unwind + $ match coalescence



Earlier - $lookup + $unwind + $match coalescence

db.test.aggregate([
    { "$match": { "id": 100 } },
    { "$lookup": {
        "from": "test",
        "localField": "id",
        "foreignField": "contain",
        "as": "childs"
    }},
    { "$unwind": "$childs" },
    { "$match": { "childs.value": "1" } },
    { "$group": {
        "_id": "$_id",
        "id": { "$first": "$id" },
        "value": { "$first": "$value" },
        "contain": { "$first": "$contain" },
        "childs": { "$push": "$childs" }
     }}
])

如果你质疑为什么要 $ unwind ,而不是使用 $ filter ,然后读取汇总$ lookup匹配管道中文档的总大小超过了最大文档大小,以获取有关为何这通常是必要且更加优化的详细信息。

If you question why would you $unwind as opposed to using $filter on the array, then read Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size for all the detail on why this is generally necessary and far more optimal.

对于MongoDB 3.6及以后的版本,那么更具表现力的子管道通常是你想要在任何事情之前过滤外国集合的结果返回到数组中。

For releases of MongoDB 3.6 and onwards, then the more expressive "sub-pipeline" is generally what you want to "filter" the results of the foreign collection before anything gets returned into the array at all.

回到答案,虽然这实际上描述了为什么问题需要不加入....

Back to the answer though which actually describes why the question asked needs "no join" at all....

使用 $ lookup 喜欢这个在这里做你想做的事情并不是最有效的方式。但稍后会有更多内容。

Using $lookup like this is not the most "efficient" way to do what you want here. But more on this later.

作为一个基本概念,只需使用 $ filter

As a basic concept, just use $filter on the resulting array:

db.test.aggregate([ 
    { "$match": { "id": 100 } }, 
    { "$lookup": {
        "from": "test",
        "localField": "id",
        "foreignField": "contain",
        "as": "childs"
    }},
    { "$project": {
        "id": 1,
        "value": 1,
        "contain": 1,
        "childs": {
           "$filter": {
               "input": "$childs",
               "as": "child",
               "cond": { "$eq": [ "$$child.value", "1" ] }
           }
        }
    }}
]);

或使用 $ redact

db.test.aggregate([ 
    { "$match": { "id": 100 } }, 
    { "$lookup": {
        "from": "test",
        "localField": "id",
        "foreignField": "contain",
        "as": "childs"
    }},
    { "$redact": {
        "$cond": {
           "if": {
              "$or": [
                { "$eq": [ "$value", "0" ] },
                { "$eq": [ "$value", "1" ] }
              ]
           },
           "then": "$$DESCEND",
           "else": "$$PRUNE"
        }
    }}
]);

两者都得到相同的结果:

Both get the same result:

{  
  "_id":ObjectId("570557d4094a4514fc1291d6"),
  "id":100,
  "value":"0",
  "contain":[ ],
  "childs":[ {  
      "_id":ObjectId("570557d4094a4514fc1291d7"),
      "id":110,
      "value":"1",
      "contain":[ 100 ]
    },
    {  
      "_id":ObjectId("570557d4094a4514fc1291d8"),
      "id":120,
      "value":"1",
      "contain":[ 100 ]
    }
  ]
}

底线是 $ lookup 本身无法查询以仅选择某些数据。因此,所有过滤都需要在 $ lookup之后进行

但是真的对于这种类型的自我加入你最好不要使用 $ lookup 可以避免开销另外一个读取和哈希合并完全。只需获取相关项目并 $ group 而不是:

But really for this type of "self join" you are better off not using $lookup at all and avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group instead:

db.test.aggregate([
  { "$match": { 
    "$or": [
      { "id": 100 },
      { "contain.0": 100, "value": "1" }
    ]
  }},
  { "$group": {
    "_id": {
      "$cond": {
        "if": { "$eq": [ "$value", "0" ] },
        "then": "$id",
        "else": { "$arrayElemAt": [ "$contain", 0 ] }
      }
    },
    "value": { "$first": { "$literal": "0"} },
    "childs": {
      "$push": {
        "$cond": {
          "if": { "$ne": [ "$value", "0" ] },
          "then": "$$ROOT",
          "else": null
        }
      }
    }
  }},
  { "$project": {
    "value": 1,
    "childs": {
      "$filter": {
        "input": "$childs",
        "as": "child",
        "cond": { "$ne": [ "$$child", null ] }
      }
    }
  }}
])

由于我故意,这只会有点不同删除了无关的字段。如果你真的想要自己添加它们:

Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:

{
  "_id" : 100,
  "value" : "0",
  "childs" : [
    {
      "_id" : ObjectId("570557d4094a4514fc1291d7"),
      "id" : 110,
      "value" : "1",
      "contain" : [ 100 ]
    },
    {
      "_id" : ObjectId("570557d4094a4514fc1291d8"),
      "id" : 120,
      "value" : "1",
      "contain" : [ 100 ]
    }
  ]
}

所以这里唯一真正的问题是过滤任何 null 数组的结果,当当前文档是时处理项目 $ push

So the only real issue here is "filtering" any null result from the array, created when the current document was the parent in processing items to $push.

您在这里看起来似乎也缺少的是您要查找的结果根本不需要聚合或子查询。您在其他地方得出或可能找到的结构是设计的,这样您就可以在单个查询请求中获得节点及其所有子节点。

What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have concluded or possibly found elsewhere is "designed" so that you can get a "node" and all of it's "children" in a single query request.

这意味着只需要查询就是所有真正需要的东西,而数据集合(由于没有内容真正被减少而发生的一切)只是迭代游标结果的函数:

That means just the "query" is all that is really needed, and the data collection ( which is all that is happening since no content is really being "reduced" ) is just a function of iterating the cursor result:

var result = {};

db.test.find({
  "$or": [
    { "id": 100 },
    { "contain.0": 100, "value": "1" }
  ]
}).sort({ "contain.0": 1 }).forEach(function(doc) {
  if ( doc.id == 100 ) {
    result = doc;
    result.childs = []
  } else {
    result.childs.push(doc)
  }
})

printjson(result);

这完全相同:

{
  "_id" : ObjectId("570557d4094a4514fc1291d6"),
  "id" : 100,
  "value" : "0",
  "contain" : [ ],
  "childs" : [
    {
      "_id" : ObjectId("570557d4094a4514fc1291d7"),
      "id" : 110,
      "value" : "1",
      "contain" : [
              100
      ]
    },
    {
      "_id" : ObjectId("570557d4094a4514fc1291d8"),
      "id" : 120,
      "value" : "1",
      "contain" : [
              100
      ]
    }
  ]
}

并作为证明你在这里真正需要做的就是发出单一查询来选择父和子。返回的数据是一样的,你在服务器或客户端上所做的只是按摩到另一种收集的格式。

And serves as proof that all you really need to do here is issue the "single" query to select both the parent and children. The returned data is just the same, and all you are doing on either server or client is "massaging" into another collected format.

这是你遇到的情况之一你可以思考如何在关系型数据库中做事,并且没有意识到由于数据的存储方式已经改变,你不再需要使用相同的方法。

This is one of those cases where you can get "caught up" in thinking of how you did things in a "relational" database, and not realize that since the way the data is stored has "changed", you no longer need to use the same approach.

这正是文档示例的重点 带有子引用的模型树结构 在其结构中,可以轻松选择父项和子项查询。

That is exactly what the point of the documentation example "Model Tree Structures with Child References" in it's structure, where it makes it easy to select parents and children within one query.

这篇关于$ lookup后的聚合过滤器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆