如何加入两个有条件的附加集合 [英] How to join to two additional collections with conditions

查看:12
本文介绍了如何加入两个有条件的附加集合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

从 tb1,tb2,tb3 中选择 tb1.*,tb3其中 tb1.id=tb2.profile_id 和 tb2.field='<text>'和 tb3.user_id = tb2.id 和 tb3.status =0

实际上我将 sql 转换为 mongo sql 如下

mongo我使用的sql

db.getCollection('tb1').aggregate([{ $查找:{来自:'tb2',本地字段:'id',foreignField: 'profile_id',如:'tb2detail'}},{ $查找:{来自:'tb3',localField: 'tb2.id',foreignField: 'user_id',如:'tb3details'}},{ $匹配:{ '地位':{ '$ne': '关闭'},'tb2.profile_type': '代理','tb3.status':0}}])

但没有达到预期的结果..

任何帮助将不胜感激..

解决方案

你在这里缺少的是 $lookup 在其参数中由 as 指定的输出字段中生成一个数组".这是 MongoDB 关系"的一般概念,因为文档之间的关系"表示为文档本身内部"的子属性",对于许多人来说,要么是单数,要么是数组".

由于 MongoDB 是无模式"的,$ 的一般假设查找 是你的意思是很多",因此结果是总是"一个数组.因此,寻找与 SQL 相同的结果"然后您需要 $unwind $ 之后的那个数组查找.无论是一"还是多"都无关紧要,因为它仍然始终"是一个数组:

db.getCollection.('tb1').aggregate([//从源集合中过滤条件{$match":{状态":{$ne":关闭"}}},//进行第一次连接{ "$查找": {来自":tb2","localField": "id","foreignField": "profileId",作为":tb2"}},//$unwind 数组去规范化{ "$unwind": "$tb2" },//然后匹配 tb2 的条件{ "$match": { "tb2.profile_type": "agent" } },//加入第二个附加集合{ "$查找": {来自":tb3","localField": "tb2.id","foreignField": "id",作为":tb3"}},//$unwind 再次去规范化{ "$unwind": "$tb3" },//现在过滤tb3上的条件{ "$match": { "tb3.status": 0 } },//项目只需要字段.在这种情况下,排除tb2"{ "$project": { "tb2": 0 } }])

这里您需要注意翻译中缺少的其他内容:

顺序很重要"

聚合管道比 SQL 更简洁地表达".实际上,最好将它们视为一系列步骤",应用于数据源以整理和转换数据.最好的模拟是管道"命令行指令,例如:

ps -ef |grep mongod |grep -v grep |awk '{ 打印 $1 }'

管道"| 可以被视为 MongoDB 聚合管道"中的管道阶段".

因此,我们希望 $match 为了从源"集合中过滤东西作为我们的第一个操作.这通常是一种很好的做法,因为它会从进一步的条件中删除任何不符合所需条件的文件.就像我们的命令行管道"示例中发生的那样,我们将输入"然后管道"到 grep 以删除"或过滤".

路径很重要

您接下来要做的就是通过 $lookup.结果是来自 "from" 集合参数的项目的数组",这些参数与提供的字段匹配,以在 "as" "field path" 中作为 "数组".

此处选择的命名很重要,因为现在源集合中的文档"认为连接"中的所有项目现在都存在于该给定路径中.为方便起见,我使用与新路径"的连接"相同的集合"名称.

因此,从第一个加入"开始,输出到 "tb2",它将保存该集合的所有结果.下面的 $unwind 序列还有一点需要注意 然后是 $match,关于 MongoDB 实际如何处理查询.

某些序列真的"很重要

因为它看起来"有三个"管道阶段,分别是 $lookup 然后 $unwind 然后 $match.但事实上,MongoDB 确实做了其他事情,这在 { "explain": true } 添加到 .aggregate() 命令的输出中得到了证明:p>

<代码> {$查找":{来自":tb2",作为":tb2",localField":id","foreignField" : "profileId",展开":{preserveNullAndEmptyArrays":假},匹配":{profile_type":{$eq":代理"}}}},{$查找":{来自":tb3",作为":tb3",localField":tb2.id","foreignField" : "id",展开":{preserveNullAndEmptyArrays":假},匹配":{地位" : {$eq":0.0}}}},

因此,除了序列"应用的第一点之外,您需要将 $match 语句在需要它们的地方做最好的",这对于连接"的概念实际上变得非常重要".这里要注意的是我们的 $lookup 然后 $unwind 然后是 $match,实际上被 MongoDB 处理为 $lookup 阶段,其他操作汇总"到每个阶段的一个管道阶段.

这是与其他过滤"由 $lookup.因为在这种情况下,加入"的实际查询"条件来自 $match 在集合上执行加入之前",结果返回给父级.

结合 $unwind (被翻译成 unwinding )如上所示是 MongoDB 如何实际处理join"可能导致在源文档中生成内容数组的可能性,从而导致其超出16MB BSON 限制.这只会发生在被连接的结果非常大的情况下,但同样的优势在于实际应用过滤器"的地方,即在返回结果之前"的目标集合上.

正是这种处理与 SQL JOIN 的相同行为最相关".因此,它也是从 $lookup 获取结果的最有效方式 除了简单的外来"键值的本地"之外,还有其他条件适用于 JOIN.

另请注意,另一个行为变化是由 $lookup 无论目标"集合中是否存在匹配文档,都将始终保留源"文档.相反, $unwind 添加到此通过 $match.

<块引用>

事实上,由于包含了隐含的 preserveNullAndEmptyArrays: false,它们甚至会被预先丢弃,并且会丢弃本地"和外来"键在两个集合之间甚至不匹配的任何内容.这对于这种特定类型的查询来说是一件好事,因为连接"旨在对这些值进行相等".

结束

如前所述,MongoDB 对待关系"的方式通常与使用关系数据库"或 RDBMS 的方式大不相同.关系"的一般概念实际上是将数据嵌入"为单个属性或数组.

你可能真的想要这样的输出,这也是没有 $unwind 在此处对 $lookup 实际上是一个数组".但是在这种情况下使用 $unwind实际上是最有效的做法,并且保证加入"的数据实际上不会因为加入"而导致上述 BSON 限制被超过.

如果你真的想要输出数组,那么最好的办法是使用 $group 管道阶段,并且可能作为多个阶段来规范化"和撤消结果"$unwind

 { "$group": {"_id": "$_id","tb1_field": { "$first": "$tb1_field" },tb1_another":{$first":$tb1_another"},tb3":{$push":$tb3"}}}

实际上,对于这种情况,您可以使用 所需的所有字段reference/operator/aggregation/first/" rel="noreferrer">$first 只保留第一次"出现(基本上由 "tb2" 的结果重复code> 和 "tb3" unwound ) 然后 $push"tb3" 中的细节"放入一个数组"中,表示与 "tb1" 的关系.p>

但是给定的聚合管道的一般形式是如何从原始 SQL 获得结果的精确表示,它是作为连接"结果的非规范化"输出.在此之后是否要再次标准化"结果取决于您.

select tb1.*,tb3 from tb1,tb2,tb3
 where tb1.id=tb2.profile_id and tb2.field='<text>'
 and tb3.user_id = tb2.id and tb3.status =0

actually i converted the sql as mongo sql as follows

mongo sql which i used

db.getCollection('tb1').aggregate
([
  { $lookup: 
           { from: 'tb2', 
             localField: 'id', 
             foreignField: 'profile_id', 
             as: 'tb2detail' 
            } 
   },

   { $lookup: 
            { from: 'tb3', 
              localField: 'tb2.id', 
              foreignField: 'user_id', 
              as: 'tb3details' 
            } 
    },

{ $match: 
        { 'status': 
                  { '$ne': 'closed' 
                  }, 
          'tb2.profile_type': 'agent', 
          'tb3.status': 0 
         } 
}

])

but not getting as per the expected result..

Any help will be appreciated..

解决方案

What you are missing in here is that $lookup produces an "array" in the output field specified by as in it's arguments. This is the general concept of MongoDB "relations", in that a "relation" between documents is represented as a "sub-property" that is "within" the document itself, being either singular or an "array" for many.

Since MongoDB is "schemaless", the general presumption of $lookup is that you mean "many" and the result is therefore "always" an array. So looking for the "same result as in SQL" then you need to $unwind that array after the $lookup. Whether it's "one" or "many" is of no consequence, since it's still "always" an array:

db.getCollection.('tb1').aggregate([
  // Filter conditions from the source collection
  { "$match": { "status": { "$ne": "closed" } }},

  // Do the first join
  { "$lookup": {
    "from": "tb2",
    "localField": "id",
    "foreignField": "profileId",
    "as": "tb2"
  }},

  // $unwind the array to denormalize
  { "$unwind": "$tb2" },

  // Then match on the condtion for tb2
  { "$match": { "tb2.profile_type": "agent" } },

  // join the second additional collection
  { "$lookup": {
    "from": "tb3",
    "localField": "tb2.id",
    "foreignField": "id",
    "as": "tb3"
  }},

  // $unwind again to de-normalize
  { "$unwind": "$tb3" },

  // Now filter the condition on tb3
  { "$match": { "tb3.status": 0 } },

  // Project only wanted fields. In this case, exclude "tb2"
  { "$project": { "tb2": 0 } }
])

Here you need to note the other things you are missing in the translation:

Sequence is "important"

Aggregation pipelines are more "tersely expressive" than SQL. They are in fact best considered as "a sequence of steps" applied to the datasource in order to collate and transform the data. The best analog to this is "piped" command line instructions, such as:

ps -ef  | grep mongod | grep -v grep | awk '{ print $1 }'

Where the "pipe" | can be considered as a "pipeline stage" in a MongoDB aggregation "pipeline".

As such we want to $match in order to filter things from the "source" collection as our first operation. And this is generally good practice since it removes any documents that did not meet required conditions from further conditions. Just like what is happening in our "command line pipe" example, where we take "input" then "pipe" to a grep to "remove" or "filter".

Paths Matter

Where the very next thing you do here is "join" via $lookup. The result is an "array" of the items from the "from" collection argument matched by the supplied fields to output in the "as" "field path" as an "array".

The naming chosen here is important, since now the "document" from the source collection considers all items from the "join" to now exist at that given path. To make this easy, I use the same "collection" name as the "join" for the new "path".

So starting from the first "join" the output is to "tb2" and that will hold all the results from that collection. There is also an important thing to note with the following sequence of $unwind and then $match, as to how MongoDB actually processes the query.

Certain Sequences "really" matter

Since it "looks like" there are "three" pipeline stages, being $lookup then $unwind and then $match. But in "fact" MongoDB really does something else, which is demonstrated in the output of { "explain": true } added to the .aggregate() command:

    {
        "$lookup" : {
            "from" : "tb2",
            "as" : "tb2",
            "localField" : "id",
            "foreignField" : "profileId",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "profile_type" : {
                    "$eq" : "agent"
                }
            }
        }
    }, 
    {
        "$lookup" : {
            "from" : "tb3",
            "as" : "tb3",
            "localField" : "tb2.id",
            "foreignField" : "id",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "status" : {
                    "$eq" : 0.0
                }
            }
        }
    }, 

So aside from the first point of "sequence" applying where you need to put the $match statements where they are needed and do the "most good", this actually becomes "really important" with the concept of "joins". The thing to note here is that our sequences of $lookup then $unwind and then $match, actually get processed by MongoDB as just the $lookup stages, with the other operations "rolled up" into the one pipeline stage for each.

This is an important distinction to other ways of "filtering" the results obtained by $lookup. Since in this case, the actual "query" conditions on the "join" from $match are performed on the collection to join "before" the results are returned to the parent.

This in combination with $unwind ( which is translated into unwinding ) as shown above is how MongoDB actually deals with the possibility that the "join" could result in producing an array of content in the source document which causes it to exceed the 16MB BSON limit. This would only happen in cases where the result being joined to is very large, but the same advantage is in where the "filter" is actually applied, being on the target collection "before" results are returned.

It is that kind of handling that best "correlates" to the same behavior as a SQL JOIN. It is also therefore the most effective way to obtain results from a $lookup where there are other conditions to apply to the JOIN aside from simply the "local" of "foreign" key values.

Also note that the other behavior change is is from what is essentially a LEFT JOIN performed by $lookup where the "source" document would always be retained regardless of the presence of a matching document in the "target" collection. Instead the $unwind adds to this by "discarding" any results from the "source" which did not have anything matching from the "target" by the additional conditions in $match.

In fact they are even discarded beforehand due to the implied preserveNullAndEmptyArrays: false which is included and would discard anything where the "local" and "foreign" keys did not even match between the two collections. This is a good thing for this particular type of query as the "join" is intended to the "equal" on those values.

Conclude

As noted before, MongoDB generally treats "relations" a lot differently to how you would use a "Relational Database" or RDBMS. The general concept of "relations" is in fact "embedding" the data, either as a single property or as an array.

You may actually desire such output, which is also part of the reason why that without the $unwind sequence here the output of $lookup is actually an "array". However using $unwind in this context is actually the most effective thing to do, as well as giving a guarantee that the "joined" data does not actually cause the aforementioned BSON limit to be exceed as a result of that "join".

If you actually want arrays of output, then the best thing to do here would be to use the $group pipeline stage, and possibly as multiple stages in order to "normalize" and "undo the results" of $unwind

  { "$group": {
    "_id": "$_id",
    "tb1_field": { "$first": "$tb1_field" },
    "tb1_another": { "$first": "$tb1_another" },
    "tb3": { "$push": "$tb3" }    
  }}

Where you would in fact for this case list all the fields you required from "tb1" by their property names using $first to only keep the "first" occurrence ( essentially repeated by results of "tb2" and "tb3" unwound ) and then $push the "detail" from "tb3" into an "array" to represent the relation to "tb1".

But the general form of the aggregation pipeline as given is the exact representation of how results would be obtained from the original SQL, which is "denormalized" output as a result of the "join". Whether you want to "normalize" results again after this is up to you.

这篇关于如何加入两个有条件的附加集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆