如何在有条件的情况下加入另外两个馆藏 [英] How to join to two additional collections with conditions

查看:52
本文介绍了如何在有条件的情况下加入另外两个馆藏的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

select tb1.*,tb3 from tb1,tb2,tb3
 where tb1.id=tb2.profile_id and tb2.field='<text>'
 and tb3.user_id = tb2.id and tb3.status =0

实际上我将sql转换为mongo sql,如下所示

我使用的

mongo sql

db.getCollection('tb1').aggregate
([
  { $lookup: 
           { from: 'tb2', 
             localField: 'id', 
             foreignField: 'profile_id', 
             as: 'tb2detail' 
            } 
   },

   { $lookup: 
            { from: 'tb3', 
              localField: 'tb2.id', 
              foreignField: 'user_id', 
              as: 'tb3details' 
            } 
    },

{ $match: 
        { 'status': 
                  { '$ne': 'closed' 
                  }, 
          'tb2.profile_type': 'agent', 
          'tb3.status': 0 
         } 
}

])

但没有达到预期的结果.

任何帮助将不胜感激.

解决方案

在这里您所缺少的是

由于MongoDB是无模式的",因此 $lookup 是您的意思是很多",因此结果是总是"一个数组.因此,寻找与SQL相同的结果",则需要 $unwind $lookup 之后的数组.不管是一个"还是很多"都没有关系,因为它仍然总是"一个数组:

db.getCollection.('tb1').aggregate([
  // Filter conditions from the source collection
  { "$match": { "status": { "$ne": "closed" } }},

  // Do the first join
  { "$lookup": {
    "from": "tb2",
    "localField": "id",
    "foreignField": "profileId",
    "as": "tb2"
  }},

  // $unwind the array to denormalize
  { "$unwind": "$tb2" },

  // Then match on the condtion for tb2
  { "$match": { "tb2.profile_type": "agent" } },

  // join the second additional collection
  { "$lookup": {
    "from": "tb3",
    "localField": "tb2.id",
    "foreignField": "id",
    "as": "tb3"
  }},

  // $unwind again to de-normalize
  { "$unwind": "$tb3" },

  // Now filter the condition on tb3
  { "$match": { "tb3.status": 0 } },

  // Project only wanted fields. In this case, exclude "tb2"
  { "$project": { "tb2": 0 } }
])

在这里,您需要注意翻译中缺少的其他内容:

顺序是重要的"

聚合管道比SQL更具表达性".实际上,最好将它们视为应用于数据源以整理和转换数据的步骤序列" .最好的模拟是管道式"命令行指令,例如:

ps -ef  | grep mongod | grep -v grep | awk '{ print $1 }'

在MongoDB聚合管道"中可以将管道" |视为管道阶段".

因此,我们想要 $match 过滤掉来源"集合中的内容,这是我们的第一个操作.这是一种很好的做法,因为它会从进一步的条件中删除所有不符合要求的条件的文档.就像在命令行管道"示例中发生的情况一样,在此示例中,我们先将输入"然后是管道"放入grep以删除"或过滤".

路径问题

您接下来要做的就是通过 .结果是"from"集合参数中的项目的数组",与提供的字段匹配,并以的形式在"as"字段路径"中输出.

此处选择的命名很重要,因为现在源集合中的文档"会考虑从联接"到现在存在于该给定路径的所有项目.为了简化此操作,我为新的路径"使用与联接"相同的集合"名称.

因此,从第一个"join"开始,输出将输出到"tb2",它将保存该集合的所有结果.以下 $unwind 然后 $match ,有关MongoDB实际处理方式查询.

某些序列确实"重要

因为它看起来像"有三个"管道阶段,所以是$lookup,然后是 $unwind ,然后 $match .但是实际上,MongoDB确实做了其他事情,这在添加到.aggregate()命令的{ "explain": true }输出中得到了证明:

    {
        "$lookup" : {
            "from" : "tb2",
            "as" : "tb2",
            "localField" : "id",
            "foreignField" : "profileId",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "profile_type" : {
                    "$eq" : "agent"
                }
            }
        }
    }, 
    {
        "$lookup" : {
            "from" : "tb3",
            "as" : "tb3",
            "localField" : "tb2.id",
            "foreignField" : "id",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "status" : {
                    "$eq" : 0.0
                }
            }
        }
    }, 

因此,除了序列"的第一点之外,还需要将 $lookup 的序列 $unwind ,然后

与其他过滤" $lookup .由于在这种情况下, $match 在集合上执行以在结果返回给父对象之前加入".

这与 $unwind 结合使用(已翻译进入unwinding),如上所示,这是MongoDB实际上如何处理联接"可能导致源文档中产生内容数组的可能性,该内容数组使其超过16MB BSON限制.仅在合并的结果很大的情况下才会发生这种情况,但是相同的优势在于,实际上是在返回结果之前"的目标集合上实际应用了过滤器".

这种处理最能关联"与SQL JOIN相同的行为.因此,它也是从 $lookup 除了外部"键值的本地"之外,还有其他条件可应用于JOIN.

还要注意,其他行为更改实际上是由 $unwind 通过丢弃"任何结果来添加此内容通过

结论

如前所述,MongoDB通常对待关系"与使用关系数据库"或RDBMS的方式有很大不同.实际上,关系"的一般概念是将数据作为单个属性或数组嵌入".

您实际上可能希望得到这样的输出,这也是为什么没有 $lookup的输出排序实际上是一个数组".但是,在这种情况下使用 $unwind 实际上是最有效的方法这样做,并保证连接"数据实际上不会由于该连接"而导致超出上述BSON限制.

如果您实际上想要输出数组,那么最好的做法是使用 $group 管道阶段,并可能作为多个阶段,以对

在这种情况下,实际上您会在其中使用"tb2"和"tb3"的结果重复显示),然后中的详细信息"转换为数组"以表示与"tb1"的关系

但是,给定的聚合管道的一般形式是如何从原始SQL中获得结果的精确表示,该原始SQL作为联接"的结果被非规范化"输出.在您决定之后,是否要再次规范化"结果.

select tb1.*,tb3 from tb1,tb2,tb3
 where tb1.id=tb2.profile_id and tb2.field='<text>'
 and tb3.user_id = tb2.id and tb3.status =0

actually i converted the sql as mongo sql as follows

mongo sql which i used

db.getCollection('tb1').aggregate
([
  { $lookup: 
           { from: 'tb2', 
             localField: 'id', 
             foreignField: 'profile_id', 
             as: 'tb2detail' 
            } 
   },

   { $lookup: 
            { from: 'tb3', 
              localField: 'tb2.id', 
              foreignField: 'user_id', 
              as: 'tb3details' 
            } 
    },

{ $match: 
        { 'status': 
                  { '$ne': 'closed' 
                  }, 
          'tb2.profile_type': 'agent', 
          'tb3.status': 0 
         } 
}

])

but not getting as per the expected result..

Any help will be appreciated..

解决方案

What you are missing in here is that $lookup produces an "array" in the output field specified by as in it's arguments. This is the general concept of MongoDB "relations", in that a "relation" between documents is represented as a "sub-property" that is "within" the document itself, being either singular or an "array" for many.

Since MongoDB is "schemaless", the general presumption of $lookup is that you mean "many" and the result is therefore "always" an array. So looking for the "same result as in SQL" then you need to $unwind that array after the $lookup. Whether it's "one" or "many" is of no consequence, since it's still "always" an array:

db.getCollection.('tb1').aggregate([
  // Filter conditions from the source collection
  { "$match": { "status": { "$ne": "closed" } }},

  // Do the first join
  { "$lookup": {
    "from": "tb2",
    "localField": "id",
    "foreignField": "profileId",
    "as": "tb2"
  }},

  // $unwind the array to denormalize
  { "$unwind": "$tb2" },

  // Then match on the condtion for tb2
  { "$match": { "tb2.profile_type": "agent" } },

  // join the second additional collection
  { "$lookup": {
    "from": "tb3",
    "localField": "tb2.id",
    "foreignField": "id",
    "as": "tb3"
  }},

  // $unwind again to de-normalize
  { "$unwind": "$tb3" },

  // Now filter the condition on tb3
  { "$match": { "tb3.status": 0 } },

  // Project only wanted fields. In this case, exclude "tb2"
  { "$project": { "tb2": 0 } }
])

Here you need to note the other things you are missing in the translation:

Sequence is "important"

Aggregation pipelines are more "tersely expressive" than SQL. They are in fact best considered as "a sequence of steps" applied to the datasource in order to collate and transform the data. The best analog to this is "piped" command line instructions, such as:

ps -ef  | grep mongod | grep -v grep | awk '{ print $1 }'

Where the "pipe" | can be considered as a "pipeline stage" in a MongoDB aggregation "pipeline".

As such we want to $match in order to filter things from the "source" collection as our first operation. And this is generally good practice since it removes any documents that did not meet required conditions from further conditions. Just like what is happening in our "command line pipe" example, where we take "input" then "pipe" to a grep to "remove" or "filter".

Paths Matter

Where the very next thing you do here is "join" via $lookup. The result is an "array" of the items from the "from" collection argument matched by the supplied fields to output in the "as" "field path" as an "array".

The naming chosen here is important, since now the "document" from the source collection considers all items from the "join" to now exist at that given path. To make this easy, I use the same "collection" name as the "join" for the new "path".

So starting from the first "join" the output is to "tb2" and that will hold all the results from that collection. There is also an important thing to note with the following sequence of $unwind and then $match, as to how MongoDB actually processes the query.

Certain Sequences "really" matter

Since it "looks like" there are "three" pipeline stages, being $lookup then $unwind and then $match. But in "fact" MongoDB really does something else, which is demonstrated in the output of { "explain": true } added to the .aggregate() command:

    {
        "$lookup" : {
            "from" : "tb2",
            "as" : "tb2",
            "localField" : "id",
            "foreignField" : "profileId",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "profile_type" : {
                    "$eq" : "agent"
                }
            }
        }
    }, 
    {
        "$lookup" : {
            "from" : "tb3",
            "as" : "tb3",
            "localField" : "tb2.id",
            "foreignField" : "id",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "status" : {
                    "$eq" : 0.0
                }
            }
        }
    }, 

So aside from the first point of "sequence" applying where you need to put the $match statements where they are needed and do the "most good", this actually becomes "really important" with the concept of "joins". The thing to note here is that our sequences of $lookup then $unwind and then $match, actually get processed by MongoDB as just the $lookup stages, with the other operations "rolled up" into the one pipeline stage for each.

This is an important distinction to other ways of "filtering" the results obtained by $lookup. Since in this case, the actual "query" conditions on the "join" from $match are performed on the collection to join "before" the results are returned to the parent.

This in combination with $unwind ( which is translated into unwinding ) as shown above is how MongoDB actually deals with the possibility that the "join" could result in producing an array of content in the source document which causes it to exceed the 16MB BSON limit. This would only happen in cases where the result being joined to is very large, but the same advantage is in where the "filter" is actually applied, being on the target collection "before" results are returned.

It is that kind of handling that best "correlates" to the same behavior as a SQL JOIN. It is also therefore the most effective way to obtain results from a $lookup where there are other conditions to apply to the JOIN aside from simply the "local" of "foreign" key values.

Also note that the other behavior change is is from what is essentially a LEFT JOIN performed by $lookup where the "source" document would always be retained regardless of the presence of a matching document in the "target" collection. Instead the $unwind adds to this by "discarding" any results from the "source" which did not have anything matching from the "target" by the additional conditions in $match.

In fact they are even discarded beforehand due to the implied preserveNullAndEmptyArrays: false which is included and would discard anything where the "local" and "foreign" keys did not even match between the two collections. This is a good thing for this particular type of query as the "join" is intended to the "equal" on those values.

Conclude

As noted before, MongoDB generally treats "relations" a lot differently to how you would use a "Relational Database" or RDBMS. The general concept of "relations" is in fact "embedding" the data, either as a single property or as an array.

You may actually desire such output, which is also part of the reason why that without the $unwind sequence here the output of $lookup is actually an "array". However using $unwind in this context is actually the most effective thing to do, as well as giving a guarantee that the "joined" data does not actually cause the aforementioned BSON limit to be exceed as a result of that "join".

If you actually want arrays of output, then the best thing to do here would be to use the $group pipeline stage, and possibly as multiple stages in order to "normalize" and "undo the results" of $unwind

  { "$group": {
    "_id": "$_id",
    "tb1_field": { "$first": "$tb1_field" },
    "tb1_another": { "$first": "$tb1_another" },
    "tb3": { "$push": "$tb3" }    
  }}

Where you would in fact for this case list all the fields you required from "tb1" by their property names using $first to only keep the "first" occurrence ( essentially repeated by results of "tb2" and "tb3" unwound ) and then $push the "detail" from "tb3" into an "array" to represent the relation to "tb1".

But the general form of the aggregation pipeline as given is the exact representation of how results would be obtained from the original SQL, which is "denormalized" output as a result of the "join". Whether you want to "normalize" results again after this is up to you.

这篇关于如何在有条件的情况下加入另外两个馆藏的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆