$lookup multiple levels without $unwind?


Question

I have the following collections:

  • venue collection

{    "_id" : ObjectId("5acdb8f65ea63a27c1facf86"),
     "name" : "ASA College - Manhattan Campus",
     "addedBy" : ObjectId("5ac8ba3582c2345af70d4658"),
     "reviews" : [ 
         ObjectId("5acdb8f65ea63a27c1facf8b"), 
         ObjectId("5ad8288ccdd9241781dce698")
     ] 
}

  • reviews collection

{     "_id" : ObjectId("5acdb8f65ea63a27c1facf8b"),
      "createdAt" : ISODate("2018-04-07T12:31:49.503Z"),
      "venue" : ObjectId("5acdb8f65ea63a27c1facf86"),
      "author" : ObjectId("5ac8ba3582c2345af70d4658"),
      "content" : "nice place",
      "comments" : [ 
          ObjectId("5ad87113882d445c5cbc92c8")
      ]
 }

  • comment collection

{     "_id" : ObjectId("5ad87113882d445c5cbc92c8"),
      "author" : ObjectId("5ac8ba3582c2345af70d4658"),
      "comment" : "dcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsfdcfdsf",
      "review" : ObjectId("5acdb8f65ea63a27c1facf8b"),
      "__v" : 0
}

  • author collection

{    "_id" : ObjectId("5ac8ba3582c2345af70d4658"),
     "firstName" : "Bruce",
     "lastName" : "Wayne",
     "email" : "bruce@linkites.com",
     "followers" : [ObjectId("5ac8b91482c2345af70d4650")]
}

Now the following populate query works fine:

    const venues = await Venue.findOne({ _id: id.id })
    .populate({
      path: 'reviews',
      options: { sort: { createdAt: -1 } },
      populate: [
        {  path: 'author'  },
        {  path: 'comments', populate: [{ path: 'author' }] }
      ]
    })

However, I want to achieve the same result with a $lookup query, but it splits the venue when I apply $unwind to the reviews. I want the reviews in the same array (as with populate) and in the same order.

I want to do this with $lookup because the author has a followers field, so I need to return an isFollow field via $project, which cannot be done using populate:

$project: {
    isFollow: { $in: [mongoose.Types.ObjectId(req.user.id), '$followers'] }
}

Solution

There are a couple of approaches, depending on your available MongoDB version. They range from different usages of $lookup through to enabling object manipulation on the .populate() result via .lean().

I do ask that you read the sections carefully, and be aware that not everything may be as it seems when you weigh up a solution for your implementation.

MongoDB 3.6, "nested" $lookup

With MongoDB 3.6 the $lookup operator gains the ability to include a pipeline expression, as opposed to simply joining a "local" key value to a "foreign" one. This means you can essentially do each $lookup "nested" within these pipeline expressions:

Venue.aggregate([
  // "new" keeps the ObjectId construction working on current Mongoose releases
  { "$match": { "_id": new mongoose.Types.ObjectId(id.id) } },
  // Level 1: reviews whose _id is listed in the venue's "reviews" array
  { "$lookup": {
    "from": Review.collection.name,
    "let": { "reviews": "$reviews" },
    "pipeline": [
       { "$match": { "$expr": { "$in": [ "$_id", "$$reviews" ] } } },
       // Level 2: comments listed in each review's "comments" array
       { "$lookup": {
         "from": Comment.collection.name,
         "let": { "comments": "$comments" },
         "pipeline": [
           { "$match": { "$expr": { "$in": [ "$_id", "$$comments" ] } } },
           // Level 3: the single author of each comment
           { "$lookup": {
             "from": Author.collection.name,
             "let": { "author": "$author" },
             "pipeline": [
               { "$match": { "$expr": { "$eq": [ "$_id", "$$author" ] } } },
               { "$addFields": {
                 "isFollower": {
                   "$in": [
                     new mongoose.Types.ObjectId(req.user.id),
                     "$followers"
                   ]
                 }
               }}
             ],
             "as": "author"
           }},
           // $lookup always yields an array, so take its only element
           { "$addFields": {
             "author": { "$arrayElemAt": [ "$author", 0 ] }
           }}
         ],
         "as": "comments"
       }},
       { "$sort": { "createdAt": -1 } }
     ],
     "as": "reviews"
  }}
])

This can be really quite powerful. From the perspective of the original pipeline, it only knows about adding content to the "reviews" array, and each subsequent "nested" pipeline expression likewise only ever sees its own "inner" elements from the join.

It is powerful, and in some respects it may be a bit clearer since all field paths are relative to the nesting level. The indentation does start to creep in the BSON structure though, and you do need to be aware of whether you are matching against arrays or singular values as you traverse the structure.

Note we can also do things here like "flattening the author property" as seen within the "comments" array entries. All $lookup target output may be an "array", but within a "sub-pipeline" we can re-shape that single element array into just a single value.

Standard MongoDB $lookup

Still keeping the "join on the server", you can do this with the standard form of $lookup, but it takes intermediate processing. This is the long-standing approach of deconstructing an array with $unwind and then using $group stages to rebuild the arrays:

Venue.aggregate([
  { "$match": { "_id": new mongoose.Types.ObjectId(id.id) } },
  { "$lookup": {
    "from": Review.collection.name,
    "localField": "reviews",
    "foreignField": "_id",
    "as": "reviews"
  }},
  { "$unwind": "$reviews" },
  { "$lookup": {
    "from": Comment.collection.name,
    "localField": "reviews.comments",
    "foreignField": "_id",
    "as": "reviews.comments"
  }},
  { "$unwind": "$reviews.comments" },
  { "$lookup": {
    "from": Author.collection.name,
    "localField": "reviews.comments.author",
    "foreignField": "_id",
    "as": "reviews.comments.author"
  }},
  { "$unwind": "$reviews.comments.author" },
  { "$addFields": {
    "reviews.comments.author.isFollower": {
      "$in": [
        new mongoose.Types.ObjectId(req.user.id),
        "$reviews.comments.author.followers"
      ]
    }
  }},
  // First roll-up: one document per review, comments pushed back into an array.
  // After $unwind the review lives under "reviews", so the paths use that name.
  { "$group": {
    "_id": {
      "_id": "$_id",
      "reviewId": "$reviews._id"
    },
    "name": { "$first": "$name" },
    "addedBy": { "$first": "$addedBy" },
    "review": {
      "$first": {
        "_id": "$reviews._id",
        "createdAt": "$reviews.createdAt",
        "venue": "$reviews.venue",
        "author": "$reviews.author",
        "content": "$reviews.content"
      }
    },
    "comments": { "$push": "$reviews.comments" }
  }},
  { "$sort": { "_id._id": 1, "review.createdAt": -1 } },
  // Second roll-up: one document per venue, reviews pushed back into an array.
  // Here "review" and "comments" are the fields produced by the first $group.
  { "$group": {
    "_id": "$_id._id",
    "name": { "$first": "$name" },
    "addedBy": { "$first": "$addedBy" },
    "reviews": {
      "$push": {
        "_id": "$review._id",
        "createdAt": "$review.createdAt",
        "venue": "$review.venue",
        "author": "$review.author",
        "content": "$review.content",
        "comments": "$comments"
      }
    }
  }}
])

This really is not as daunting as you might think at first and follows a simple pattern of $lookup and $unwind as you progress through each array.

The "author" detail of course is singular, so once that is "unwound" you simply want to leave it that way, make the field addition and start the process of "rolling back" into the arrays.

There are only two levels to reconstruct on the way back to the original Venue document, so the first level groups by Review to rebuild the "comments" array. All you need to do is $push the path "$reviews.comments" in order to collect these. As long as the "$reviews._id" field is in the grouping _id, the only other things you need to keep are the remaining fields. You can put all of these into the _id as well, or you can use $first.

With that done there is only one more $group stage in order to get back to Venue itself. This time the grouping key is "$_id" of course, with all properties of the venue itself using $first and the remaining "$review" details going back into an array with $push. Of course the "$comments" output from the previous $group becomes the "reviews.comments" path.

Working on a single document and its relations, this is not really so bad. The $unwind pipeline operator can generally be a performance issue, but in the context of this usage it should not cause much of an impact.
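If the unwind and regroup steps are hard to picture, it can help to imitate them in plain JavaScript. The following is only a sketch of the pattern, not what the server actually executes: the sample data and the unwind helper are hypothetical, and the two Map loops stand in for the two $group stages.

```javascript
// Plain-JavaScript imitation of the unwind / regroup pattern (no MongoDB needed).
// The sample data and helper names are hypothetical.
const venue = {
  _id: "v1",
  name: "ASA College",
  reviews: [
    { _id: "r1", content: "nice place", comments: [{ _id: "c1" }, { _id: "c2" }] },
    { _id: "r2", content: "too loud", comments: [{ _id: "c3" }] }
  ]
};

// Like $unwind: one output row per array element. An empty or missing array
// yields no rows, which is also the default behaviour of the real stage.
function unwind(rows, pick, put) {
  return rows.flatMap(row => (pick(row) || []).map(item => put(row, item)));
}

// Two unwinds: venue -> one row per review -> one row per comment.
const perReview = unwind([venue], v => v.reviews, (v, r) => ({ ...v, reviews: r }));
const perComment = unwind(perReview, v => v.reviews.comments,
  (v, c) => ({ ...v, reviews: { ...v.reviews, comments: c } }));

// First regroup: key on (venue _id, review _id) and push the comments back.
const byReview = new Map();
for (const row of perComment) {
  const key = `${row._id}|${row.reviews._id}`;
  if (!byReview.has(key)) {
    const { comments, ...review } = row.reviews; // the "$first" fields
    byReview.set(key, { _id: row._id, name: row.name, review, comments: [] });
  }
  byReview.get(key).comments.push(row.reviews.comments);
}

// Second regroup: key on venue _id and push each review with its comments.
const byVenue = new Map();
for (const g of byReview.values()) {
  if (!byVenue.has(g._id)) byVenue.set(g._id, { _id: g._id, name: g.name, reviews: [] });
  byVenue.get(g._id).reviews.push({ ...g.review, comments: g.comments });
}

const rebuilt = [...byVenue.values()];
console.log(JSON.stringify(rebuilt, null, 2));
```

The rebuilt array holds the venue in its original shape. Note that a review with no comments would vanish at the second unwind, which is the same thing the $unwind stage does by default with empty arrays.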

Since the data is still being "joined on the server" there is still far less traffic than the other remaining alternative.

JavaScript Manipulation

Of course the other case here is that instead of changing data on the server itself, you actually manipulate the result. In most cases I would be in favor of this approach since any "additions" to the data are probably best handled on the client.

The problem of course with using populate() is that whilst it may 'look like' a much more simplified process, it is in fact NOT A JOIN in any way. All populate() actually does is "hide" the underlying process of submitting multiple queries to the database, and then awaiting the results through async handling.

So the "appearance" of a join is actually the result of multiple requests to the server and then doing "client side manipulation" of the data to embed the details within arrays.
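To make the "multiple requests" point concrete, here is a much simplified in-memory model of what a populate-style helper does. Everything in it is a hypothetical stand-in (the db object, find and populatePath are not Mongoose APIs), and each find call represents one asynchronous round trip to the server, run synchronously here only to keep the sketch short.

```javascript
// Simplified model of a populate-style helper. The "db" object, find() and
// populatePath() are hypothetical stand-ins, not Mongoose APIs.
const db = {
  reviews: [
    { _id: "r1", comments: ["c1"] },
    { _id: "r2", comments: [] }
  ],
  comments: [{ _id: "c1", author: "a1" }],
  authors: [{ _id: "a1", firstName: "Bruce" }]
};

let queryCount = 0;

// Stands in for one round trip: find({ _id: { $in: ids } }) on a collection.
// Documents are copied so the "database" itself is never modified.
function find(collection, ids) {
  queryCount += 1;
  return db[collection]
    .filter(doc => ids.includes(doc._id))
    .map(doc => ({ ...doc }));
}

// Replace the ids stored under `field` with the documents they point to.
function populatePath(docs, field, collection) {
  const ids = [...new Set(docs.flatMap(d => [].concat(d[field] ?? [])))];
  const byId = new Map(find(collection, ids).map(d => [d._id, d]));
  for (const d of docs) {
    d[field] = Array.isArray(d[field])
      ? d[field].map(id => byId.get(id)).filter(Boolean)
      : byId.get(d[field]);
  }
}

// The venue is assumed to be already fetched by an earlier query.
function loadVenue(venue) {
  populatePath([venue], "reviews", "reviews");            // extra query 1
  populatePath(venue.reviews, "comments", "comments");    // extra query 2
  const comments = venue.reviews.flatMap(r => r.comments);
  populatePath(comments, "author", "authors");            // extra query 3
  return venue;
}

const populated = loadVenue({ _id: "v1", reviews: ["r1", "r2"] });
console.log(queryCount, JSON.stringify(populated));
```

Three extra queries run after the one that fetched the venue, and all of the stitching together happens in client code.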

So aside from that clear warning that the performance characteristics are nowhere close to being on par with a server $lookup, the other caveat is of course that the "mongoose Documents" in the result are not actually plain JavaScript objects subject to further manipulation.

So in order to take this approach, you need to add the .lean() method to the query before execution, in order to instruct mongoose to return "plain JavaScript objects" instead of Document types which are cast with schema methods attached to the model. Noting of course that the resulting data no longer has access to any "instance methods" that would otherwise be associated with the related models themselves:

let venue = await Venue.findOne({ _id: id.id })
  .populate({ 
    path: 'reviews', 
    options: { sort: { createdAt: -1 } },
    populate: [
     { path: 'comments', populate: [{ path: 'author' }] }
    ]
  })
  .lean();

Now that venue is a plain object, we can simply process and adjust it as needed:

venue.reviews = venue.reviews.map( r =>
  ({
    ...r,
    comments: r.comments.map( c =>
      ({
        ...c,
        author: {
          ...c.author,
          // true when the current user is among this author's followers
          isFollower: c.author.followers.map( f => f.toString() ).indexOf(req.user.id) !== -1
        }
      })
    )
  })
);

So it's really just a matter of cycling through each of the inner arrays down to the level where you can see the followers array within the author details. The ObjectId values stored in that array are first converted with .map() to "string" values, which are then compared against req.user.id, which is also a string (if it is not, then call .toString() on that as well). It is generally easier to compare these values this way in JavaScript code.

Again though, I need to stress that while it "looks simple", it is in fact the sort of thing you really want to avoid for system performance. Those additional queries and the transfer between the server and the client cost a lot of processing time, and the request overhead adds up to real costs in transport between hosting providers.
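The string comparison can also be pulled out into a small helper. This is just a sketch: isFollowedBy and fakeId are hypothetical names, and fakeId only imitates the toString() behaviour of an ObjectId so the example runs without any database driver.

```javascript
// Hypothetical helper: true when userId appears in the followers array.
// String() calls toString() on ObjectId-like values and leaves strings as-is,
// so either side may be an ObjectId or a plain string.
function isFollowedBy(followers, userId) {
  const wanted = String(userId);
  return (followers || []).some(f => String(f) === wanted);
}

// Minimal stand-in for an ObjectId, used only to make the example runnable.
const fakeId = hex => ({ toString: () => hex });

const followers = [fakeId("5ac8b91482c2345af70d4650")];
console.log(isFollowedBy(followers, "5ac8b91482c2345af70d4650"));         // true
console.log(isFollowedBy(followers, fakeId("5ac8ba3582c2345af70d4658"))); // false
console.log(isFollowedBy(undefined, "5ac8b91482c2345af70d4650"));         // false
```

Using .some() stops at the first match, and the guard on a missing followers array avoids a TypeError for authors that have none stored.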


Summary

Those are basically the approaches you can take, short of "rolling your own", where you perform the "multiple queries" to the database yourself instead of using the helper that .populate() is.

Using the populate output, you can then simply manipulate the data in result just like any other data structure, as long as you apply .lean() to the query to convert or otherwise extract the plain object data from the mongoose documents returned.

Whilst the aggregate approaches look far more involved, there are "a lot" more advantages to doing this work on the server. Larger result sets can be sorted, calculations can be done for further filtering, and of course you get a "single response" to a "single request" made to the server, all with no additional overhead.

It is entirely arguable that the pipelines themselves could simply be constructed based on attributes already stored on the schema. So writing your own method to perform this "construction" based on the attached schema should not be too difficult.
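As a rough idea of what such a "construction" method might look like, here is a minimal sketch that builds the nested pipeline-form $lookup stages from a plain description of the relations. The relations object, its property names and buildLookup are all assumptions made for illustration, as are the collection names. Reading the same information out of a real schema is left out.

```javascript
// Hypothetical description of the relations. In a real application this could
// be derived from the schema's ref definitions. Collection names are assumed.
const relations = {
  from: "reviews", field: "reviews", many: true, sort: { createdAt: -1 },
  children: [{
    from: "comments", field: "comments", many: true,
    children: [{ from: "authors", field: "author", many: false }]
  }]
};

// Build one pipeline-form $lookup (MongoDB 3.6+) and recurse into children.
// extraStages lets a caller append stages such as an $addFields at that level.
function buildLookup({ from, field, many, sort, children = [], extraStages = [] }) {
  const op = many ? "$in" : "$eq";
  const pipeline = [
    { $match: { $expr: { [op]: ["$_id", `$$${field}`] } } },
    ...children.flatMap(child => buildLookup(child)),
    ...extraStages,
    ...(sort ? [{ $sort: sort }] : [])
  ];
  const stages = [
    { $lookup: { from, let: { [field]: `$${field}` }, pipeline, as: field } }
  ];
  // A single-valued reference comes back as a one-element array, so flatten it.
  if (!many) {
    stages.push({ $addFields: { [field]: { $arrayElemAt: [`$${field}`, 0] } } });
  }
  return stages;
}

const stages = buildLookup(relations);
console.log(JSON.stringify(stages, null, 2));
```

The output has the same shape as the hand-written nested pipeline shown earlier, minus the isFollower field, which could be supplied through extraStages on the author level. The result would go after the initial $match stage in the aggregate call.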

In the longer term of course $lookup is the better solution, but you'll probably need to put a little more work into the initial coding, if of course you don't just simply copy from what is listed here ;)
