Get most recent Sub-Document from Array

Question

I have an array. I would like to select the object with the highest revision number from each of my history arrays (plural).

My document looks like this (uploaded_files will often contain more than one object):

{
    "_id" : ObjectId("5935a41f12f3fac949a5f925"),
    "project_id" : 13,
    "updated_at" : ISODate("2017-07-02T22:11:43.426Z"),
    "created_at" : ISODate("2017-06-05T18:34:07.150Z"),
    "owner" : ObjectId("591eea4439e1ce33b47e73c3"),
    "name" : "Demo project",
    "uploaded_files" : [ 
        {
            "history" : [ 
                {
                    "file" : ObjectId("59596f9fb6c89a031019bcae"),
                    "revision" : 0
                }
            ],
            "_id" : ObjectId("59596f9fb6c89a031019bcaf"),
            "display_name" : "Example filename.txt"
        }
    ]
}

My code for selecting the document:

function getProject(req, projectId) {
    let populateQuery = [
        {path: 'owner'},
        {path: 'uploaded_files.history.file'}
    ]
    return new Promise(function (resolve, reject) {
        Project.findOne({ project_id: projectId }).populate(populateQuery).then((project) => {
            if (!project)
                reject(new createError.NotFound(req.path))
            resolve(project)
        }).catch(function (err) {
            reject(err)
        })
    })
}

How can I select the document so that it only outputs the object with the highest revision number from each history array?

Recommended answer

You could tackle this in a couple of different ways. They differ in approach and performance, and I think there are some larger considerations you need to make about your design. Most notable is whether the usage pattern of your actual application really "needs" the "revisions" data.

As for the foremost point of getting the "last element from the inner array", you really should be using an .aggregate() operation to do this:

function getProject(req,projectId) {

  return new Promise((resolve,reject) => {
    Project.aggregate([
      { "$match": { "project_id": projectId } },
      { "$addFields": {
        "uploaded_files": {
          "$map": {
            "input": "$uploaded_files",
            "as": "f",
            "in": {
              "latest": {
                "$arrayElemAt": [
                  "$$f.history",
                  -1
                ]
              },
              "_id": "$$f._id",
              "display_name": "$$f.display_name"
            }
          }
        }
      }},
      { "$lookup": {
        "from": "owner_collection",
        "localField": "owner",
        "foreignField": "_id",
        "as": "owner"
      }},
      { "$unwind": "$uploaded_files" },
      { "$lookup": {
         "from": "files_collection",
         "localField": "uploaded_files.latest.file",
         "foreignField": "_id",
         "as": "uploaded_files.latest.file"
      }},
      { "$group": {
        "_id": "$_id",
        "project_id": { "$first": "$project_id" },
        "updated_at": { "$first": "$updated_at" },
        "created_at": { "$first": "$created_at" },
        "owner" : { "$first": { "$arrayElemAt": [ "$owner", 0 ] } },
        "name":  { "$first": "$name" },
        "uploaded_files": {
          "$push": {
            "latest": {
              // $lookup always yields an array, so take its single match
              "file": { "$arrayElemAt": [ "$uploaded_files.latest.file", 0 ] },
              "revision": "$uploaded_files.latest.revision"
            },
            "_id": "$uploaded_files._id",
            "display_name": "$uploaded_files.display_name"
          }
        }
      }}
    ])
    .then(result => {
      // return here so a missing project is not also passed to resolve()
      if (result.length === 0)
        return reject(new createError.NotFound(req.path));
      resolve(result[0])
    })
    .catch(reject)
  })
}

Since this is an aggregation statement, we can also do the "joins" on the "server" by using $lookup, as opposed to making additional requests (which is what .populate() actually does here). I'm taking some liberty with the actual collection names, since your schema is not included in the question. That's okay, since you may not have realized you could in fact do it this way.

Of course the "actual" collection names are required by the server, which has no concept of the schema defined on the "application side". There are things you can do for convenience here, but more on that later.
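As one small convenience, the two $lookup stages could be produced by a helper that takes the collection name as a parameter, so the guessed names live in one place. This is only a sketch: lookupById is a hypothetical helper, and "owner_collection" and "files_collection" are the same guesses used in the pipeline above. With Mongoose the real name can usually be read from the model (for example through Model.collection.name), but treat that as an assumption to check against your own setup.

```javascript
// Sketch only: lookupById is a hypothetical helper, and the collection
// names are the same guesses used in the pipeline above.
function lookupById(collectionName, path) {
  return {
    $lookup: {
      from: collectionName,   // real collection name as known to the server
      localField: path,       // field holding the ObjectId reference
      foreignField: "_id",
      as: path                // overwrite the reference with the matched documents
    }
  };
}

const joinStages = [
  lookupById("owner_collection", "owner"),
  lookupById("files_collection", "uploaded_files.latest.file")
];

console.log(JSON.stringify(joinStages[1], null, 2));
```

Each generated stage has the same shape as the hand-written $lookup stages, so it can be dropped into the pipeline array in their place.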

You should also note that, depending on where projectId actually comes from, the $match will require you to "cast" the value yourself if the input is in fact a "string", unlike regular mongoose methods such as .find(). Mongoose cannot apply "schema types" in an aggregation pipeline, so you might need to do this yourself, especially if projectId came from a request parameter. For a field stored as an ObjectId that looks like the line below; for a field stored as a number, as project_id is in the sample document, the cast would be to a number instead:

  { "$match": { "project_id": new mongoose.Types.ObjectId(projectId) } },
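In the sample document project_id is stored as a number (13), so for that shape the cast would be numeric rather than an ObjectId. A minimal sketch of such a cast, where toNumericId is a hypothetical helper and the "13" stands in for a request parameter:

```javascript
// Sketch: toNumericId is a hypothetical helper for a numeric project_id.
function toNumericId(raw) {
  const text = String(raw).trim();
  const value = Number(text);
  // Reject empty strings, NaN and non-integers instead of querying with them
  if (text === "" || !Number.isInteger(value)) {
    throw new TypeError(`Invalid project id: ${raw}`);
  }
  return value;
}

// "13" stands in for something like a route parameter
const matchStage = { $match: { project_id: toNumericId("13") } };
console.log(matchStage.$match.project_id); // 13
```

Failing early on a bad value keeps an uncastable string from silently matching nothing.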

The basic part here is where we use $map to iterate through all of the "uploaded_files" entries, and then simply extract the "latest" from the "history" array with $arrayElemAt using the "last" index, which is -1.

That should be reasonable, since the "most recent revision" is most likely the "last" array entry. We could adapt this to look for the "biggest" revision instead, by applying $max as a condition to $filter. That pipeline stage then becomes:

     { "$addFields": {
        "uploaded_files": {
          "$map": {
            "input": "$uploaded_files",
            "as": "f",
            "in": {
              "latest": {
                "$arrayElemAt": [
                   { "$filter": {
                     // filter the history objects themselves, not just the numbers
                     "input": "$$f.history",
                     "as": "h",
                     "cond": {
                       "$eq": [
                         "$$h.revision",
                         { "$max": "$$f.history.revision" }
                       ]
                     }
                   }},
                   0
                 ]
              },
              "_id": "$$f._id",
              "display_name": "$$f.display_name"
            }
          }
        }
      }},

This is more or less the same thing, except that we compare against the $max value and return only "one" entry from the array, so the index to take from the "filtered" array is the "first" position, or index 0.
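To see where the two rules differ, here is a plain JavaScript mirror of both selections, run on made-up data in which the entries are deliberately out of order. The helper names lastEntry and highestRevision are hypothetical, and this runs in the application rather than on the server, so it only illustrates the logic:

```javascript
// Plain-JavaScript mirror of the two selection rules; sample data is made up.
const history = [
  { file: "a", revision: 0 },
  { file: "c", revision: 2 },
  { file: "b", revision: 1 }   // appended out of order on purpose
];

// Mirrors { $arrayElemAt: [ "$$f.history", -1 ] }
const lastEntry = (h) => h[h.length - 1];

// Mirrors $filter on revision === $max, then taking index 0
const highestRevision = (h) => {
  const max = Math.max(...h.map((e) => e.revision));
  return h.filter((e) => e.revision === max)[0];
};

console.log(lastEntry(history).file);        // b
console.log(highestRevision(history).file);  // c
```

When entries are only ever appended in revision order, both rules pick the same element, which is why the simpler -1 index is usually enough.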

As for other general techniques on using $lookup in place of .populate(), see my entry on "Querying after populate in Mongoose", which talks a bit more about things that can be optimized when taking this approach.

Of course we can also do the same sort of operation (even though not as efficiently) using .populate() calls and manipulating the resulting arrays:

Project.findOne({ "project_id": projectId })
  .populate(populateQuery)
  .lean()
  .then(project => {
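    // return here, otherwise the code below runs against a null project
    if (project === null)
      return reject(new createError.NotFound(req.path));

    // keep only the last history entry for each uploaded file
    project.uploaded_files = project.uploaded_files.map( f => ({
      latest: f.history.slice(-1)[0],
      _id: f._id,
      display_name: f.display_name
    }));

    resolve(project);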
  })
  .catch(reject)

Here you are of course actually returning "all" of the items from "history", but we simply apply a .map() that invokes .slice() on those elements to again get the last array element for each.

There is a bit more overhead, since all the history is returned and the .populate() calls are additional requests, but it does get the same end result.

The main problem I see here, though, is that you even have a "history" array within the content. This is not really a great idea, since you need to do things like the above in order to return only the relevant item you want.

So as a "point of design", I would not do this. Instead I would "separate" the history from the items in all cases. Keeping with "embedded" documents, I would keep the "history" in a separate array, and only keep the "latest" revision with the actual content:

{
    "_id" : ObjectId("5935a41f12f3fac949a5f925"),
    "project_id" : 13,
    "updated_at" : ISODate("2017-07-02T22:11:43.426Z"),
    "created_at" : ISODate("2017-06-05T18:34:07.150Z"),
    "owner" : ObjectId("591eea4439e1ce33b47e73c3"),
    "name" : "Demo project",
    "uploaded_files" : [ 
        {
            "latest" : {
                "file" : ObjectId("59596f9fb6c89a031019bcae"),
                "revision" : 1
            },
            "_id" : ObjectId("59596f9fb6c89a031019bcaf"),
            "display_name" : "Example filename.txt"
        }
    ],
    "file_history": [
        {
            "_id": ObjectId("59596f9fb6c89a031019bcaf"),
            "file": ObjectId("59596f9fb6c89a031019bcae"),
            "revision": 0
        },
        {
            "_id": ObjectId("59596f9fb6c89a031019bcaf"),
            "file": ObjectId("59596f9fb6c89a031019bcae"),
            "revision": 1
        }
    ]
}
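Moving existing documents to this shape could be sketched as a plain transformation. splitHistory is a hypothetical helper, the string ids stand in for ObjectId values, and it assumes each history array is already in revision order, so the last entry is the latest:

```javascript
// Sketch: reshape one project from nested "history" arrays to
// "latest" plus a separate "file_history" array.
function splitHistory(project) {
  const file_history = [];
  const uploaded_files = project.uploaded_files.map((f) => {
    // every old history entry moves to the shared array, tagged with its file _id
    for (const h of f.history) {
      file_history.push({ _id: f._id, file: h.file, revision: h.revision });
    }
    return {
      latest: f.history[f.history.length - 1], // assumes history is in revision order
      _id: f._id,
      display_name: f.display_name
    };
  });
  return { ...project, uploaded_files, file_history };
}

const before = {
  project_id: 13,
  uploaded_files: [{
    _id: "file-1",
    display_name: "Example filename.txt",
    history: [{ file: "rev-a", revision: 0 }, { file: "rev-b", revision: 1 }]
  }]
};

const after = splitHistory(before);
console.log(after.uploaded_files[0].latest.revision); // 1
console.log(after.file_history.length);               // 2
```

The function returns a new object rather than mutating its input, so the result can be inspected before anything is written back.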

You can maintain this simply by using $set on the relevant entry and $push on the "history", both in the one operation:

.update(
  { "project_id": projectId, "uploaded_files._id": fileId },
  { 
    "$set": {
      "uploaded_files.$.latest": { 
        "file": revisionId,
        "revision": revisionNum
      }
    },
    "$push": {
      "file_history": {
        "_id": fileId,
        "file": revisionId,
        "revision": revisionNum
      }
    }
  }
)
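The effect of that single update can be mirrored in memory to make the two modifiers easier to follow. This is only an illustration of the semantics and not how the server applies it; applyRevision is a hypothetical name and the string ids stand in for ObjectId values:

```javascript
// In-memory mirror of the update: $set on the matched array element
// plus $push onto file_history, returned as a new object.
function applyRevision(project, fileId, revisionId, revisionNum) {
  // the positional "$" refers to the first element matched by the query
  const index = project.uploaded_files.findIndex((f) => f._id === fileId);
  if (index === -1) return project; // the filter would match nothing: no change

  const uploaded_files = project.uploaded_files.map((f, i) =>
    i === index
      ? { ...f, latest: { file: revisionId, revision: revisionNum } }
      : f
  );
  const file_history = [
    ...project.file_history,
    { _id: fileId, file: revisionId, revision: revisionNum }
  ];
  return { ...project, uploaded_files, file_history };
}

const project = {
  project_id: 13,
  uploaded_files: [
    { _id: "file-1", display_name: "Example filename.txt",
      latest: { file: "rev-a", revision: 0 } }
  ],
  file_history: [{ _id: "file-1", file: "rev-a", revision: 0 }]
};

const updated = applyRevision(project, "file-1", "rev-b", 1);
console.log(updated.uploaded_files[0].latest.file); // rev-b
console.log(updated.file_history.length);           // 2
```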

With the array separated, you can simply query and always get the latest, and discard the "history" until such time as you actually want to make that request:

Project.findOne({ "project_id": projectId })
  .select('-file_history')      // The '-' here removes the field from results
  .populate(populateQuery)

As a general case, though, I would simply not bother with the "revision" number at all. If you keep much the same structure, you do not really need it when "appending" to an array, since the "latest" is always the "last". This is also true after changing the structure, where again the "latest" will always be the last entry for the given uploaded file.

Trying to maintain such an "artificial" index is fraught with problems, and mostly ruins any chance of "atomic" operations such as the .update() example shown here, since you need to know a "counter" value in order to supply the latest revision number, and therefore need to "read" that from somewhere.
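If entries are only ever appended, the revision number can be derived from position when it is needed, so nothing has to be read before writing. A minimal sketch, assuming append-only history and using the hypothetical helper revisionsFor with string ids in place of ObjectId values:

```javascript
// Sketch: derive revision numbers from array position instead of storing a counter.
function revisionsFor(fileHistory, fileId) {
  return fileHistory
    .filter((h) => h._id === fileId)                  // entries for one uploaded file
    .map((h, i) => ({ file: h.file, revision: i }));  // position doubles as the revision
}

const fileHistory = [
  { _id: "file-1", file: "rev-a" },
  { _id: "file-2", file: "rev-x" },
  { _id: "file-1", file: "rev-b" }
];

const revisions = revisionsFor(fileHistory, "file-1");
console.log(revisions[revisions.length - 1]); // { file: 'rev-b', revision: 1 }
```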
