MongoDB - group documents and get the document with the max value of a field from each group

Question

I have a Mongo collection where documents are never updated. Instead, the most current document is read and a new document is inserted with some fields changed. These related documents share an identifier, document_identifier, that distinguishes them from other documents.

I want to run a query that does the following: get all documents whose customer_id is X, group them by document_identifier, and from each group take the document with the maximum updated_at timestamp. It should return the documents in their entirety (all of their properties).

Example data set:

{
    document_identifier: "abc",
    updated_at: 1000,
    customer_id: "123",
    ...  
},

{
    document_identifier: "def",
    updated_at: 1001,
    customer_id: "123",
    ...
},

{
    document_identifier: "abc",
    updated_at: 1002,
    customer_id: "123",
    ...
},

{
    document_identifier: "def",
    updated_at: 10003,
    customer_id: "123",
    ...
},

{
    document_identifier: "xyz",
    updated_at: 1004,
    customer_id: "999",
    ...
},

{
    document_identifier: "abc",
    updated_at: 1005,
    customer_id: "123",
    ...
},

{
    document_identifier: "def",
    updated_at: 1006,
    customer_id: "123",
    ...
},

In the above example, if I queried on a customer_id of "123", the result would be:

{
    document_identifier: "abc",
    updated_at: 1005,
    customer_id: "123",
    ...
},

{
    document_identifier: "def",
    updated_at: 1006,
    customer_id: "123",
    ...
},

I've been steered towards the Mongo aggregation framework but can't quite figure it out.

Any help is much appreciated.

This is what I have right now. It appears to work, but I'm not sure it's optimal:

db.my_collection.aggregate([
    {
       $match: {customer_id: <value to query on>}
    },

    {
        $sort: {updated_at: -1}
    },

    {
        $group: {
            _id: "$document_identifier",
            my_doc: {$first: "$$ROOT"}
        }
    },

    {
        "$replaceRoot": {newRoot: "$my_doc"}
    }
])
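
To make the semantics of this pipeline concrete, here is a minimal in-memory sketch in plain JavaScript, not something that runs against MongoDB. The helper name latestBySortAndFirst and the array collectionDocs are hypothetical and used only for illustration; the data is the sample set exactly as written above, minus the elided fields.

```javascript
// Hypothetical in-memory stand-in for the collection (sample data as written above).
const collectionDocs = [
  { document_identifier: "abc", updated_at: 1000, customer_id: "123" },
  { document_identifier: "def", updated_at: 1001, customer_id: "123" },
  { document_identifier: "abc", updated_at: 1002, customer_id: "123" },
  { document_identifier: "def", updated_at: 10003, customer_id: "123" },
  { document_identifier: "xyz", updated_at: 1004, customer_id: "999" },
  { document_identifier: "abc", updated_at: 1005, customer_id: "123" },
  { document_identifier: "def", updated_at: 1006, customer_id: "123" },
];

// Hypothetical helper mirroring the four stages; not a MongoDB API.
function latestBySortAndFirst(docs, customerId) {
  const sorted = docs
    .filter((d) => d.customer_id === customerId)   // $match (filter returns a new array)
    .sort((a, b) => b.updated_at - a.updated_at);  // $sort: {updated_at: -1}

  // $group with $first: "$$ROOT" keeps the first document seen per identifier,
  // which after the descending sort is the one with the largest updated_at.
  const firstPerGroup = new Map();
  for (const d of sorted) {
    if (!firstPerGroup.has(d.document_identifier)) {
      firstPerGroup.set(d.document_identifier, d);
    }
  }

  // $replaceRoot: return the stored documents themselves.
  return [...firstPerGroup.values()];
}

console.log(latestBySortAndFirst(collectionDocs, "123"));
```

One property of this shape is that each group yields exactly one document, even if two documents in a group share the same updated_at.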

Recommended answer

If I understand correctly, I think this query might help:

db.records.aggregate(
[
    { $group: {
        _id: {customer_id: "$customer_id", document_identifier: "$document_identifier"},
        max_updated_at: { $max:  "$updated_at" }
    }}
])

The idea is to group on two fields, customer_id and document_identifier. For each combination, show the maximum updated_at, which is expected to be an ever-increasing integer.
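
As a rough illustration of what that $group stage computes, the same grouping can be sketched in plain JavaScript over an in-memory array. The helper name maxUpdatedAtPerGroup and the array sampleDocs are hypothetical; the data is the question's sample set as written, without the elided fields.

```javascript
// Hypothetical in-memory copy of the question's sample data.
const sampleDocs = [
  { document_identifier: "abc", updated_at: 1000, customer_id: "123" },
  { document_identifier: "def", updated_at: 1001, customer_id: "123" },
  { document_identifier: "abc", updated_at: 1002, customer_id: "123" },
  { document_identifier: "def", updated_at: 10003, customer_id: "123" },
  { document_identifier: "xyz", updated_at: 1004, customer_id: "999" },
  { document_identifier: "abc", updated_at: 1005, customer_id: "123" },
  { document_identifier: "def", updated_at: 1006, customer_id: "123" },
];

// Hypothetical helper, not a MongoDB API: groups on the pair
// (customer_id, document_identifier) and tracks the largest updated_at.
function maxUpdatedAtPerGroup(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // A JSON string stands in for the compound _id used by $group.
    const key = JSON.stringify([doc.customer_id, doc.document_identifier]);
    const current = groups.get(key);
    if (current === undefined || doc.updated_at > current.max_updated_at) {
      groups.set(key, {
        _id: {
          customer_id: doc.customer_id,
          document_identifier: doc.document_identifier,
        },
        max_updated_at: doc.updated_at,
      });
    }
  }
  return [...groups.values()];
}

console.log(maxUpdatedAtPerGroup(sampleDocs));
```

Note that each result carries only the group key and the maximum value, so the other fields of the original documents are not available at this point.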

For the dataset you provided, my results show:

{ "_id" : { "customer_id" : "123", "document_identifier" : "def" }, "max_updated_at" : 10003 }
{ "_id" : { "customer_id" : "999", "document_identifier" : "xyz" }, "max_updated_at" : 1004 }
{ "_id" : { "customer_id" : "123", "document_identifier" : "abc" }, "max_updated_at" : 1005 }

The format of the output differs from your example. Is that OK, or do you need the output format to match your example?

The OP asked for the output to match the expected format described in the question. Without further ado:

db.records.aggregate(
[
    { $group: {
        _id: {customer_id: "$customer_id", document_identifier: "$document_identifier"},
        max_updated_at: { $max:  "$updated_at" }
    }},
    { $project: {
        _id: 0,
        document_identifier: "$_id.document_identifier",
        updated_at: "$max_updated_at",
        customer_id: "$_id.customer_id"

    }}
]
)

Now the output looks like:

{ "document_identifier" : "def", "updated_at" : 10003, "customer_id" : "123" }
{ "document_identifier" : "xyz", "updated_at" : 1004, "customer_id" : "999" }
{ "document_identifier" : "abc", "updated_at" : 1005, "customer_id" : "123" }

Edit 2:

OK, so the OP has many more fields than shown in the question and would like to see all the fields of the matching documents. Here is the query so far:

db.records.aggregate(
[
    { $match: { customer_id: "123" }},
    { $group: {
        _id: {customer_id: "$customer_id", document_identifier: "$document_identifier"},
        max_updated_at: { $max:  "$updated_at" }
    }},
    { $lookup: {
        from: "records",
        let: {
          customer_id: "$_id.customer_id",
          document_identifier: "$_id.document_identifier",
          max_updated_at: "$max_updated_at"
        },
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  { $eq: [ "$customer_id", "$$customer_id"] },
                  { $eq: [ "$document_identifier", "$$document_identifier"] },
                  { $eq: [ "$updated_at", "$$max_updated_at"] }
                ]
              }
            }
          }
        ],
        as: "result"
    }},
    { $unwind: "$result" } ,
    { $replaceRoot: { newRoot: "$result" } }
]
)

This now matches on customer_id first. It then groups to find the maximum updated_at per document_identifier, does a self-join using $lookup to fetch the documents that carry that maximum, and uses $unwind and $replaceRoot to return only the original documents. This preserves the original document format regardless of how many fields are present.
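
The flow of that pipeline can be sketched in plain JavaScript over an in-memory array. This is only an illustration of the logic, not code that talks to MongoDB: the helper name latestDocsForCustomer, the array records, and the extra field are hypothetical (extra stands in for the fields elided in the question).

```javascript
// Hypothetical in-memory stand-in for the records collection.
// `extra` is an invented field standing in for the elided properties.
const records = [
  { document_identifier: "abc", updated_at: 1000, customer_id: "123", extra: "a1" },
  { document_identifier: "def", updated_at: 1001, customer_id: "123", extra: "d1" },
  { document_identifier: "abc", updated_at: 1002, customer_id: "123", extra: "a2" },
  { document_identifier: "def", updated_at: 10003, customer_id: "123", extra: "d2" },
  { document_identifier: "xyz", updated_at: 1004, customer_id: "999", extra: "x1" },
  { document_identifier: "abc", updated_at: 1005, customer_id: "123", extra: "a3" },
  { document_identifier: "def", updated_at: 1006, customer_id: "123", extra: "d3" },
];

// Hypothetical helper mirroring the pipeline stages; not a MongoDB API.
function latestDocsForCustomer(docs, customerId) {
  // $match: keep only this customer's documents.
  const matched = docs.filter((d) => d.customer_id === customerId);

  // $group: largest updated_at per document_identifier.
  const maxByIdentifier = new Map();
  for (const d of matched) {
    const seen = maxByIdentifier.get(d.document_identifier);
    if (seen === undefined || d.updated_at > seen) {
      maxByIdentifier.set(d.document_identifier, d.updated_at);
    }
  }

  // $lookup + $unwind + $replaceRoot: return the full documents whose
  // updated_at equals their group's maximum.
  return matched.filter(
    (d) => maxByIdentifier.get(d.document_identifier) === d.updated_at
  );
}

console.log(latestDocsForCustomer(records, "123"));
```

Because the join is on equality with the maximum, a group in which two documents share the same maximum updated_at returns both of them, in the sketch and in the $lookup pipeline alike.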

Output:

{ "_id" : ObjectId("5db07a5d3cf0c979dd020f85"), "document_identifier" : "def", "updated_at" : 10003, "customer_id" : "123" }
{ "_id" : ObjectId("5db07a5d3cf0c979dd020f87"), "document_identifier" : "abc", "updated_at" : 1005, "customer_id" : "123" }
