How to merge array field in document in Mongo aggregation

Question

I need to run an aggregation over two records that both have an array field with different values. When I aggregate these records, the result should contain a single array with the unique values from both arrays. Here is an example:

First record

{ Host: "abc.com", ArtId: "123", tags: [ "tag1", "tag2" ] }

Second record

{ Host: "abc.com", ArtId: "123", tags: [ "tag2", "tag3" ] }

After aggregating on Host and ArtId I need a result like this:

 { Host: "abc.com", ArtId: "123", count :"2", tags:[ "tag1", "tag2", "tag3" ]}

I tried $addToSet in the $group stage, but it gives me nested arrays like this: tags: [["tag1","tag2"],["tag2","tag3"]]

Could you please help me achieve this in an aggregation?

Recommended answer

TLDR;

Modern releases should use $reduce with $setUnion after the initial $group, as shown:

db.collection.aggregate([
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "count": { "$sum": 1 },
    "tags": { "$addToSet": "$tags" }
  }},
  { "$addFields": {
    "tags": {
      "$reduce": {
        "input": "$tags",
        "initialValue": [],
        "in": { "$setUnion": [ "$$value", "$$this" ] }
      }
    }
  }}
])
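
To make the $reduce / $setUnion step easier to follow, here is a minimal plain-JavaScript sketch of what that expression computes on the sample data. It does not talk to MongoDB; `setUnion` and `groupedDoc` are hypothetical names used only for illustration. Note that MongoDB leaves the element order of a $setUnion result unspecified, whereas this sketch happens to keep insertion order.

```javascript
// Plain-JavaScript illustration of the $reduce + $setUnion step.
// `setUnion` is a hypothetical helper, not a MongoDB shell or driver API.
function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

// Shape of the document after the first $group on the sample records:
// "$addToSet": "$tags" has collected whole arrays, so "tags" is nested.
const groupedDoc = {
  _id: { Host: "abc.com", ArtId: "123" },
  count: 2,
  tags: [["tag1", "tag2"], ["tag2", "tag3"]],
};

// Equivalent of the $reduce expression: start from [] ("initialValue")
// and union each inner array ("$$this") into the accumulator ("$$value").
const mergedTags = groupedDoc.tags.reduce(
  (value, current) => setUnion(value, current),
  []
);

console.log(mergedTags); // [ 'tag1', 'tag2', 'tag3' ]
```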


You were right to reach for the $addToSet operator, but when working with the contents of an array you generally need to process it with $unwind first. This "de-normalizes" the array entries: it essentially makes a "copy" of the parent document for each array entry, with that entry as a single value in the field. Without it, $addToSet collects whole arrays, which is the nested result you are seeing.
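
As a rough illustration of that "de-normalizing" behavior, the sketch below mimics $unwind in plain JavaScript on the two sample records. The `unwind` helper is hypothetical and assumes the field is always an array; the real stage also has rules for missing, null, and non-array values that are not modeled here.

```javascript
// Hypothetical helper that mimics $unwind for a field known to hold an array:
// each array entry yields one copy of the parent document.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((item) => ({ ...doc, [field]: item }))
  );
}

const sampleDocs = [
  { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] },
  { Host: "abc.com", ArtId: "123", tags: ["tag2", "tag3"] },
];

const unwound = unwind(sampleDocs, "tags");

// Two documents with two tags each become four documents,
// each carrying a single tag value instead of an array.
console.log(unwound.length);
console.log(unwound.map((doc) => doc.tags));
```

Because two input documents turn into four rows here, counting after this step would count tag entries rather than original documents.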

Your "count" poses an interesting problem, though. It is easily solved by using a "double unwind" after an initial $group operation:

db.collection.aggregate([
    // Group on the compound key and get the occurrences first
    { "$group": {
        "_id": { "Host": "$Host", "ArtId": "$ArtId" },
        "tcount": { "$sum": 1 },
        "ttags": { "$push": "$tags" }
    }},

    // Unwind twice because "ttags" is now an array of arrays
    { "$unwind": "$ttags" },
    { "$unwind": "$ttags" },

    // Now use $addToSet to get the distinct values.
    // The temporary name "ttags" is kept so the $project below can rename it.
    { "$group": {
        "_id": "$_id",
        "tcount": { "$first": "$tcount" },
        "ttags": { "$addToSet": "$ttags" }
    }},

    // Optionally $project to get the fields out of the _id key
    { "$project": {
        "_id": 0,
        "Host": "$_id.Host",
        "ArtId": "$_id.ArtId",
        "count": "$tcount",
        "tags": "$ttags"
    }}
])

That final $project is also there because I used "temporary" names for each of the fields in the other stages of the aggregation pipeline. This is because there is an optimization in $project that "copies" fields that already exist in the incoming document, in the order they already appeared, "before" any "new" fields are added to the document.

Otherwise the output would look like this:

{  "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }

Here the fields are not in the order you might expect. It is trivial really, but it matters to some people, so it is worth explaining why it happens and how to handle it.

So $unwind does the work of keeping the items separate rather than in arrays, and doing the $group first lets you get the "count" of occurrences of the "grouping" key.

The $first operator used later "keeps" that "count" value, since it was simply "duplicated" for every value present in the "tags" array. It is the same value on every row anyway, so it does not matter which one you take. Just pick one.
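
The stages described above can be traced by hand in plain JavaScript. This is only a sketch of the logic on the sample data, not how MongoDB executes the pipeline; the variable names (`records`, `firstGroup`, `rows`, `finalResult`) are made up for illustration.

```javascript
const records = [
  { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] },
  { Host: "abc.com", ArtId: "123", tags: ["tag2", "tag3"] },
];

// First $group: count documents per (Host, ArtId) and push each tags array.
const firstGroup = new Map();
for (const { Host, ArtId, tags } of records) {
  const key = JSON.stringify([Host, ArtId]);
  if (!firstGroup.has(key)) {
    firstGroup.set(key, { _id: { Host, ArtId }, tcount: 0, ttags: [] });
  }
  const group = firstGroup.get(key);
  group.tcount += 1;
  group.ttags.push(tags);
}

// Two $unwind stages: [[a, b], [b, c]] becomes one row per tag,
// and every row carries the same (duplicated) tcount.
const rows = [...firstGroup.values()].flatMap((group) =>
  group.ttags.flat().map((tag) => ({
    _id: group._id,
    tcount: group.tcount,
    ttags: tag,
  }))
);

// Second $group: keep the first tcount seen, collect distinct tags.
const secondGroup = new Map();
for (const row of rows) {
  const key = JSON.stringify(row._id);
  if (!secondGroup.has(key)) {
    secondGroup.set(key, { _id: row._id, tcount: row.tcount, ttags: new Set() });
  }
  secondGroup.get(key).ttags.add(row.ttags);
}

// $project: lift the key fields out of _id and drop the temporary names.
const finalResult = [...secondGroup.values()].map((group) => ({
  Host: group._id.Host,
  ArtId: group._id.ArtId,
  count: group.tcount,
  tags: [...group.ttags],
}));

console.log(JSON.stringify(finalResult));
```

The count stays at 2 because it is computed before the unwinds multiply the rows.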
