How to merge an array field across documents in a Mongo aggregation


Question

I need to run an aggregation over two records that both have an array field, each with different values. When I aggregate these records, the result should contain a single array holding the unique values from both arrays. Here is an example:

First record

{ Host: "abc.com", ArtId: "123", tags: [ "tag1", "tag2" ] }

Second record

{ Host: "abc.com", ArtId: "123", tags: [ "tag2", "tag3" ] }

After aggregating on Host and ArtId, I need a result like this:

 { Host: "abc.com", ArtId: "123", count :"2", tags:[ "tag1", "tag2", "tag3" ]}

I tried $addToSet in the $group stage, but it gives me nested arrays like this: tags: [["tag1","tag2"],["tag2","tag3"]]

Could you please help me achieve this in an aggregation?

Recommended answer

TLDR;

Modern releases should use $reduce with $setUnion after the initial $group, as shown:

db.collection.aggregate([
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "count": { "$sum": 1 },
    "tags": { "$addToSet": "$tags" }
  }},
  { "$addFields": {
    "tags": {
      "$reduce": {
        "input": "$tags",
        "initialValue": [],
        "in": { "$setUnion": [ "$$value", "$$this" ] }
      }
    }
  }}
])
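
To see what the second stage does to the grouped value, here is a minimal plain-JavaScript sketch of the same fold, run outside MongoDB. The helper name `setUnion` and the sample `grouped` array are assumptions made for illustration, not part of the server API. Note also that MongoDB leaves the element order of a $setUnion result unspecified, whereas this sketch happens to keep insertion order.

```javascript
// Hypothetical stand-in for $setUnion on two arrays: the distinct values of both.
function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

// What "$addToSet": "$tags" leaves behind after $group: an array of arrays.
const grouped = [["tag1", "tag2"], ["tag2", "tag3"]];

// Same shape as the $reduce expression: start from [] and fold each
// inner array ($$this) into the accumulated result ($$value).
const mergedTags = grouped.reduce(
  (value, current) => setUnion(value, current),
  []
);

console.log(mergedTags);
```

The fold never flattens more than one level, which is all that is needed here because $group produces exactly one extra level of nesting.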


You were right to reach for the $addToSet operator, but when working with the contents of an array you generally need to process it with $unwind first. This "de-normalizes" the array entries, essentially making a copy of the parent document for each array entry, with that entry as a single value in the field. That is what you need in order to avoid the nested arrays you are seeing.
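
As a rough illustration of that "copy per entry" behavior, the sketch below emulates a single $unwind on one of the sample records in plain JavaScript. The `unwind` helper is hypothetical and assumes the field is a non-empty array; the real stage has additional rules for missing, null, or empty arrays that are not modeled here.

```javascript
// Hypothetical emulation of { "$unwind": "$tags" } for one document:
// emit one copy of the document per array element.
function unwind(doc, field) {
  return doc[field].map((item) => ({ ...doc, [field]: item }));
}

const record = { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] };
const unwound = unwind(record, "tags");

// Each copy keeps Host and ArtId, and "tags" now holds a single value.
console.log(unwound);
```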

Your "count" poses an interesting problem, though one that is easily solved with a "double unwind" after an initial $group operation:

db.collection.aggregate([
    // Group on the compound key and get the occurrences first
    { "$group": {
        "_id": { "Host": "$Host", "ArtId": "$ArtId" },
        "tcount": { "$sum": 1 },
        "ttags": { "$push": "$tags" }
    }},

    // Unwind twice because "ttags" is now an array of arrays
    { "$unwind": "$ttags" },
    { "$unwind": "$ttags" },

    // Now use $addToSet to get the distinct values        
    { "$group": {
        "_id": "$_id",
        "tcount": { "$first": "$tcount" },
        "ttags": { "$addToSet": "$ttags" }
    }},

    // Optionally $project to get the fields out of the _id key
    { "$project": {
        "_id": 0,
        "Host": "$_id.Host",
        "ArtId": "$_id.ArtId",
        "count": "$tcount",
        "tags": "$ttags"
    }}
])

The final $project is also there because I used "temporary" names (tcount, ttags) for the fields in the earlier stages of the aggregation pipeline. This works around an optimization in $project that copies fields carried over from the previous stage in the order they already appeared, before any "new" fields are added to the document.

Otherwise the output would look like this:

{  "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }

The fields are then not in the order you might expect. It is trivial really, but it matters to some people, so it is worth explaining why it happens and how to handle it.

So $unwind does the work of keeping the items separate rather than in arrays, and doing the $group first lets you get the "count" of occurrences of the grouping key.

The $first operator used later keeps that "count" value, since it was simply duplicated for every value present in the "tags" array. It is the same value in every copy, so it does not matter which one you pick.
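
The whole double-unwind flow can be traced by hand with a small plain-JavaScript sketch. It assumes the two sample records from the question have already been grouped once; the `unwind` helper and the Map-based regrouping are illustrative stand-ins for the pipeline stages, not MongoDB APIs.

```javascript
// State after the first $group: one document per key, holding the count
// and an array of arrays (sample values taken from the question).
const grouped = [
  {
    _id: { Host: "abc.com", ArtId: "123" },
    tcount: 2,
    ttags: [["tag1", "tag2"], ["tag2", "tag3"]],
  },
];

// Hypothetical stand-in for one $unwind stage over a list of documents.
const unwind = (docs, field) =>
  docs.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));

// Two passes: the first yields the inner arrays, the second the single tags.
// Every resulting copy carries the same tcount.
const flat = unwind(unwind(grouped, "ttags"), "ttags");

// Second $group: keep the first tcount seen and collect the distinct tags.
const byKey = new Map();
for (const doc of flat) {
  const key = JSON.stringify(doc._id);
  if (!byKey.has(key)) {
    byKey.set(key, { _id: doc._id, tcount: doc.tcount, tags: new Set() });
  }
  byKey.get(key).tags.add(doc.ttags);
}

// Final $project: lift the key fields out of _id and rename the rest.
const result = [...byKey.values()].map((g) => ({
  Host: g._id.Host,
  ArtId: g._id.ArtId,
  count: g.tcount,
  tags: [...g.tags],
}));

console.log(flat.length, result);
```

Because the count is computed before either unwind, it reflects the number of source documents rather than the number of tag copies.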
