How to merge array field in document in Mongo aggregation
Problem description
I have a requirement where I need to aggregate two records that both have an array field with different values. When I aggregate these records, the result should have one array containing the unique values from both arrays. Here is an example:
First record
{ Host:"abc.com", ArtId:"123", tags:[ "tag1", "tag2" ] }
Second record
{ Host:"abc.com", ArtId:"123", tags:[ "tag2", "tag3" ] }
After aggregating on Host and ArtId, I need a result like this:
{ Host: "abc.com", ArtId: "123", count :"2", tags:[ "tag1", "tag2", "tag3" ]}
I tried $addToSet in the group statement, but it gives me tags like this: [["tag1","tag2"],["tag2","tag3"]]
Could you please help me achieve this in the aggregation?
Recommended answer
TL;DR: Modern releases should use $reduce with $setUnion after the initial $group, as shown:
db.collection.aggregate([
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "count": { "$sum": 1 },
    "tags": { "$addToSet": "$tags" }
  }},
  { "$addFields": {
    "tags": {
      "$reduce": {
        "input": "$tags",
        "initialValue": [],
        "in": { "$setUnion": [ "$$value", "$$this" ] }
      }
    }
  }}
])
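To make the two stages above easier to follow, here is a minimal sketch in plain JavaScript (no database involved) of roughly what they compute for the sample records. The helper names `groupByHostAndArtId` and `mergeTags` are hypothetical and only imitate the stages; the sketch also collects arrays with a plain push, so it does not collapse identical arrays the way $addToSet would.

```javascript
// Sample documents from the question.
const docs = [
  { Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] },
  { Host: "abc.com", ArtId: "123", tags: ["tag2", "tag3"] },
];

// Rough stand-in for the $group stage: one entry per (Host, ArtId),
// a running count, and the collected "tags" arrays (still nested).
function groupByHostAndArtId(input) {
  const groups = new Map();
  for (const doc of input) {
    const key = JSON.stringify([doc.Host, doc.ArtId]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { Host: doc.Host, ArtId: doc.ArtId },
        count: 0,
        tags: [],
      });
    }
    const group = groups.get(key);
    group.count += 1;
    group.tags.push(doc.tags); // whole arrays are collected, hence the nesting
  }
  return [...groups.values()];
}

// Rough stand-in for $reduce with $setUnion: fold the nested arrays into
// one array of distinct values. Sorting is only for predictable output here.
function mergeTags(group) {
  const merged = group.tags.reduce(
    (value, current) => [...new Set([...value, ...current])],
    []
  );
  return { ...group, tags: merged.sort() };
}

const grouped = groupByHostAndArtId(docs);
const result = grouped.map(mergeTags);

console.log(JSON.stringify(grouped[0].tags)); // nested arrays after grouping
console.log(JSON.stringify(result[0]));       // flat, de-duplicated tags
```

The nested value after grouping is the same shape the question reports, and the fold is the step that flattens it.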
You were right in finding the $addToSet operator, but when working with content in an array you generally need to process it with $unwind first. This "de-normalizes" the array entries and essentially makes a "copy" of the parent document for each array entry, with that entry as a singular value in the field. That is what you need in order to avoid the nested-array result you are seeing.
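As an illustration of that "de-normalizing" step, the following plain JavaScript sketch imitates what $unwind does to a single document. The `unwind` helper is hypothetical and assumes the named field is always an array; it is not the MongoDB implementation.

```javascript
// Hypothetical helper imitating $unwind on one field: emit one copy of
// the document per array element, with the field holding that element.
// Assumes doc[field] is always an array.
function unwind(input, field) {
  return input.flatMap((doc) =>
    doc[field].map((item) => ({ ...doc, [field]: item }))
  );
}

const unwound = unwind(
  [{ Host: "abc.com", ArtId: "123", tags: ["tag1", "tag2"] }],
  "tags"
);

// One document per tag, each with "tags" as a single value.
console.log(JSON.stringify(unwound));
```

Each output document keeps the parent fields and carries exactly one tag, which is why a later $addToSet sees individual values rather than whole arrays.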
Your "count" poses an interesting problem, though it is easily solved with a "double unwind" after an initial $group operation:
db.collection.aggregate([
  // Group on the compound key and get the occurrences first
  { "$group": {
    "_id": { "Host": "$Host", "ArtId": "$ArtId" },
    "tcount": { "$sum": 1 },
    "ttags": { "$push": "$tags" }
  }},
  // Unwind twice because "ttags" is now an array of arrays
  { "$unwind": "$ttags" },
  { "$unwind": "$ttags" },
  // Now use $addToSet to get the distinct values,
  // keeping the temporary name so the final $project can rename it
  { "$group": {
    "_id": "$_id",
    "tcount": { "$first": "$tcount" },
    "ttags": { "$addToSet": "$ttags" }
  }},
  // Optionally $project to get the fields out of the _id key
  { "$project": {
    "_id": 0,
    "Host": "$_id.Host",
    "ArtId": "$_id.ArtId",
    "count": "$tcount",
    "tags": "$ttags"
  }}
])
That final $project is also there because I used "temporary" names for each of the fields in the other stages of the aggregation pipeline. This is because there is an optimization in $project that "copies" the fields from an existing stage, in the order they already appeared, "before" any "new" fields are added to the document.
Otherwise the output would look like this:
{ "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }
Here the fields are not in the order you might expect. It is trivial really, but it matters to some people, so it is worth explaining why it happens and how to handle it.
So $unwind does the work of keeping the items separated rather than in arrays, and doing the $group first lets you get the "count" of occurrences of the "grouping" key.
The $first operator used later "keeps" that "count" value, since it was simply "duplicated" for every value present in the "tags" array. It is the same value in every copy anyway, so it does not matter which one you pick.
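The effect of the double unwind on the count can be sketched in plain JavaScript as well. This is only an illustration under the assumption that the first $group has already produced the intermediate document shown; `unwindField` is a hypothetical helper, and the order of values produced by a real $addToSet is not guaranteed.

```javascript
// Assumed output of the first $group stage for the sample records.
const grouped = {
  _id: { Host: "abc.com", ArtId: "123" },
  tcount: 2,
  ttags: [["tag1", "tag2"], ["tag2", "tag3"]],
};

// Hypothetical stand-in for $unwind on a single field.
const unwindField = (docs, field) =>
  docs.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));

// Unwinding twice flattens the array of arrays into one row per tag.
const rows = unwindField(unwindField([grouped], "ttags"), "ttags");

// Every row carries the same tcount, so taking the first one (like $first)
// loses nothing; a Set plays the role of $addToSet.
const regrouped = {
  _id: rows[0]._id,
  tcount: rows[0].tcount,
  ttags: [...new Set(rows.map((row) => row.ttags))],
};

console.log(rows.length, JSON.stringify(regrouped));
```

Because the count was computed before the unwinds, it reflects the number of original records rather than the number of unwound rows.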