MongoDB aggregation framework group + project
Problem description
I have the following issue:
This query returns one result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
    "result" : [
        {
            "_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
            "version" : 1.2000000000000002
        }
    ],
    "ok" : 1
}
This query (I just added a projection so that I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
    "result" : [
        {
            "_id" : ObjectId("5139310a3899d457ee000003")
        },
        {
            "_id" : ObjectId("513931053899d457ee000002")
        },
        {
            "_id" : ObjectId("513930fd3899d457ee000001")
        }
    ],
    "ok" : 1
}
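A likely cause, offered as a sketch rather than a confirmed diagnosis: in the second query, $group and $project sit inside the same stage document. Each pipeline stage is meant to be its own document with a single operator key, and the ObjectId values in the output suggest that only $project was applied, directly to the original documents. The plain JavaScript below runs without a database and only compares the two pipeline shapes; stageOperators is a hypothetical helper, not a MongoDB API.

```javascript
// Pipeline as written in the question: one stage document with two operator keys.
const combined = [
  { $group: { _id: "$id", version: { $max: "$version" } }, $project: { _id: 1 } }
];

// Assumed fix: one operator per stage document, applied in order.
const separated = [
  { $group: { _id: "$id", version: { $max: "$version" } } },
  { $project: { _id: 1 } }
];

// Hypothetical helper: list the operator keys found in each stage.
function stageOperators(pipeline) {
  return pipeline.map((stage) => Object.keys(stage));
}

console.log(stageOperators(combined));  // one stage holding both operators
console.log(stageOperators(separated)); // two stages, one operator each
```

In the shell, the second form would be passed as db.items.aggregate(separated). With the stages separated, $project receives the grouped documents, so _id is the grouped "$id" value rather than an ObjectId.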
Not all accumulators are available in the $project stage. We need to consider what we can do with accumulators in $project and what we can do in $group. Let's take a look at this:
db.companies.aggregate([{
    $match: {
        funding_rounds: {
            $ne: []
        }
    }
}, {
    $unwind: "$funding_rounds"
}, {
    $sort: {
        "funding_rounds.funded_year": 1,
        "funding_rounds.funded_month": 1,
        "funding_rounds.funded_day": 1
    }
}, {
    $group: {
        _id: {
            company: "$name"
        },
        funding: {
            $push: {
                amount: "$funding_rounds.raised_amount",
                year: "$funding_rounds.funded_year"
            }
        }
    }
}]).pretty()
Here we first check that funding_rounds is not empty. The array is then unwound with $unwind, so the $sort stage and the later stages see one document for each element of the funding_rounds array of every company. The first thing we do with those documents is $sort them by:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage, which groups by company name, the array is built using $push. $push goes inside the document given as the value of a field we name in the $group stage. We can push any valid expression. Here we push documents built from raised_amount and funded_year, and each one is appended to the end of the array being accumulated. The $group stage therefore outputs a stream of documents, each with an _id that holds the company name and a funding array.
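To make the accumulation concrete, here is a minimal in-memory sketch of what $group with $push does to the unwound, sorted documents. It is plain JavaScript rather than MongoDB code; the sample data and the groupAndPush helper are made up for illustration.

```javascript
// Made-up sample of documents as they might look after $unwind and $sort.
const unwound = [
  { name: "Acme", funding_rounds: { raised_amount: 100, funded_year: 2005 } },
  { name: "Beta", funding_rounds: { raised_amount: 50, funded_year: 2006 } },
  { name: "Acme", funding_rounds: { raised_amount: 300, funded_year: 2007 } }
];

// Hypothetical in-memory stand-in for the $group stage with $push.
function groupAndPush(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.name)) {
      groups.set(doc.name, { _id: { company: doc.name }, funding: [] });
    }
    // Like $push: append to the end of the array accumulated for this group.
    groups.get(doc.name).funding.push({
      amount: doc.funding_rounds.raised_amount,
      year: doc.funding_rounds.funded_year
    });
  }
  return [...groups.values()];
}

const grouped = groupAndPush(unwound);
console.log(JSON.stringify(grouped, null, 2));
```

The Map here returns groups in first-seen order, which the real $group stage does not promise for its output documents.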
Notice that $push is available in the $group stage but not in the $project stage. This is because $group is designed to take a sequence of documents and accumulate values across that stream.
$project, on the other hand, works with one document at a time. So in a $project stage we can calculate an average over an array inside an individual document. But seeing documents one at a time and, for each of them, pushing a new value onto a group is something that $project is just not designed to do. For that type of operation we want to use $group.
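The per-document nature of $project can be sketched in plain JavaScript as a map over documents: each output is computed from its own input document only. The sample data and the projectAverage helper are assumptions for illustration, not MongoDB code.

```javascript
// Made-up documents that already carry an array, e.g. the output of $group with $push.
const companies = [
  { company: "Acme", funding: [{ amount: 100 }, { amount: 300 }] },
  { company: "Beta", funding: [{ amount: 50 }] }
];

// $project-like step: the result depends only on the single document passed in.
function projectAverage(doc) {
  const total = doc.funding.reduce((sum, round) => sum + round.amount, 0);
  return { company: doc.company, avg_amount: total / doc.funding.length };
}

// One output document per input document; nothing is shared between them.
const projected = companies.map(projectAverage);
console.log(projected);
```

Averaging inside one document fits this model. Building an array from values spread over many documents does not, because no state is carried from one call to the next.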
Let's take a look at another example:
db.companies.aggregate([{
    $match: {
        funding_rounds: {
            $exists: true,
            $ne: []
        }
    }
}, {
    $unwind: "$funding_rounds"
}, {
    $sort: {
        "funding_rounds.funded_year": 1,
        "funding_rounds.funded_month": 1,
        "funding_rounds.funded_day": 1
    }
}, {
    $group: {
        _id: {
            company: "$name"
        },
        first_round: {
            $first: "$funding_rounds"
        },
        last_round: {
            $last: "$funding_rounds"
        },
        num_rounds: {
            $sum: 1
        },
        total_raised: {
            $sum: "$funding_rounds.raised_amount"
        }
    }
}, {
    $project: {
        _id: 0,
        company: "$_id.company",
        first_round: {
            amount: "$first_round.raised_amount",
            article: "$first_round.source_url",
            year: "$first_round.funded_year"
        },
        last_round: {
            amount: "$last_round.raised_amount",
            article: "$last_round.source_url",
            year: "$last_round.funded_year"
        },
        num_rounds: 1,
        total_raised: 1
    }
}, {
    $sort: {
        total_raised: -1
    }
}]).pretty()
In the $group stage we use the $first and $last accumulators. As with $push, we can't use $first and $last in $project stages, because $project is not designed to accumulate values across multiple documents; it reshapes documents one at a time. The total number of rounds is calculated with the $sum operator: the value 1 adds one for every document grouped under a given _id value, so it simply counts them. The $project stage may look complex, but it only makes the output pretty, and it carries num_rounds and total_raised over from the previous stage.
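As a rough illustration of how $first, $last and the two $sum accumulators behave on sorted input, the following plain JavaScript walks the documents once and keeps running values per company. The data and the summarizeRounds helper are hypothetical; this is a sketch of the idea, not how the server implements it.

```javascript
// Made-up documents, already sorted by funding date as the $sort stage would leave them.
const sortedRounds = [
  { name: "Acme", funding_rounds: { raised_amount: 100, funded_year: 2005 } },
  { name: "Acme", funding_rounds: { raised_amount: 300, funded_year: 2007 } },
  { name: "Beta", funding_rounds: { raised_amount: 50, funded_year: 2008 } }
];

// Hypothetical stand-in for $group with $first, $last and $sum.
function summarizeRounds(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const round = doc.funding_rounds;
    let g = groups.get(doc.name);
    if (!g) {
      // $first: the first document seen for the group sets first_round.
      g = { company: doc.name, first_round: round, last_round: round, num_rounds: 0, total_raised: 0 };
      groups.set(doc.name, g);
    }
    g.last_round = round;                  // $last: overwritten by every later document
    g.num_rounds += 1;                     // $sum: 1 counts the documents in the group
    g.total_raised += round.raised_amount; // $sum over the raised amounts
  }
  // Final $sort stage: largest total first.
  return [...groups.values()].sort((a, b) => b.total_raised - a.total_raised);
}

const summary = summarizeRounds(sortedRounds);
console.log(summary);
```

Because $first and $last depend on the order in which documents arrive, the $sort stage before $group is what makes them mean "earliest round" and "latest round" here.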