Mongodb聚合框架|分组多个值? [英] Mongodb Aggregation Framework | Group over multiple values?
问题描述
我想使用mongoDB的Aggregation Framework来运行SQL中的内容:
SELECT SUM(A), B, C from myTable GROUP BY B, C;
文档状态:
您可以从管道中的文档中指定单个字段,先前计算的值或由多个传入字段组成的聚合键.
但是目前尚不清楚什么是由多个传入字段构成的聚合密钥"?
我的数据集有点像这样:
[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"},
{ "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"},
{ "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"},
{ "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"},
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"},
{ "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]
我的查询如下:
resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
timeStamp : 1 ,
label : 1,
responseCode : 1 ,
value : 1,
success : 1
}},
{ $group : {
_id : "$label",
max_timeStamp : { $timeStamp : 1 },
count_responseCode : { $sum : 1 },
avg_value : { $sum : "$value" },
count_success : { $sum : 1 }
}},
{ $group : {
?
}}
);
我的直觉是尝试将结果传递给第二组,我知道您可以这样做,但是它不起作用,因为第一组已经减少了太多数据集,并且丢失了所需的详细程度. /p>
我想做的是使用label
,responseCode
和success
进行分组,并从结果中获取值的总和.它应该看起来像:
label | code | success | sum_of_values | count
sharon | 200 | true | 10 | 1
sharon | 200 | false | 35 | 1
paul | 200 | true | 100 | 2
paul | 404 | true | 15 | 1
paul | 404 | false | 99 | 1
共有五个组:
1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}
2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}
3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}
4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}
5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}
确定,因此解决方案是为_id值指定一个聚合键. 此处记录为:
您可以从管道中的文档中指定单个字段,先前计算的值或由多个传入字段组成的聚合键.
但是它实际上并没有定义聚合键的格式.在此处阅读早期文档,我发现以前的collection.group方法可以包含多个字段,并且在新框架中使用了相同的结构.
因此,要对多个字段进行分组,可以使用_id : { success:'$success', responseCode:'$responseCode', label:'$label'}
如:
resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
timeStamp : 1 ,
label : 1,
responseCode : 1 ,
value : 1,
success : 1
}},
{ $group : {
_id : { success:'$success', responseCode:'$responseCode', label:'$label'},
max_timeStamp : { $timeStamp : 1 },
count_responseCode : { $sum : 1 },
avg_value : { $sum : "$value" },
count_success : { $sum : 1 }
}}
);
I would like to use mongoDB's Aggregation Framework to run what in SQL would look a bit like:
SELECT SUM(A), B, C from myTable GROUP BY B, C;
The docs state:
You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.
But it's unclear what 'an aggregate key made from several incoming fields' actually is?
My dataset is a bit like this:
[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"},
{ "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"},
{ "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"},
{ "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"},
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"},
{ "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]
My query looks like this:
resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
timeStamp : 1 ,
label : 1,
responseCode : 1 ,
value : 1,
success : 1
}},
{ $group : {
_id : "$label",
max_timeStamp : { $timeStamp : 1 },
count_responseCode : { $sum : 1 },
avg_value : { $sum : "$value" },
count_success : { $sum : 1 }
}},
{ $group : {
?
}}
);
My instinct was to try to pipe the results through to a second group, I know you can do this but it won't work because the first group already reduces the dataset too much and the required level of detail is lost.
What I want to do is group using label
, responseCode
and success
and get the sum of value from the result. It should look a bit like:
label | code | success | sum_of_values | count
sharon | 200 | true | 10 | 1
sharon | 200 | false | 35 | 1
paul | 200 | true | 100 | 2
paul | 404 | true | 15 | 1
paul | 404 | false | 99 | 1
Where there are five groups:
1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}
2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}
3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}
4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}
5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}
OK, so the solution is to specify an aggregate key for the _id value. This is documented here as:
You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.
But it doesn't actually define the format for an aggregate key. Reading the earlier documentation here I saw that the previous collection.group method could take multiple fields and that the same structure is used in the new framework.
So, to group over multiple fields you could use _id : { success:'$success', responseCode:'$responseCode', label:'$label'}
As in:
resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
timeStamp : 1 ,
label : 1,
responseCode : 1 ,
value : 1,
success : 1
}},
{ $group : {
_id : { success:'$success', responseCode:'$responseCode', label:'$label'},
max_timeStamp : { $timeStamp : 1 },
count_responseCode : { $sum : 1 },
avg_value : { $sum : "$value" },
count_success : { $sum : 1 }
}}
);
这篇关于Mongodb聚合框架|分组多个值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!