文档中的键不同的多个字段平均聚合 [英] Multiple Fields Where Keys In Document Vary Average Aggregation
问题描述
我得到的数据集如下:
{
"_id" : ObjectId("592d4f43d69b643ac0cb9148"),
"timestamp" : ISODate("2017-03-01T16:58:00.000Z"),
"Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}
/* 5 */
{
"_id" : ObjectId("592d4f43d69b643ac0cb9149"),
"timestamp" : ISODate("2017-03-01T17:09:00.000Z"),
"Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}
/* 6 */
{
"_id" : ObjectId("592d3a6cd69b643ac0cae395"),
"timestamp" : ISODate("2017-01-30T09:31:00.000Z"),
"Technique-Electrique_Prises de Courant_Power1[W]" : 14.0,
"Technique-Electrique_Eclairage_Power2[W]" : 360.0,
"Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
/* 7 */
{
"_id" : ObjectId("592d3a6cd69b643ac0cae396"),
"timestamp" : ISODate("2017-01-30T09:32:00.000Z"),
"Technique-Electrique_Prises de Courant_Power1[W]" : 15.0,
"Technique-Electrique_Eclairage_Power2[W]" : 365.0,
"Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
有一个_id"、一个时间戳"和多个传感器字段.传感器数量不一致.通过一个界面,我选择了许多我想包含在我的查询中的传感器.此选项存储在一个列表中,其中每个项目都是传感器的名称.
There are an "_id", a "timestamp" and multiple sensors fields. The number of sensor is not consistent. With an interface, I chose numerous sensors which I want to include in my query. This choice is stored in a list where each item is the name of the sensor.
例子:
self.chosenSensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]
我想计算每个所选传感器的平均值.我已经这样做了,但我对每个传感器进行了查询.
I would like to calculate the average of values of each chosen sensor. I already did it but I do a query for each sensor.
在下面的示例中,我将向您展示这一点.(不要考虑日期聚合,这是下一步)
In the following example, I show you that. (Don't consider the date aggregation, it is the next step)
page2.currentColl].aggregate([{"$match":{chosenSensor:{"$exists": True}}}, {"$group":{"_id":{"year":{"$year":"$timestamp"}, "month":{"$month":"$timestamp"}}, "average":{"$avg": chosenSensorAverage}}}])
结果(每个平均值都在一个新文档中):
Result (each average is in a new document):
RDC-ChambreEnfants_CO2_GAS_CONCENTRATION[ppm]
{'_id': {'year': 2017, 'month': 4}, 'average': 1475.3685814315352}
{'_id': {'year': 2017, 'month': 3}, 'average': 1374.3771154414906}
RDC-ChambreEnfants_Humidité_HUMIDITY[%]
{'_id': {'year': 2017, 'month': 4}, 'average': 37.55591753379364}
{'_id': {'year': 2017, 'month': 3}, 'average': 37.459350662153724}
我想得到的是以下内容:
What I would like to get is the following :
{
"Avg_Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Avg_Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Avg_Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
"Avg_Technique-Electrique_Prises de Courant_Power1[W]" : 14.5,
"Avg_Technique-Electrique_Eclairage_Power2[W]" : 362.5,
"Avg_Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Avg_Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Avg_Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
我得到了一个提示(尼尔·伦恩):
I've been given a hint which is (By Neil Lunn) :
您可以使语句更长并获得计数"和每个使用 $ifNull 来确定何时增加的总和".那么你将 $divide 之后" $group 管道阶段以获得最终结果平均".
You could possibly make the statement longer and get the "counts" and "sums" for each using $ifNull to determine when to increment. Then you would $divide "after" the $group pipeline stage to get the final "average".
如前所述,键名"对我来说似乎是更大的问题,并且会通过将它们移动到元素内的值"可能会更好地处理一个数组
As noted, the "key names" seem the larger problem to me and would probably be better handled by moving them to "values" within elements of an array
我的第一个问题是我不知道如何在查询中使用我的传感器列表.解决这个问题后,可能还会出现其他问题.
The first problem for me is that I don't know how to use my sensors list in the query. Other problems will probably come after this one is resolved.
推荐答案
概念大纲
我在非常简短的评论中基本上要说的是 而不是 为每个传感器键"名称发出单独的聚合查询,您可以将其放入 ONE,只要您正确计算平均值"即可.
Outline of Concept
What I was basically saying in the very brief comment is that instead for issuing a separate aggregation query for every sensor "key" name, you can put it in ONE, as long as you calculate the "averages" correctly.
当然,您的数据中的问题是并非所有文档中都存在键".因此,要获得正确的平均值",我们不能只使用 $avg
因为它会计算所有"文档,无论密钥是否存在.
Of course the problem in your data is that the "keys" are not present in all documents. So to get the correct "average", we cannot just use $avg
since it would count "ALL" documents, whether the key was present or not.
因此,我们改为分解数学",并进行 $group
用于每个键的 Total Count
和总 Sum
.这使用 $ifNull
来测试对于字段的存在,以及 $cond
替换要返回的值.
So instead we break up the "math", and do a $group
for the Total Count
and total Sum
of each key first. This uses $ifNull
to test for the presence of the field, and also $cond
to alternate values to return.
.aggregate([
{ "$match": {
"$or": [
{ "Technique-Electrique_VMC Aldes_Power4[W]": { "$exists": True } },
{ "Technique-Electrique_VMC Unelvent_Power5[W]": { "$exists": True } }
]
}}
{ "$group":{
"_id":{
"year":{ "$year":"$timestamp" },
"month":{ "$month":"$timestamp" }
},
"Technique-Electrique_VMC Aldes_Power4[W]-Sum": {
"$sum": {
"$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", 0 ]
}
},
"Technique-Electrique_VMC Aldes_Power4[W]-Count": {
"$sum": {
"$cond": [
{ "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", false ] },
1,
0
]
}
},
"Technique-Electrique_VMC Unelvent_Power5[W]-Sum": {
"$sum": {
"$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", 0 ]
}
},
"Technique-Electrique_VMC Unelvent_Power5[W]-Count": {
"$sum": {
"$cond": [
{ "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", false ] },
1,
0
]
}
}
}},
{ "$project": {
"Technique-Electrique_VMC Aldes_Power4[W]-Avg": {
"$divide": [
"$Technique-Electrique_VMC Aldes_Power4[W]-Sum",
"$Technique-Electrique_VMC Aldes_Power4[W]-Count"
]
},
"Technique-Electrique_VMC Unelvent_Power5[W]-Avg": {
"$divide": [
"Technique-Electrique_VMC Unelvent_Power5[W]-Sum",
"Technique-Electrique_VMC Unelvent_Power5[W]-Count"
]
}
}}
])
$cond
运算符是三元"运算符,这意味着第一个if"条件为 true
,then"返回第二个参数,else"返回第三个参数.
The $cond
operator is a "ternary" operator which means where the first "if" condition is true
, "then" the second argument is returned, "else" the third argument is returned.
所以"Count"
中三元的重点是要解决:
So the point of the ternary in the "Count"
is to work out:
- 如果该字段存在则返回 1 表示计数
- 否则不存在时返回0
$group
完成,为了获得 Average
我们使用 $divide
在单独的 $project
阶段.
After the $group
is done, in order to get the Average
we use $divide
on the two numbers produced for each key within a separate $project
stage.
最终结果是您提供的每个键的平均值",这仅考虑了为实际存在该字段的文档添加值和计数.
The end result is the "average" for every key that you supply, and this considered only adding values and counts for documents where the field was actually present.
因此,将所有键放在一个聚合语句中将为您节省大量处理时间和资源.
So putting all the keys in the one aggregation statement will save you a lot of time and resources on processing.
因此,要在 python 中动态"执行此操作,请从列表开始:
So to do this "dynamically" in python, start with the list:
sensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]
match = { '$match': { '$or': map(lambda x: { x: { '$exists': True } },sensors) } }
group = { '$group': {
'_id': {
'year': { '$year': '$timestamp' },
'month': { '$month':'$timestamp' }
}
}}
project = { '$project': { } }
for k in sensors:
group['$group'][k + '-Sum'] = {
'$sum': { '$ifNull': [ '$' + k, 0 ] }
}
group['$group'][k + '-Count'] = {
'$sum': { '$cond': [ { '$ifNull': [ '$' + k, False ] }, 1, 0 ] }
}
project['$project'][k + '-Avg'] = {
'$divide': [ '$' + k + '-Sum', '$' + k + '-Count' ]
}
pipeline = [match,group,project]
对于给定的传感器"列表,这与上面的完整列表相同.
Which generates the same as the full listing above for a given list of "sensors".
这篇关于文档中的键不同的多个字段平均聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!