多个字段,其中文档中的关键字随平均聚合而变化 [英] Multiple Fields Where Keys In Document Vary Average Aggregation
问题描述
我得到的数据集如下:
{
"_id" : ObjectId("592d4f43d69b643ac0cb9148"),
"timestamp" : ISODate("2017-03-01T16:58:00.000Z"),
"Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}
/* 5 */
{
"_id" : ObjectId("592d4f43d69b643ac0cb9149"),
"timestamp" : ISODate("2017-03-01T17:09:00.000Z"),
"Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}
/* 6 */
{
"_id" : ObjectId("592d3a6cd69b643ac0cae395"),
"timestamp" : ISODate("2017-01-30T09:31:00.000Z"),
"Technique-Electrique_Prises de Courant_Power1[W]" : 14.0,
"Technique-Electrique_Eclairage_Power2[W]" : 360.0,
"Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
/* 7 */
{
"_id" : ObjectId("592d3a6cd69b643ac0cae396"),
"timestamp" : ISODate("2017-01-30T09:32:00.000Z"),
"Technique-Electrique_Prises de Courant_Power1[W]" : 15.0,
"Technique-Electrique_Eclairage_Power2[W]" : 365.0,
"Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
有一个"_id",一个时间戳"和多个传感器字段.传感器数量不一致.通过界面,我选择了很多传感器,希望将其包括在查询中.此选择存储在一个列表中,其中每个项目都是传感器的名称.
There are an "_id", a "timestamp" and multiple sensors fields. The number of sensor is not consistent. With an interface, I chose numerous sensors which I want to include in my query. This choice is stored in a list where each item is the name of the sensor.
示例:
self.chosenSensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]
我想计算每个选定传感器的平均值.我已经做到了,但是我要查询每个传感器.
I would like to calculate the average of values of each chosen sensor. I already did it but I do a query for each sensor.
在下面的示例中,我向您展示了这一点. (不考虑日期汇总,这是下一步)
In the following example, I show you that. (Don't consider the date aggregation, it is the next step)
page2.currentColl].aggregate([{"$match":{chosenSensor:{"$exists": True}}}, {"$group":{"_id":{"year":{"$year":"$timestamp"}, "month":{"$month":"$timestamp"}}, "average":{"$avg": chosenSensorAverage}}}])
结果(每个平均值在一个新文档中):
Result (each average is in a new document):
RDC-ChambreEnfants_CO2_GAS_CONCENTRATION[ppm]
{'_id': {'year': 2017, 'month': 4}, 'average': 1475.3685814315352}
{'_id': {'year': 2017, 'month': 3}, 'average': 1374.3771154414906}
RDC-ChambreEnfants_Humidité_HUMIDITY[%]
{'_id': {'year': 2017, 'month': 4}, 'average': 37.55591753379364}
{'_id': {'year': 2017, 'month': 3}, 'average': 37.459350662153724}
我想得到的是以下内容:
What I would like to get is the following :
{
"Avg_Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
"Avg_Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
"Avg_Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
"Avg_Technique-Electrique_Prises de Courant_Power1[W]" : 14.5,
"Avg_Technique-Electrique_Eclairage_Power2[W]" : 362.5,
"Avg_Technique-Electrique_Electroménager_Power3[W]" : 0.0,
"Avg_Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
"Avg_Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}
我得到的提示是(由尼尔·伦恩(Neil Lunn)撰写):
I've been given a hint which is (By Neil Lunn) :
您可以延长语句的长度并获取计数",然后 使用$ ifNull确定每个值的和".然后你 会在$ group管道阶段之后划分"以获得最终 平均".
You could possibly make the statement longer and get the "counts" and "sums" for each using $ifNull to determine when to increment. Then you would $divide "after" the $group pipeline stage to get the final "average".
如前所述,键名"在我看来似乎是一个更大的问题, 通过将它们移动到元素内的值"可能会更好地处理 数组
As noted, the "key names" seem the larger problem to me and would probably be better handled by moving them to "values" within elements of an array
对我来说,第一个问题是我不知道如何在查询中使用传感器列表.解决这个问题后,可能还会出现其他问题.
The first problem for me is that I don't know how to use my sensors list in the query. Other problems will probably come after this one is resolved.
推荐答案
概念概述
我在简短的评论中基本上是说,要为每个传感器键"名称发出单独的聚合查询,请代替,您可以将其置于一个 ,只要您正确计算平均值"即可.
Outline of Concept
What I was basically saying in the very brief comment is that instead for issuing a separate aggregation query for every sensor "key" name, you can put it in ONE, as long as you calculate the "averages" correctly.
当然,您数据中的问题是所有文档中都不存在键".因此,要获得正确的平均值",我们不能仅使用 $avg
,因为它将计算所有"文档,无论该密钥是否存在.
Of course the problem in your data is that the "keys" are not present in all documents. So to get the correct "average", we cannot just use $avg
since it would count "ALL" documents, whether the key was present or not.
因此,我们改为拆分数学",然后执行 $group
表示每个键的总计Count
和总计Sum
.这使用 $ifNull
来测试该字段的存在,以及 $cond
来替换要返回的值.
So instead we break up the "math", and do a $group
for the Total Count
and total Sum
of each key first. This uses $ifNull
to test for the presence of the field, and also $cond
to alternate values to return.
.aggregate([
{ "$match": {
"$or": [
{ "Technique-Electrique_VMC Aldes_Power4[W]": { "$exists": True } },
{ "Technique-Electrique_VMC Unelvent_Power5[W]": { "$exists": True } }
]
}}
{ "$group":{
"_id":{
"year":{ "$year":"$timestamp" },
"month":{ "$month":"$timestamp" }
},
"Technique-Electrique_VMC Aldes_Power4[W]-Sum": {
"$sum": {
"$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", 0 ]
}
},
"Technique-Electrique_VMC Aldes_Power4[W]-Count": {
"$sum": {
"$cond": [
{ "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", false ] },
1,
0
]
}
},
"Technique-Electrique_VMC Unelvent_Power5[W]-Sum": {
"$sum": {
"$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", 0 ]
}
},
"Technique-Electrique_VMC Unelvent_Power5[W]-Count": {
"$sum": {
"$cond": [
{ "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", false ] },
1,
0
]
}
}
}},
{ "$project": {
"Technique-Electrique_VMC Aldes_Power4[W]-Avg": {
"$divide": [
"$Technique-Electrique_VMC Aldes_Power4[W]-Sum",
"$Technique-Electrique_VMC Aldes_Power4[W]-Count"
]
},
"Technique-Electrique_VMC Unelvent_Power5[W]-Avg": {
"$divide": [
"Technique-Electrique_VMC Unelvent_Power5[W]-Sum",
"Technique-Electrique_VMC Unelvent_Power5[W]-Count"
]
}
}}
])
$cond
运算符是三元"运算符,表示第一个"if"条件为true
的位置,然后"返回第二个参数的位置,否则"返回第三个参数的位置.
The $cond
operator is a "ternary" operator which means where the first "if" condition is true
, "then" the second argument is returned, "else" the third argument is returned.
所以"Count"
中的三元点是要计算出来的:
So the point of the ternary in the "Count"
is to work out:
- 如果该字段在那里,则返回1作为计数
- 否则不存在时返回0
按顺序完成 $group
后要获取Average
,我们在页面上使用 $divide
在单独的 $project
阶段为每个键生成两个数字
After the $group
is done, in order to get the Average
we use $divide
on the two numbers produced for each key within a separate $project
stage.
最终结果是您提供的每个键的平均值",并且这仅考虑了实际存在该字段的文档的值和计数.
The end result is the "average" for every key that you supply, and this considered only adding values and counts for documents where the field was actually present.
因此,将所有密钥放在一个聚合语句中将为您节省大量的处理时间和资源.
So putting all the keys in the one aggregation statement will save you a lot of time and resources on processing.
因此,要在python中动态"执行此操作,请从列表开始:
So to do this "dynamically" in python, start with the list:
sensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]
match = { '$match': { '$or': map(lambda x: { x: { '$exists': True } },sensors) } }
group = { '$group': {
'_id': {
'year': { '$year': '$timestamp' },
'month': { '$month':'$timestamp' }
}
}}
project = { '$project': { } }
for k in sensors:
group['$group'][k + '-Sum'] = {
'$sum': { '$ifNull': [ '$' + k, 0 ] }
}
group['$group'][k + '-Count'] = {
'$sum': { '$cond': [ { '$ifNull': [ '$' + k, False ] }, 1, 0 ] }
}
project['$project'][k + '-Avg'] = {
'$divide': [ '$' + k + '-Sum', '$' + k + '-Count' ]
}
pipeline = [match,group,project]
对于给定的传感器"列表,其生成的内容与上面的完整列表相同.
Which generates the same as the full listing above for a given list of "sensors".
这篇关于多个字段,其中文档中的关键字随平均聚合而变化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!