多个字段,其中文档中的关键字随平均聚合而变化 [英] Multiple Fields Where Keys In Document Vary Average Aggregation

查看:68
本文介绍了多个字段,其中文档中的关键字随平均聚合而变化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我得到的数据集如下:

{
    "_id" : ObjectId("592d4f43d69b643ac0cb9148"),
    "timestamp" : ISODate("2017-03-01T16:58:00.000Z"),
    "Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
   "Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
    "Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}

/* 5 */
{
    "_id" : ObjectId("592d4f43d69b643ac0cb9149"),
    "timestamp" : ISODate("2017-03-01T17:09:00.000Z"),
    "Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
    "Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
   "Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}

/* 6 */
{
    "_id" : ObjectId("592d3a6cd69b643ac0cae395"),
    "timestamp" : ISODate("2017-01-30T09:31:00.000Z"),
    "Technique-Electrique_Prises de Courant_Power1[W]" : 14.0,
    "Technique-Electrique_Eclairage_Power2[W]" : 360.0,
    "Technique-Electrique_Electroménager_Power3[W]" : 0.0,
    "Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
    "Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}

/* 7 */
{
    "_id" : ObjectId("592d3a6cd69b643ac0cae396"),
    "timestamp" : ISODate("2017-01-30T09:32:00.000Z"),
    "Technique-Electrique_Prises de Courant_Power1[W]" : 15.0,
    "Technique-Electrique_Eclairage_Power2[W]" : 365.0,
    "Technique-Electrique_Electroménager_Power3[W]" : 0.0,
    "Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
    "Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}

有一个"_id",一个时间戳"和多个传感器字段.传感器数量不一致.通过界面,我选择了很多传感器,希望将其包括在查询中.此选择存储在一个列表中,其中每个项目都是传感器的名称.

There are an "_id", a "timestamp" and multiple sensors fields. The number of sensor is not consistent. With an interface, I chose numerous sensors which I want to include in my query. This choice is stored in a list where each item is the name of the sensor.

示例:

self.chosenSensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]

我想计算每个选定传感器的平均值.我已经做到了,但是我要查询每个传感器.

I would like to calculate the average of values of each chosen sensor. I already did it but I do a query for each sensor.

在下面的示例中,我向您展示了这一点. (不考虑日期汇总,这是下一步)

In the following example, I show you that. (Don't consider the date aggregation, it is the next step)

page2.currentColl].aggregate([{"$match":{chosenSensor:{"$exists": True}}}, {"$group":{"_id":{"year":{"$year":"$timestamp"}, "month":{"$month":"$timestamp"}}, "average":{"$avg": chosenSensorAverage}}}])

结果(每个平均值在一个新文档中):

Result (each average is in a new document):

RDC-ChambreEnfants_CO2_GAS_CONCENTRATION[ppm]
{'_id': {'year': 2017, 'month': 4}, 'average': 1475.3685814315352}
{'_id': {'year': 2017, 'month': 3}, 'average': 1374.3771154414906}
RDC-ChambreEnfants_Humidité_HUMIDITY[%]
{'_id': {'year': 2017, 'month': 4}, 'average': 37.55591753379364}
{'_id': {'year': 2017, 'month': 3}, 'average': 37.459350662153724}

我想得到的是以下内容:

What I would like to get is the following :

{
    "Avg_Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
    "Avg_Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
    "Avg_Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
    "Avg_Technique-Electrique_Prises de Courant_Power1[W]" : 14.5,
    "Avg_Technique-Electrique_Eclairage_Power2[W]" : 362.5,
    "Avg_Technique-Electrique_Electroménager_Power3[W]" : 0.0,
    "Avg_Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
    "Avg_Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}

我得到的提示是(由尼尔·伦恩(Neil Lunn)撰写):

I've been given a hint which is (By Neil Lunn) :

您可以延长语句的长度并获取计数",然后 使用$ ifNull确定每个值的和".然后你 会在$ group管道阶段之后划分"以获得最终 平均".

You could possibly make the statement longer and get the "counts" and "sums" for each using $ifNull to determine when to increment. Then you would $divide "after" the $group pipeline stage to get the final "average".

如前所述,键名"在我看来似乎是一个更大的问题, 通过将它们移动到元素内的值"可能会更好地处理 数组

As noted, the "key names" seem the larger problem to me and would probably be better handled by moving them to "values" within elements of an array

对我来说,第一个问题是我不知道如何在查询中使用传感器列表.解决这个问题后,可能还会出现其他问题.

The first problem for me is that I don't know how to use my sensors list in the query. Other problems will probably come after this one is resolved.

推荐答案

概念概述

我在简短的评论中基本上是说,要为每个传感器键"名称发出单独的聚合查询,请代替,您可以将其置于一个 ,只要您正确计算平均值"即可.

Outline of Concept

What I was basically saying in the very brief comment is that instead for issuing a separate aggregation query for every sensor "key" name, you can put it in ONE, as long as you calculate the "averages" correctly.

当然,您数据中的问题是所有文档中都不存在键".因此,要获得正确的平均值",我们不能仅使用 $avg ,因为它将计算所有"文档,无论该密钥是否存在.

Of course the problem in your data is that the "keys" are not present in all documents. So to get the correct "average", we cannot just use $avg since it would count "ALL" documents, whether the key was present or not.

因此,我们改为拆分数学",然后执行 $group 表示每个键的总计Count和总计Sum.这使用 $ifNull 来测试该字段的存在,以及 $cond 来替换要返回的值.

So instead we break up the "math", and do a $group for the Total Count and total Sum of each key first. This uses $ifNull to test for the presence of the field, and also $cond to alternate values to return.

.aggregate([
  { "$match": {
    "$or": [
      { "Technique-Electrique_VMC Aldes_Power4[W]": { "$exists": True } },
      { "Technique-Electrique_VMC Unelvent_Power5[W]": { "$exists": True } }
    ]
  }}
  { "$group":{
    "_id":{
      "year":{ "$year":"$timestamp" },
      "month":{ "$month":"$timestamp" }
    },
    "Technique-Electrique_VMC Aldes_Power4[W]-Sum": { 
      "$sum": { 
        "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", 0 ]
      }
    },
    "Technique-Electrique_VMC Aldes_Power4[W]-Count": { 
      "$sum": { 
        "$cond": [
          { "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", false ] },
          1,
          0
        ]
      }
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Sum": {
      "$sum": { 
        "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", 0 ]
      }
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Count": {
      "$sum": {
        "$cond": [ 
          { "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", false ] },
          1,
          0
        ]
      }
    }
  }},
  { "$project": {
    "Technique-Electrique_VMC Aldes_Power4[W]-Avg": {
      "$divide": [
        "$Technique-Electrique_VMC Aldes_Power4[W]-Sum",
        "$Technique-Electrique_VMC Aldes_Power4[W]-Count"
      ]
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Avg": {
      "$divide": [
        "Technique-Electrique_VMC Unelvent_Power5[W]-Sum",
        "Technique-Electrique_VMC Unelvent_Power5[W]-Count"
      ]
    }
  }}
])

$cond 运算符是三元"运算符,表示第一个"if"条件为true的位置,然后"返回第二个参数的位置,否则"返回第三个参数的位置.

The $cond operator is a "ternary" operator which means where the first "if" condition is true, "then" the second argument is returned, "else" the third argument is returned.

所以"Count"中的三元点是要计算出来的:

So the point of the ternary in the "Count" is to work out:

  • 如果该字段在那里,则返回1作为计数
  • 否则不存在时返回0

按顺序完成 $group 后要获取Average,我们在页面上使用 $divide 在单独的 $project 阶段为每个键生成两个数字

After the $group is done, in order to get the Average we use $divide on the two numbers produced for each key within a separate $project stage.

最终结果是您提供的每个键的平均值",并且这仅考虑了实际存在该字段的文档的值和计数.

The end result is the "average" for every key that you supply, and this considered only adding values and counts for documents where the field was actually present.

因此,将所有密钥放在一个聚合语句中将为您节省大量的处理时间和资源.

So putting all the keys in the one aggregation statement will save you a lot of time and resources on processing.

因此,要在python中动态"执行此操作,请从列表开始:

So to do this "dynamically" in python, start with the list:

sensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]

match = { '$match': { '$or': map(lambda x: { x: { '$exists': True } },sensors) } }

group = { '$group': { 
  '_id': {
    'year': { '$year': '$timestamp' },
    'month': { '$month':'$timestamp' }
  }
}}

project = { '$project': {  } }

for k in sensors:
  group['$group'][k + '-Sum'] = {
    '$sum': { '$ifNull': [ '$' + k, 0 ] }
  }
  group['$group'][k + '-Count'] = {
    '$sum': { '$cond': [ { '$ifNull': [ '$' + k, False ] }, 1, 0 ]  }
  }
  project['$project'][k + '-Avg'] = {
    '$divide': [ '$' + k + '-Sum', '$' + k + '-Count' ]
  }

pipeline = [match,group,project]

对于给定的传感器"列表,其生成的内容与上面的完整列表相同.

Which generates the same as the full listing above for a given list of "sensors".

这篇关于多个字段,其中文档中的关键字随平均聚合而变化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆