文档中的键不同的多个字段平均聚合 [英] Multiple Fields Where Keys In Document Vary Average Aggregation

查看:33
本文介绍了文档中的键不同的多个字段平均聚合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我得到的数据集如下:

{
    "_id" : ObjectId("592d4f43d69b643ac0cb9148"),
    "timestamp" : ISODate("2017-03-01T16:58:00.000Z"),
    "Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
   "Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
    "Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}

/* 5 */
{
    "_id" : ObjectId("592d4f43d69b643ac0cb9149"),
    "timestamp" : ISODate("2017-03-01T17:09:00.000Z"),
    "Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
    "Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
   "Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
}

/* 6 */
{
    "_id" : ObjectId("592d3a6cd69b643ac0cae395"),
    "timestamp" : ISODate("2017-01-30T09:31:00.000Z"),
    "Technique-Electrique_Prises de Courant_Power1[W]" : 14.0,
    "Technique-Electrique_Eclairage_Power2[W]" : 360.0,
    "Technique-Electrique_Electroménager_Power3[W]" : 0.0,
    "Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
    "Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}

/* 7 */
{
    "_id" : ObjectId("592d3a6cd69b643ac0cae396"),
    "timestamp" : ISODate("2017-01-30T09:32:00.000Z"),
    "Technique-Electrique_Prises de Courant_Power1[W]" : 15.0,
    "Technique-Electrique_Eclairage_Power2[W]" : 365.0,
    "Technique-Electrique_Electroménager_Power3[W]" : 0.0,
    "Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
    "Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}

有一个_id"、一个时间戳"和多个传感器字段.传感器数量不一致.通过一个界面,我选择了许多我想包含在我的查询中的传感器.此选项存储在一个列表中,其中每个项目都是传感器的名称.

There are an "_id", a "timestamp" and multiple sensors fields. The number of sensor is not consistent. With an interface, I chose numerous sensors which I want to include in my query. This choice is stored in a list where each item is the name of the sensor.

例子:

self.chosenSensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]

我想计算每个所选传感器的平均值.我已经这样做了,但我对每个传感器进行了查询.

I would like to calculate the average of values of each chosen sensor. I already did it but I do a query for each sensor.

在下面的示例中,我将向您展示这一点.(不要考虑日期聚合,这是下一步)

In the following example, I show you that. (Don't consider the date aggregation, it is the next step)

page2.currentColl].aggregate([{"$match":{chosenSensor:{"$exists": True}}}, {"$group":{"_id":{"year":{"$year":"$timestamp"}, "month":{"$month":"$timestamp"}}, "average":{"$avg": chosenSensorAverage}}}])

结果(每个平均值都在一个新文档中):

Result (each average is in a new document):

RDC-ChambreEnfants_CO2_GAS_CONCENTRATION[ppm]
{'_id': {'year': 2017, 'month': 4}, 'average': 1475.3685814315352}
{'_id': {'year': 2017, 'month': 3}, 'average': 1374.3771154414906}
RDC-ChambreEnfants_Humidité_HUMIDITY[%]
{'_id': {'year': 2017, 'month': 4}, 'average': 37.55591753379364}
{'_id': {'year': 2017, 'month': 3}, 'average': 37.459350662153724}

我想得到的是以下内容:

What I would like to get is the following :

{
    "Avg_Technique-Meteo_Direction moyenne du vent_Mean value wind direction[]" : 0.0,
    "Avg_Technique-Meteo_Précipitations_Precipitation status[]" : 0.0,
    "Avg_Technique-Meteo_Direction du vent_Wind direction[]" : 0.0
    "Avg_Technique-Electrique_Prises de Courant_Power1[W]" : 14.5,
    "Avg_Technique-Electrique_Eclairage_Power2[W]" : 362.5,
    "Avg_Technique-Electrique_Electroménager_Power3[W]" : 0.0,
    "Avg_Technique-Electrique_VMC Aldes_Power4[W]" : 14.0,
    "Avg_Technique-Electrique_VMC Unelvent_Power5[W]" : 8.0
}

我得到了一个提示(尼尔·伦恩):

I've been given a hint which is (By Neil Lunn) :

您可以使语句更长并获得计数"和每个使用 $ifNull 来确定何时增加的总和".那么你将 $divide 之后" $group 管道阶段以获得最终结果平均".

You could possibly make the statement longer and get the "counts" and "sums" for each using $ifNull to determine when to increment. Then you would $divide "after" the $group pipeline stage to get the final "average".

如前所述,键名"对我来说似乎是更大的问题,并且会通过将它们移动到元素内的值"可能会更好地处理一个数组

As noted, the "key names" seem the larger problem to me and would probably be better handled by moving them to "values" within elements of an array

我的第一个问题是我不知道如何在查询中使用我的传感器列表.解决这个问题后,可能还会出现其他问题.

The first problem for me is that I don't know how to use my sensors list in the query. Other problems will probably come after this one is resolved.

推荐答案

概念大纲

我在非常简短的评论中基本上要说的是 而不是 为每个传感器键"名称发出单独的聚合查询,您可以将其放入 ONE,只要您正确计算平均值"即可.

Outline of Concept

What I was basically saying in the very brief comment is that instead for issuing a separate aggregation query for every sensor "key" name, you can put it in ONE, as long as you calculate the "averages" correctly.

当然,您的数据中的问题是并非所有文档中都存在键".因此,要获得正确的平均值",我们不能只使用 $avg 因为它会计算所有"文档,无论密钥是否存在.

Of course the problem in your data is that the "keys" are not present in all documents. So to get the correct "average", we cannot just use $avg since it would count "ALL" documents, whether the key was present or not.

因此,我们改为分解数学",并进行 $group 用于每个键的 Total Count 和总 Sum.这使用 $ifNull 来测试对于字段的存在,以及 $cond 替换要返回的值.

So instead we break up the "math", and do a $group for the Total Count and total Sum of each key first. This uses $ifNull to test for the presence of the field, and also $cond to alternate values to return.

.aggregate([
  { "$match": {
    "$or": [
      { "Technique-Electrique_VMC Aldes_Power4[W]": { "$exists": True } },
      { "Technique-Electrique_VMC Unelvent_Power5[W]": { "$exists": True } }
    ]
  }}
  { "$group":{
    "_id":{
      "year":{ "$year":"$timestamp" },
      "month":{ "$month":"$timestamp" }
    },
    "Technique-Electrique_VMC Aldes_Power4[W]-Sum": { 
      "$sum": { 
        "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", 0 ]
      }
    },
    "Technique-Electrique_VMC Aldes_Power4[W]-Count": { 
      "$sum": { 
        "$cond": [
          { "$ifNull": [ "$Technique-Electrique_VMC Aldes_Power4[W]", false ] },
          1,
          0
        ]
      }
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Sum": {
      "$sum": { 
        "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", 0 ]
      }
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Count": {
      "$sum": {
        "$cond": [ 
          { "$ifNull": [ "$Technique-Electrique_VMC Unelvent_Power5[W]", false ] },
          1,
          0
        ]
      }
    }
  }},
  { "$project": {
    "Technique-Electrique_VMC Aldes_Power4[W]-Avg": {
      "$divide": [
        "$Technique-Electrique_VMC Aldes_Power4[W]-Sum",
        "$Technique-Electrique_VMC Aldes_Power4[W]-Count"
      ]
    },
    "Technique-Electrique_VMC Unelvent_Power5[W]-Avg": {
      "$divide": [
        "Technique-Electrique_VMC Unelvent_Power5[W]-Sum",
        "Technique-Electrique_VMC Unelvent_Power5[W]-Count"
      ]
    }
  }}
])

$cond运算符是三元"运算符,这意味着第一个if"条件为 true,then"返回第二个参数,else"返回第三个参数.

The $cond operator is a "ternary" operator which means where the first "if" condition is true, "then" the second argument is returned, "else" the third argument is returned.

所以"Count"中三元的重点是要解决:

So the point of the ternary in the "Count" is to work out:

  • 如果该字段存在则返回 1 表示计数
  • 否则不存在时返回0

$group 完成,为了获得 Average 我们使用 $divide 在单独的 $project 阶段.

After the $group is done, in order to get the Average we use $divide on the two numbers produced for each key within a separate $project stage.

最终结果是您提供的每个键的平均值",这仅考虑了为实际存在该字段的文档添加值和计数.

The end result is the "average" for every key that you supply, and this considered only adding values and counts for documents where the field was actually present.

因此,将所有键放在一个聚合语句中将为您节省大量处理时间和资源.

So putting all the keys in the one aggregation statement will save you a lot of time and resources on processing.

因此,要在 python 中动态"执行此操作,请从列表开始:

So to do this "dynamically" in python, start with the list:

sensors = ["Technique-Electrique_VMC Aldes_Power4[W]", "Technique-Electrique_VMC Unelvent_Power5[W]"]

match = { '$match': { '$or': map(lambda x: { x: { '$exists': True } },sensors) } }

group = { '$group': { 
  '_id': {
    'year': { '$year': '$timestamp' },
    'month': { '$month':'$timestamp' }
  }
}}

project = { '$project': {  } }

for k in sensors:
  group['$group'][k + '-Sum'] = {
    '$sum': { '$ifNull': [ '$' + k, 0 ] }
  }
  group['$group'][k + '-Count'] = {
    '$sum': { '$cond': [ { '$ifNull': [ '$' + k, False ] }, 1, 0 ]  }
  }
  project['$project'][k + '-Avg'] = {
    '$divide': [ '$' + k + '-Sum', '$' + k + '-Count' ]
  }

pipeline = [match,group,project]

对于给定的传感器"列表,这与上面的完整列表相同.

Which generates the same as the full listing above for a given list of "sensors".

这篇关于文档中的键不同的多个字段平均聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆