MongoDB: How to query a time-series with incomplete data?

Question

I'm storing time-series data in a MongoDB collection, with one data point every 15 minutes. Sometimes, due to bad conditions, data points get lost. My dataset looks like this:

{"device_id": "ABC","temp": 12,"timestamp": 2020-01-04T17:48:09.000+00:00}
{"device_id": "ABC","temp": 10,"timestamp": 2020-01-04T18:03:09.000+00:00}
{"device_id": "ABC","temp": 14,"timestamp": 2020-01-04T18:18:09.000+00:00}
missing frame
missing frame
{"device_id": "ABC","temp": 13,"timestamp": 2020-01-04T19:03:09.000+00:00}
{"device_id": "ABC","temp": 15,"timestamp": 2020-01-04T19:18:09.000+00:00}
missing frame
{"device_id": "ABC","temp": 10,"timestamp": 2020-01-04T19:48:09.000+00:00}
{"device_id": "ABC","temp": 11,"timestamp": 2020-01-04T20:03:09.000+00:00}
...

I can't figure out how to query this collection so that I get a continuous list of values, one every 15 minutes, which I can plot while highlighting lost messages (by changing the background color of the graph wherever a message is missing). I would like a result aligned on 15-minute boundaries (summing the values between t and t + 15 min), like this:

{"timestamp": 2020-01-04T17:45:00.000+00:00, "temp": 12, missing: false}
{"timestamp": 2020-01-04T18:00:00.000+00:00, "temp": 10, missing: false}
{"timestamp": 2020-01-04T18:15:00.000+00:00, "temp": 14, missing: false}
{"timestamp": 2020-01-04T18:30:00.000+00:00, "temp":  0, missing: true}
{"timestamp": 2020-01-04T18:45:00.000+00:00, "temp":  0, missing: true}
{"timestamp": 2020-01-04T19:00:00.000+00:00, "temp": 13, missing: false}
{"timestamp": 2020-01-04T19:15:00.000+00:00, "temp": 15, missing: false}
{"timestamp": 2020-01-04T19:30:00.000+00:00, "temp":  0, missing: true}
{"timestamp": 2020-01-04T19:45:00.000+00:00, "temp": 10, missing: false}
{"timestamp": 2020-01-04T20:00:00.000+00:00, "temp": 11, missing: false}

Any ideas? Thanks in advance for your help!

Recommended answer

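Here is an aggregation using the approach I mentioned in my first comment. It generates every expected timestamp, 900 seconds apart, between the first and the last reading, and inserts a placeholder document wherever no reading exists: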

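db.collection.aggregate( [
  // Sort by time so that $first and $last below pick the earliest and latest readings.
  {
      $sort: { timestamp: 1 }
  },
  // Collect all readings into one array and keep the first and last timestamps as epoch seconds.
  // Note: _id: null treats the whole collection as a single series; with several devices,
  // a $match on device_id would be needed before this stage.
  {
      $group: {
          _id: null,
          docs: { $push: { timestamp: "$timestamp", device_id: "$device_id", temp: "$temp", missing: false } },
          device_id: { $first: "$device_id" },
          start: { $first: { $toInt: { $divide: [ { "$toLong": "$timestamp" }, 1000 ] } } },
          end: { $last: { $toInt: { $divide: [ { "$toLong": "$timestamp" }, 1000 ] } } }
      }
  },
  // Build one entry per expected slot: every 900 s from start to end (inclusive).
  // ts_exists holds the reading stored at that exact second; it is absent when there is none.
  {
      $addFields: {
          docs: {
              $map: {
                  input: { $range: [ { $toInt: "$start" }, { $add: [ { $toInt: "$end" }, 900 ] }, 900 ] },
                  as: "ts",
                  in: {
                      ts_exists: { $arrayElemAt: [
                          { $filter: {
                              input: "$docs", as: "d",
                              cond: { $eq: [ { $toInt: { $divide: [ { "$toLong": "$$d.timestamp" }, 1000 ] } },
                                             "$$ts"
                                    ] }
                          }},
                          0 ] },
                      ts: "$$ts"
                  }
              }
          }
      }
  },
  // One document per slot.
  {
      $unwind: "$docs"
  },
  // Keep the real reading if there is one; otherwise emit a placeholder flagged as missing.
  {
      $addFields: {
          docs: {
              $ifNull: [ "$docs.ts_exists", { timestamp: { $toDate: { $multiply: [ "$docs.ts", 1000 ] } },
                                              temp: 0, device_id: "$device_id", missing: true
                                            }
                       ]
          }
      }
  },
  // Promote the reading (or placeholder) to the top level.
  {
      $replaceRoot: { newRoot: "$docs" }
  }
] ).pretty()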
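
To make the logic of the pipeline easier to follow, here is a rough plain-JavaScript equivalent that runs on an in-memory array instead of inside MongoDB. It is only a sketch: the names fillGaps and STEP_SECONDS are hypothetical, and, like the pipeline, it assumes a single device and readings that sit exactly on the 900-second grid starting at the first reading.

```javascript
// Hypothetical helper mirroring the aggregation stages; not a MongoDB API.
const STEP_SECONDS = 900; // 15 minutes, the same step the pipeline passes to $range

function toEpochSeconds(date) {
  // Same idea as $toInt($toLong(timestamp) / 1000): whole seconds since the epoch.
  return Math.floor(date.getTime() / 1000);
}

function fillGaps(docs, stepSeconds = STEP_SECONDS) {
  if (docs.length === 0) return [];
  // Like the $sort stage: order readings by time.
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  // Index readings by epoch second, keeping the first one seen for each second.
  const bySecond = new Map();
  for (const d of sorted) {
    const key = toEpochSeconds(d.timestamp);
    if (!bySecond.has(key)) bySecond.set(key, d);
  }
  const start = toEpochSeconds(sorted[0].timestamp);
  const end = toEpochSeconds(sorted[sorted.length - 1].timestamp);
  const out = [];
  // Like $range + $map: visit every expected slot from start to end, inclusive.
  for (let ts = start; ts <= end; ts += stepSeconds) {
    const found = bySecond.get(ts);
    out.push(found
      ? { ...found, missing: false }
      // Like the $ifNull stage: placeholder for a slot without a reading.
      : { timestamp: new Date(ts * 1000), device_id: sorted[0].device_id, temp: 0, missing: true });
  }
  return out;
}

const input = [
  { device_id: "ABC", temp: 12, timestamp: new Date("2020-01-04T17:45:00Z") },
  { device_id: "ABC", temp: 10, timestamp: new Date("2020-01-04T18:00:00Z") },
  { device_id: "ABC", temp: 4,  timestamp: new Date("2020-01-04T18:30:00Z") },
  { device_id: "ABC", temp: 23, timestamp: new Date("2020-01-04T18:45:00Z") }
];
const filled = fillGaps(input);
for (const d of filled) {
  console.log(d.timestamp.toISOString(), d.temp, d.missing);
}
```

Doing the fill on the client like this may be simpler when the result set is small; the aggregation keeps the work on the server.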

Using the following input documents:

{"device_id": "ABC","temp": 12,"timestamp": ISODate("2020-01-04T17:45:00.000+00:00") },
{"device_id": "ABC","temp": 10,"timestamp": ISODate("2020-01-04T18:00:00.000+00:00") },
{"device_id": "ABC","temp": 4,"timestamp": ISODate("2020-01-04T18:30:00.000+00:00") },
{"device_id": "ABC","temp": 23,"timestamp": ISODate("2020-01-04T18:45:00.000+00:00") }

The result:

{
        "timestamp" : ISODate("2020-01-04T17:45:00Z"),
        "device_id" : "ABC",
        "temp" : 12,
        "missing" : false
}
{
        "timestamp" : ISODate("2020-01-04T18:00:00Z"),
        "device_id" : "ABC",
        "temp" : 10,
        "missing" : false
}
{
        "timestamp" : ISODate("2020-01-04T18:15:00Z"),
        "temp" : 0,
        "device_id" : "ABC",
        "missing" : true
}
{
        "timestamp" : ISODate("2020-01-04T18:30:00Z"),
        "device_id" : "ABC",
        "temp" : 4,
        "missing" : false
}
{
        "timestamp" : ISODate("2020-01-04T18:45:00Z"),
        "device_id" : "ABC",
        "temp" : 23,
        "missing" : false
}
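
Note that the sample input here already sits on exact quarter-hour marks, whereas the readings in the question arrive at times such as 17:48:09. The pipeline starts its 900-second grid at the first reading and matches on the exact second, so it would not by itself produce the 17:45:00, 18:00:00 alignment (or the per-interval sum) the question asks for. One possible way to get there is to floor each timestamp to its 15-minute bucket before filling gaps. The sketch below shows that idea in plain JavaScript; floorToBucket and alignAndFill are hypothetical names, and summing temp per bucket is an assumption taken from the question's wording.

```javascript
// Hypothetical helpers; a sketch of bucket alignment, not part of the original answer.
const BUCKET_MS = 15 * 60 * 1000; // 15 minutes in milliseconds

// Round a Date down to the start of its 15-minute bucket (UTC, post-1970 dates).
function floorToBucket(date, bucketMs = BUCKET_MS) {
  const ms = date.getTime();
  return new Date(ms - (ms % bucketMs));
}

// Sum temp per bucket, then emit one row per bucket between the first and last one.
function alignAndFill(docs, bucketMs = BUCKET_MS) {
  const sums = new Map();
  for (const d of docs) {
    const key = floorToBucket(d.timestamp, bucketMs).getTime();
    sums.set(key, (sums.get(key) || 0) + d.temp);
  }
  if (sums.size === 0) return [];
  const keys = [...sums.keys()];
  const start = Math.min(...keys);
  const end = Math.max(...keys);
  const rows = [];
  for (let t = start; t <= end; t += bucketMs) {
    const present = sums.has(t);
    // Buckets without any reading get temp 0 and missing: true.
    rows.push({ timestamp: new Date(t), temp: present ? sums.get(t) : 0, missing: !present });
  }
  return rows;
}

// Readings taken from the question, with their original off-grid timestamps.
const readings = [
  { device_id: "ABC", temp: 12, timestamp: new Date("2020-01-04T17:48:09Z") },
  { device_id: "ABC", temp: 10, timestamp: new Date("2020-01-04T18:03:09Z") },
  { device_id: "ABC", temp: 14, timestamp: new Date("2020-01-04T18:18:09Z") },
  { device_id: "ABC", temp: 13, timestamp: new Date("2020-01-04T19:03:09Z") },
  { device_id: "ABC", temp: 15, timestamp: new Date("2020-01-04T19:18:09Z") },
  { device_id: "ABC", temp: 10, timestamp: new Date("2020-01-04T19:48:09Z") },
  { device_id: "ABC", temp: 11, timestamp: new Date("2020-01-04T20:03:09Z") }
];
const rows = alignAndFill(readings);
for (const r of rows) {
  console.log(r.timestamp.toISOString(), r.temp, r.missing);
}
```

Inside an aggregation, the same flooring could presumably be done on the epoch value with $subtract and $mod before grouping, but that variant is not part of the answer above.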
