检测时间序列中满足特定标准的连续项目 [英] Detect consecutive items meeting particular criteria in a time series

查看:53
本文介绍了检测时间序列中满足特定标准的连续项目的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 MySQL 数据库,每半小时就有超过 92.000 行的天气记录.日 |月 |年 |时间 |温度|...我试图获得(在 PHP 中)峰值温度:显示最大时间量(连续寄存器),温度 =<每月分钟(温度)+3.

I have a MySQL data base with more than 92.000 rows with weather registers every half an hour. Day | Month | Year | Time | Temperature |... I'm trying to obtain (in PHP) Peak temperatures: Show the maximum amount of time(consecutive registers) with Temperature =< min(temperature)+3 in each month.

我将不胜感激!

推荐答案

我的方法是:从观察的时间序列开始,并给每个观察一个序列号.

My approach to this: start with the time-series of observations, and give each one a serial number.

这个序列号在 MySQL 中很头疼,但没关系.给定一个带有 ts 列(一个日期时间项)和一个临时列的表,下面是使用序列号获取它们的查询.

This serial numbering is a pain in the neck in MySQL, but no matter. Given a table with a ts column (a datetime item) and a temp column, here's the query to get them with serial numbers.

SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s 

看看这个 sqlfiddle:http://sqlfiddle.com/#!2/d81e2/5/0

Take a look at this sqlfiddle: http://sqlfiddle.com/#!2/d81e2/5/0

好的,这很简单.现在,假设我们正在寻找温度为 25 度或以上的时间段.为此,我们需要切分时间序列,以便忽略那些观察结果.事情是这样的:

OK, that's pretty trivial. Now, let's say we're looking for periods of time where the temperature is 25 degrees or above. To do this we need to chop up the time series so it omits those observations. That goes like this:

SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s

这是 sqlfiddle:http://sqlfiddle.com/#!2/d81e2/6/0

Here's the sqlfiddle: http://sqlfiddle.com/#!2/d81e2/6/0

现在下一个技巧是找到这个序列中的时间间隔.我们可以使用this SO post中的技术来做到这一点.在时间序列数据中寻找差距的方法在 MySQL 中?

Now the next trick is to find the time gaps in this sequence. We can use the technique from this SO post to do that. Method of finding gaps in time series data in MySQL?

下一步,我们将其连接到自身.

Next step, we join it to itself.

SELECT two.ser, two.ts, two.temp, 
       TIMESTAMPDIFF(MINUTE, two.ts, one.ts) gap
  FROM (
     /* virtual table */
  ) ONE
  JOIN (
     /* same virtual table */
  ) TWO ON (TWO.ser+ 1 = ONE.ser)

此查询获取系列中的每个项目与其后的项目之间的时间间隔.从概念上讲,这是一件简单的事情,但在 SQL 的 MySQL 版本中却很棘手.这是完整的查询.

This query gets the time gap between each item in the series and the one after it. It's a straightforward thing to do conceptually, but tricky in the MySQL version of SQL. Here's the full query.

SELECT two.ser, two.ts, two.temp, 
       TIMESTAMPDIFF(MINUTE, two.ts, one.ts) gap
      FROM (
 SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s
      ) ONE
      JOIN (
SELECT @sample2:=@sample2+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample2:=0) s
      ) TWO ON (TWO.ser+ 1 = ONE.ser)

这是 sqlfiddle:http://sqlfiddle.com/#!2/d81e2/13/0 请注意,某些间隔的持续时间为 30 分钟.这对于连续读数来说是正常的.有些是60分钟.这也很正常,因为我使用的时间序列缺少一些条目.此结果集中的条目显示了间隙之前的时间和温度.

Here's the sqlfiddle: http://sqlfiddle.com/#!2/d81e2/13/0 Notice that some of the gaps are 30 minutes in duration. That's normal for consecutive readings. Some are 60 minutes. That's also normal, because the time series I'm using has some missing entries. The entries in this result set show the times and temperatures immediately before the gaps.

所以,剩下的就是去除垃圾间隔(30 和 60 分钟),然后按降序排列剩余的间隔.

So, all that's left is to get rid of the junk gaps (30 and 60 minutes) and then order the remaining gaps in descending order.

SELECT two.ts, two.temp, 
       TIMESTAMPDIFF(MINUTE, two.ts, one.ts) gap
      FROM (
 SELECT @sample:=@sample+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample:=0) s
      ) ONE
      JOIN (
SELECT @sample2:=@sample2+1 AS ser, ts, temp
  FROM (
     SELECT ts,temp
       FROM t
      WHERE NOT temp >= 25
      ORDER BY ts
    ) C,
  (SELECT @sample2:=0) s
      ) TWO ON (TWO.ser+ 1 = ONE.ser)
 WHERE TIMESTAMPDIFF(MINUTE, two.ts, one.ts)> 60
 ORDER BY TIMESTAMPDIFF(MINUTE, two.ts, one.ts) DESC

这为温度高于 25 度的每个时间序列给出一行;时间最长的一次.结果集中显示的项目是温度上升前最后一次低于 25 度.SQL小提琴.http://sqlfiddle.com/#!2/d81e2/14/0

This gives one row for each sequence of time where the temperature is above 25 degrees; the longest time first. The item shown in the result set is the last time temperature below 25 before it went up. SQL Fiddle. http://sqlfiddle.com/#!2/d81e2/14/0

好玩吧?

这篇关于检测时间序列中满足特定标准的连续项目的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆