How to efficiently determine changes between rows using SQL

Problem description

I have a very large MySQL table containing data read from a number of sensors. Essentially, there's a time stamp and a value column. I'll omit the sensor id, indexes, and other details here:

CREATE TABLE `data` (
  `time` datetime NOT NULL,
  `value` float NOT NULL
)

The value column rarely changes, and I need to find the points in time when those changes occur. Suppose there's a value every minute; the following query returns exactly what I need:

SELECT d.*,
  (SELECT value FROM data WHERE time < d.time ORDER BY time DESC LIMIT 1)
    AS previous_value
FROM data d
HAVING d.value <> previous_value OR previous_value IS NULL;

+---------------------+-------+----------------+
| time                | value | previous_value |
+---------------------+-------+----------------+
| 2011-05-23 16:05:00 |     1 |           NULL |
| 2011-05-23 16:09:00 |     2 |              1 |
| 2011-05-23 16:11:00 |   2.5 |              2 |
+---------------------+-------+----------------+

The only problem is that this is very inefficient, mostly due to the dependent subquery. What would be the best way to optimize this using the tools that MySQL 5.1 has to offer?

One last constraint is that the values are not ordered before they are inserted into the data table, and they might be updated at a later point. This might affect any possible de-normalization strategies.

Recommended answer

You might try this - I'm not going to guarantee that it will perform better, but it's my usual way of correlating a row with a "previous" row:

SELECT
    * -- TODO: list columns
FROM
    data d
       LEFT JOIN
    data d_prev
       ON
           d_prev.time < d.time -- TODO: other key columns?
       LEFT JOIN
    data d_inter
       ON
           d_inter.time < d.time AND
           d_prev.time < d_inter.time -- TODO: other key columns?
WHERE
    d_inter.time IS NULL AND
    (d_prev.value IS NULL OR d_prev.value <> d.value)

(I think this is right - could do with some sample data to validate it).

Basically, the idea is to join the table to itself and, for each row (in d), find candidate rows (in d_prev) for the "previous" row. Then do a further join to try to find a row (in d_inter) that lies between the current row (in d) and the candidate row (in d_prev). If we cannot find such a row (d_inter.time is null), then that candidate was indeed the previous row.
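
Since the answer asks for sample data to validate the query, here is a minimal sketch that runs the same self-join against a small in-memory table. It makes several assumptions that are not in the original: it uses Python's built-in sqlite3 module as a stand-in for MySQL 5.1, the readings in SAMPLE_ROWS are invented to match the question's sample output, and the names find_changes and CHANGES_QUERY are hypothetical. It checks only that the logic is correct and says nothing about performance.

```python
import sqlite3

# Hypothetical one-per-minute readings, invented to match the question's output.
SAMPLE_ROWS = [
    ("2011-05-23 16:05:00", 1.0),
    ("2011-05-23 16:06:00", 1.0),
    ("2011-05-23 16:07:00", 1.0),
    ("2011-05-23 16:08:00", 1.0),
    ("2011-05-23 16:09:00", 2.0),
    ("2011-05-23 16:10:00", 2.0),
    ("2011-05-23 16:11:00", 2.5),
    ("2011-05-23 16:12:00", 2.5),
]

# The answer's self-join: d_prev is a candidate previous row, and d_inter is any
# row strictly between d_prev and d. Keeping only rows where no d_inter exists
# leaves exactly the immediate predecessor (or no predecessor for the first row).
CHANGES_QUERY = """
SELECT d.time, d.value, d_prev.value AS previous_value
FROM data d
LEFT JOIN data d_prev
       ON d_prev.time < d.time
LEFT JOIN data d_inter
       ON d_inter.time < d.time
      AND d_prev.time < d_inter.time
WHERE d_inter.time IS NULL
  AND (d_prev.value IS NULL OR d_prev.value <> d.value)
ORDER BY d.time
"""


def find_changes(rows):
    """Load rows into an in-memory table and return the rows where value changed."""
    conn = sqlite3.connect(":memory:")
    try:
        # ISO-formatted timestamps stored as text compare correctly as strings.
        conn.execute("CREATE TABLE data (time TEXT NOT NULL, value REAL NOT NULL)")
        conn.executemany("INSERT INTO data (time, value) VALUES (?, ?)", rows)
        return conn.execute(CHANGES_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_changes(SAMPLE_ROWS):
        print(row)
```

With this sample data the query returns the same three change points shown in the question (16:05 with no previous value, 16:09, and 16:11), and the result does not depend on the order in which the rows were inserted. On a real table the joins would still need the sensor id in each join condition and a suitable index on time to be usable.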
