如何使用SQL查找缺少的数据行? [英] How to find missing data rows using SQL?

查看:192
本文介绍了如何使用SQL查找缺少的数据行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的问题

我有一个MySQL数据库,按时间顺序存储大量的气象数据10分钟)。不幸的是,有几个停电,因此某些行丢失。我最近设法从气象站获取了某些备份文件,现在我想使用这些来填充缺少的数据。

I got a MySQL database that stores a great amount of meteorological data in chronological order (New data are inserted every 10 min). Unfortunately there have been several blackouts and hence certain rows are missing. I recently managed to obtain certain backup-files from the weather station and now I want to use these to fill in the missing data.

这样的DB ist结构):

The DB ist structures like this (example):

date*            the data    
2/10/2009 10:00  ...
2/10/2009 10:10  ...
( Missing data!)
2/10/2009 10:40  ...
2/10/2009 10:50  ...
2/10/2009 11:00  ...
...

* = datatime - 类型,主键

*=datatime-type, primary key

我的想法

由于备份和数据库位于不同的计算机上,并且流量很慢,我想到创建一个MySQL查询,当运行时,将返回所有缺少的日期在指定的时间范围内。

Since backup and database are located on different computers and traffic is quite slow, I thought of creating a MySQL-query that, when run, will return a list of all missing dates in a specified range of time. I could then extract these dates from the backup and insert them to the database.

问题

如何写这样的查询?我没有创建任何辅助表的权限。是否可以在指定的时间间隔中制定所有所需日期的虚拟表,然后在 JOIN 中使用它?或者是否有完全不同的命题来解决我的问题?

How to write such a query? I don't have the permission to create any auxilary table. Is it possible to formulate a "virtual table" of all required dates in the specified interval and then use it in a JOIN? Or are there entirely different propositions for solving my problem?

编辑
是的,时间戳始终为上面显示的形式(总是10分钟),除了一些只是缺少。

Edit: Yes, the timestamps are consistently in the form shown above (always 10 minutes), except that some are just missing.

好吧,临时表呢?是否有一种优雅的方式自动填充他们的时间范围?如果两个脚本尝试同时运行,这会导致表的问题吗?

Okay, what about the temporary tables? Is there an elegant way of populating them with the time-range automatically? What if two scripts try to run simultaneously, does this cause problems with the table?

推荐答案

select t1.ts as hival, t2.ts as loval
from metdata t1, metdata t2
where t2.ts = (select max(ts) from metdata t3
where t3.ts < t1.ts)
and not timediff(t1.ts, t2.ts) = '00:10:00'

此查询将返回可用于选择缺失数据的对偶。缺少的数据将在查询返回的每个联赛的hival和loval之间有一个时间戳。

This query will return couplets you can use to select the missing data. The missing data will have a timestamp between hival and loval for each couplet returned by the query.

EDIT - thx for check,Craig

EDIT - thx for checking, Craig

获取丢失的时间戳 - 这个SQL有点难以阅读,所以我会打破了一点。首先,我们需要一种以10分钟间隔计算给定低值和高值之间的一系列时间戳值的方法。当你不能创建表时,这样做的一种方法是基于以下sql,它创建一个结果集从0到9的所有数字。

getting the missing timestamps - this SQL gets a bit harder to read, so I'll break it up a bit. First, we need a way to calculate a series of timestamp values between a given low value and a high value in 10 minute intervals. A way of doing this when you can't create tables is based on the following sql, which creates as a resultset all of the digits from 0 to 9.

select d1.* from 
(select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d1

...现在通过将此表与自身的副本组合几次,就意味着我们可以动态生成指定长度的列表



...now by combining this table with a copy of itself a couple of times means we can dynamically generate a list of a specified length

select curdate() + 
INTERVAL  (d1.digit * 100 + d2.digit * 10 + d3.digit) * 10 MINUTE 
as date 
from (select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d1
join
(select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d2
join
(select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d3
where (d1.digit * 100 + d2.digit * 10 + d3.digit) between 1 and 42
order by 1

...现在这个sql正在接近我们所需要的。它有2个输入变量:

... now this piece of sql is getting close to what we need. It has 2 input variables:


  1. 开始时间戳(在示例中我使用
    curdate和

  2. 迭代次数 - 其中
    子句在
    中指定42次迭代示例,最多与3 x个数字表是1000个间隔

...这意味着我们可以使用原始sql驱动上面的示例,为每个hval lowval对生成一系列时间戳。与我抱着,这个sql现在有点长了...

... which means we can use the original sql to drive the example from above to generate a series of timestamps for each hival lowval pair. Bear with me, this sql is a bit long now...

select daterange.loval + INTERVAL  (d1.digit * 100 + d2.digit * 10 + d3.digit) * 10 MINUTE as date 
from 
(select t1.ts as hival, t2.ts as loval
from metdata t1, metdata t2
where t2.ts = (select max(ts) from metdata t3
where t3.ts < t1.ts)
and not timediff(t1.ts, t2.ts) = '00:10:00'
) as daterange
join
(select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d1
join
(select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d2
join
(select 1 as digit
union select 2 
union select 3 
union select 4 
union select 5 
union select 6 
union select 7 
union select 8 
union select 9 
union select 0 
) as d3
where (d1.digit * 100 + d2.digit * 10 + d3.digit) between 1 and
 round((time_to_sec(timediff(hival, loval))-600) /600)
order by 1

... now有一点史诗sql

注意:使用数字表3次给出的最大差距将覆盖有点超过6天

...now there's a bit of epic sql
NOTE : using the digits table 3 times gives a maximum gap it will cover of a bit over 6 days

这篇关于如何使用SQL查找缺少的数据行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆