在查询结果sql中插入多余的行 [英] insert extra rows in query result sql

查看:79
本文介绍了在查询结果sql中插入多余的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个带有不规则时间戳记条目的表,中断"表示没有中断.必须以固定的5分钟间隔插入(关联的数据可以/将为NULL).

Given a table with entries at irregular time stamps, "breaks" must be inserted at regular 5 min intervals ( the data associated can / will be NULL ).

我当时正在考虑获取开始时间,创建一个具有窗口功能并向开始时间增加5分钟间隔的子查询-但我只能考虑使用row_number来增加值.

I was thinking of getting the start time, making a subquery that has a window function and adds 5 min intervals to the start time - but I only could think of using row_number to increment the values.

WITH data as(
select id, data,
cast(date_and_time as double) * 1000 as time_milliseconds
from t1), -- original data

start_times as(
select id, MIN(CAST(date_and_time as double) * 1000) as start_time
from t1
GROUP BY id
), -- first timestamp for each id

boundries as (
SELECT T1.id,(row_number() OVER (PARTITION BY T1.id ORDER BY T1.date_and_time)-1) *300000 + start_times.start_time
as boundry
from T1
INNER JOIN start_times ON start_times.id= T1.id
) -- increment the number of 5 min added on each row and later full join boundries table with original data

但是,这将我限制为原始数据表中ID的行数,并且如果分散了时间戳记,那么行数将无法覆盖需要添加的5分钟间隔.

However this limits me to the number of rows present for an id in the original data table, and if the timestamps are spread out, the number of rows cannot cover the amount of 5 min intervals needed to be added.

样本数据:

initial data:

 |-----------|------------------|------------------|
 |   id      |     value        |    timestamp     |
 |-----------|------------------|------------------|
 |     1     |    3             |    12:00:01.011  | 
 |-----------|------------------|------------------|
 |     1     |    4             |    12:03:30.041  |
 |-----------|------------------|------------------|
 |     1     |    5             |    12:12:20.231  |
 |-----------|------------------|------------------|
 |     1     |    3             |    15:00:00.312  |

data after my query:

 |-----------|------------------|------------------|
 |   id      |     value        | timestamp (UNIX) |
 |-----------|------------------|------------------|
 |     1     |    3             |    12:00:01      | 
 |-----------|------------------|------------------|
 |     1     |    4             |    12:03:30      |
 |-----------|------------------|------------------|
 |     1     |    NULL          |    12:05:01      |  <-- Data from "boundries"
 |-----------|------------------|------------------|
 |     1     |    NULL          |    12:10:01      |  <-- Data from "boundries"
 |-----------|------------------|------------------|
 |     1     |    5             |    12:12:20      |
 |-----------|------------------|------------------|
 |     1     |    NULL          |    12:15:01      |  <-- Data from "boundries"
 |-----------|------------------|------------------|
 |     1     |    NULL          |    12:20:01      |  <-- Data from "boundries"
 |-----------|------------------|------------------|  <-- Jumping directly to 15:00:00 (WRONG! :( need to insert more 5 min breaks here )
 |     1     |    3             |    15:00:00      |  



我当时正在考虑在HIVE中创建一个临时表,并用x行填充该表,该行表示从数据表的开始时间到结束时间的5分钟间隔,但是我找不到任何方法来实现此目的.

I was thinking of creating a temporary table inside HIVE and filling it with x rows representing 5 min intervals from the starttime to the endtime of the data table, but I couldn't find any way of accomplishing that.

使用"for循环"的任何方式?任何建议,将不胜感激.

Any way of using "for loops" ? Any suggestions would be appreciated.

谢谢

推荐答案

您可以尝试计算当前时间戳和下一个时间戳之间的差,除以300得到范围数,生成一串长度为num_ranges的空格,展开为生成行.

You can try calculating the difference between current timestamp and next one, divide 300 to get number of ranges, produce a string of spaces with length = num_ranges, explode to generate rows.

演示:

with your_table as (--initial data example
select stack (3,
1,3 ,'2020-01-01 12:00:01.011', 
1,4 ,'2020-01-01 12:03:30.041',
1,5 ,'2020-01-01 12:20:20.231' 
) as (id ,value ,ts )
)


select id ,value, ts, next_ts,
        diff_sec,num_intervals,
       from_unixtime(unix_timestamp(ts)+h.i*300) new_ts, coalesce(from_unixtime(unix_timestamp(ts)+h.i*300),ts) as calculated_timestamp
from
(
 select id ,value ,ts, next_ts, (unix_timestamp(next_ts)-unix_timestamp(ts))  diff_sec,  
 floor((unix_timestamp(next_ts)-unix_timestamp(ts))/300 --diff in seconds/5 min
                                         ) num_intervals
from
(  
select id ,value ,ts, lead(ts) over(order by ts) next_ts
  from your_table
) s
)s
  lateral view outer posexplode(split(space(cast(s.num_intervals as int)),' ')) h as i,x --this will generate rows

结果:

id  value   ts                      next_ts                 diff_sec    num_intervals   new_ts              calculated_timestamp
1   3       2020-01-01 12:00:01.011 2020-01-01 12:03:30.041 209          0              2020-01-01 12:00:01 2020-01-01 12:00:01
1   4       2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010         3              2020-01-01 12:03:30 2020-01-01 12:03:30
1   4       2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010         3              2020-01-01 12:08:30 2020-01-01 12:08:30
1   4       2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010         3              2020-01-01 12:13:30 2020-01-01 12:13:30
1   4       2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010         3              2020-01-01 12:18:30 2020-01-01 12:18:30
1   5       2020-01-01 12:20:20.231 \N                      \N           \N             \N                  2020-01-01 12:20:20.231

添加了其他行.我将所有中间列留作调试用途.

Additional rows were added. I left all intermediate columns for debugging purposes.

这篇关于在查询结果sql中插入多余的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆