大约在22:00左右的时间间隔分割2行 [英] Split TIME interval around 22:00 time in 2 rows
问题描述
我有这张表
+ ----- + ----- + ------ ------ + ---------- + --- +
|行| id | start_time | end_time | |
+ ----- + ----- + ------------ + ---------- + --- +
| 1 | foo | 18:00:00 | 22:00:00 | |
| 2 | bar | 19:00:00 | 23:00:00 | |
| 3 | baz | 08:00:00 | 11:00:00 | |
| 4 | qux | 05:30:00 | 07:30:00 | |
+ ----- + ----- + ------------ + ---------- + --- +
可以用这个生成
WITH TABLE AS(
SELECTfooAS id,TIME(18,0,0)AS start_time,TIME(22,0,0)AS end_time
UNION ALL
SELECTbar,TIME(19,0,0),TIME(23,0,0)
UNION ALL
SELECTbaz,TIME(08,0,0),TIME(11, 0,0)
UNION ALL
SELECTqux,TIME(05,30,0),TIME(07,30,0)
)
SELECT * FROM TABLE
如果22:00小时在时间间隔内,则该行需要被分割对于给定的输入,输出应该是:
+ ----- + ----- + ------------ + ---------- + --- +
|行| id | start_time | end_time | |
+ ----- + ----- + ------------ + ---------- + --- +
| 1 | foo | 18:00:00 | 22:00:00 | |
| 2 | bar | 19:00:00 | 22:00:00 | |
| 3 | bar | 22:00:00 | 23:00:00 | |
| 4 | baz | 08:00:00 | 11:00:00 | |
| 5 | qux | 05:30:00 | 06:00:00 | |
| 6 | qux | 06:00:00 | 07:30:00 | |
+ ----- + ----- + ------------ + ---------- + --- +
请注意,在分割行的开始和结束处均使用22:00小时。
另外,如果小时为06:00,则需要复制相同的确切行为
这种行为可以直接使用BigQuery吗?下面是针对BigQuery标准SQL的内容,它提供了非常通用的解决方案,您可以根据需要使用任意数量的断点他们在中断
表达式
$ b
#standardSQL
WITH`project.dataset.table` AS(
SELECTfooAS id,TIME(18,0,0)as start_time,TIME(22,0 ,0)AS end_time UNION ALL
SELECTbar,TIME(19,0,0),TIME(23,0,0)UNION ALL
SELECTbaz,TIME(08,0, 0),TIME(11,0,0)UNION ALL
SELECTqux,TIME(05,30,0),TIME(07,30,0)UNION ALL
SELECTxxx, TIME(05,45,0),TIME(23 ,30,0)
),打破AS(
SELECT break FROM UNNEST([TIME(6,0,0),TIME(10,0,0),TIME(22,0,0) ])break
),temp AS(
SELECT id,start_time,end_time,
ARRAY_AGG(break ORDER BY break)break
FROM`project.dataset.table` CROSS JOIN break
WHERE break> start_time和break< end_time
GROUP BY id,start_time,end_time
SELECT ID,点start_time,
IFNULL(next_point,point)end_time $ b $ FROM(
SELECT id ,point,
LEAD(点)OVER(PARTITION BY ID ORDER BY点)next_point
FROM temp,UNNEST(ARRAY_CONCAT([start_time],break,[end_time]))point
WHERE NOT next_point IS NULL
- ORDER BY id,point
<在上面的例子中,我设置了三个断点 - 06:00,10:00和22:00
和初始间隔为
行号start_time end_time
1 bar 19:00:00 23:00:00
2 baz 08: 00:00 11:00:00
3 foo 18:00:00 22:00:00
4 qux 05:30:00 07:30:00
5 xxx 05:45: 00 23:30:00
结果为:
行标识start_time end_time
1 bar 19:00:00 22:00:00
2 bar 22:00:00 23:00:00
3 baz 08:00:00 10:00:00
4 baz 10:00:00 11:00:00
5 qux 05:30:00 06:00:00
6 qux 06:00:00 07:30:00
7 xxx 05:45:00 06:00:00
8 xxx 06:00:00 10:00:00
9 xxx 10:00:00 22:00:00
10 xxx 22:00:00 23:30:00
I have this table
+-----+-----+------------+----------+---+
| Row | id | start_time | end_time | |
+-----+-----+------------+----------+---+
| 1 | foo | 18:00:00 | 22:00:00 | |
| 2 | bar | 19:00:00 | 23:00:00 | |
| 3 | baz | 08:00:00 | 11:00:00 | |
| 4 | qux | 05:30:00 | 07:30:00 | |
+-----+-----+------------+----------+---+
which can be generated with this
WITH TABLE AS (
SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time
UNION ALL
SELECT "bar", TIME(19,0,0), TIME(23,0,0)
UNION ALL
SELECT "baz", TIME(08,0,0), TIME(11,0,0)
UNION ALL
SELECT "qux", TIME(05,30,0), TIME(07,30,0)
)
SELECT * FROM TABLE
If the 22:00 hour is inside the time interval, the row need to be splitted around the hour itself.
For the given input, the output should be:
+-----+-----+------------+----------+---+
| Row | id | start_time | end_time | |
+-----+-----+------------+----------+---+
| 1 | foo | 18:00:00 | 22:00:00 | |
| 2 | bar | 19:00:00 | 22:00:00 | |
| 3 | bar | 22:00:00 | 23:00:00 | |
| 4 | baz | 08:00:00 | 11:00:00 | |
| 5 | qux | 05:30:00 | 06:00:00 | |
| 6 | qux | 06:00:00 | 07:30:00 | |
+-----+-----+------------+----------+---+
Please note that 22:00 hour is used on both start and end of the splitted row.
Also, the same exact behaviour need to be replicated if the hour is 06:00
Can this behaviour be made using directly BigQuery?
Below is for BigQuery Standard SQL and provides quite generic solution where you can have any number of break points as you need by "enlisting" them in breaks
expressions
#standardSQL
WITH `project.dataset.table` AS (
SELECT "foo" AS id, TIME(18,0,0) AS start_time, TIME(22,0,0) AS end_time UNION ALL
SELECT "bar", TIME(19,0,0), TIME(23,0,0) UNION ALL
SELECT "baz", TIME(08,0,0), TIME(11,0,0) UNION ALL
SELECT "qux", TIME(05,30,0), TIME(07,30,0) UNION ALL
SELECT "xxx", TIME(05,45,0), TIME(23,30,0)
), breaks AS (
SELECT break FROM UNNEST([TIME(6,0,0), TIME(10,0,0), TIME(22,0,0)]) break
), temp AS (
SELECT id, start_time, end_time,
ARRAY_AGG(break ORDER BY break) break
FROM `project.dataset.table` CROSS JOIN breaks
WHERE break > start_time AND break < end_time
GROUP BY id, start_time, end_time
)
SELECT id, point start_time,
IFNULL(next_point, point) end_time
FROM (
SELECT id, point,
LEAD(point) OVER(PARTITION BY id ORDER BY point) next_point
FROM temp, UNNEST(ARRAY_CONCAT([start_time], break, [end_time])) point
)
WHERE NOT next_point IS NULL
-- ORDER BY id, point
in above example I set three break points - 06:00, 10:00 and 22:00
and initial intervals as
Row id start_time end_time
1 bar 19:00:00 23:00:00
2 baz 08:00:00 11:00:00
3 foo 18:00:00 22:00:00
4 qux 05:30:00 07:30:00
5 xxx 05:45:00 23:30:00
and result is:
Row id start_time end_time
1 bar 19:00:00 22:00:00
2 bar 22:00:00 23:00:00
3 baz 08:00:00 10:00:00
4 baz 10:00:00 11:00:00
5 qux 05:30:00 06:00:00
6 qux 06:00:00 07:30:00
7 xxx 05:45:00 06:00:00
8 xxx 06:00:00 10:00:00
9 xxx 10:00:00 22:00:00
10 xxx 22:00:00 23:30:00
这篇关于大约在22:00左右的时间间隔分割2行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!