How to calculate the time length of a 0-1 sequence with Hive?
I have data like this:
time(string) id(int)
201801051127 0
201801051130 0
201801051132 0
201801051135 1
201801051141 1
201801051145 0
201801051147 0
The data falls into three consecutive runs (zeros, ones, zeros), and I want to calculate the duration of each run; for example, the first run of zeros lasts 5 minutes. If I simply group by id, the first run of zeros is merged with the third, which is not what I want. How can I calculate the length of each of the three runs with SQL? The MySQL code I tried is as follows:
SET @id_label:=0;
SELECT id_label,id,TIMESTAMPDIFF(MINUTE,MIN(DATE1),MAX(DATE1)) FROM
(SELECT id, DATE1, id_label FROM (
SELECT id, str_to_date ( TIME,'%Y%m%d%H%i' ) DATE1,
@id_label := IF(@id = id, @id_label, @id_label+1) id_label,
@id := id
FROM test.t
ORDER BY str_to_date ( TIME,'%Y%m%d%H%i' )
) a)b
GROUP BY id_label,id;
I don't know how to translate this into Hive.
I would suggest two transformations:
- flag each row that is the first one in its group (1 if it is, null otherwise)
- count the flags up to and including each row to get that row's group number
Then you can just group by that new group number.
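To make the two steps concrete, here is a minimal sketch in plain Python rather than SQL. The function name group_numbers is hypothetical, and it uses 0 plus a running sum where the queries below use null plus a windowed count; the effect is the same.

```python
from itertools import accumulate

def group_numbers(ids):
    """Number each run of consecutive equal values (hypothetical helper)."""
    # Step 1: flag a row with 1 when its id differs from the previous row's id.
    # The first row has no predecessor, so it is always flagged.
    flags = [1 if i == 0 or ids[i] != ids[i - 1] else 0 for i in range(len(ids))]
    # Step 2: a running sum of the flags is the group number of each row.
    return list(accumulate(flags))

ids = [0, 0, 0, 1, 1, 0, 0]  # the id column of the sample data, ordered by time
print(group_numbers(ids))    # [1, 1, 1, 2, 2, 3, 3]
```

The two runs of zeros get different numbers (1 and 3), so grouping by this number keeps them apart.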
Oracle version (original question)
with q1 as (
select to_date(time, 'YYYYMMDDHH24MI') time, id,
case id when lag(id) over(order by time) then null else 1 end first_in_group
from t
), q2 as (
select time, id, count(first_in_group) over (order by time) grp_id
from q1
)
select min(id) id, (max(time) - min(time)) * 24 * 60 minutes
from q2
group by grp_id
order by grp_id
Hive version
Date/time functions differ between database engines, so in Hive use unix_timestamp
and work with the number of seconds it returns (divided by 60 here to get minutes):
with q1 as (
select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
case id when lag(id) over(order by time) then null else 1 end first_in_group
from t
), q2 as (
select time, id, count(first_in_group) over (order by time) grp_id
from q1
)
select min(id) id, max(time) - min(time) minutes
from q2
group by grp_id
order by grp_id
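As a sanity check of what the query should return for the sample data, the same computation can be sketched in plain Python. This is not Hive code: run_lengths is a hypothetical helper, and itertools.groupby (which only merges adjacent equal keys) stands in for the grp_id column.

```python
from datetime import datetime
from itertools import groupby

def run_lengths(rows):
    """Return (id, minutes) for each run of consecutive equal ids.

    rows: iterable of (time_string, id) pairs, time in yyyyMMddHHmm form.
    """
    # Parse the strings and sort by time, mirroring the ORDER BY in the window clauses.
    parsed = sorted((datetime.strptime(t, "%Y%m%d%H%M"), i) for t, i in rows)
    result = []
    # groupby starts a new group whenever the id changes, so each run stays separate.
    for run_id, run in groupby(parsed, key=lambda r: r[1]):
        times = [t for t, _ in run]
        minutes = (max(times) - min(times)).total_seconds() / 60
        result.append((run_id, minutes))
    return result

sample = [
    ("201801051127", 0), ("201801051130", 0), ("201801051132", 0),
    ("201801051135", 1), ("201801051141", 1),
    ("201801051145", 0), ("201801051147", 0),
]
print(run_lengths(sample))  # [(0, 5.0), (1, 6.0), (0, 2.0)]
```

The first run of zeros comes out as 5 minutes, matching the figure given in the question.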