How to calculate the time length of a 0-1 sequence with Hive?


Problem description

I have data like this:

time (string)   id (int)
201801051127    0
201801051130    0
201801051132    0
201801051135    1
201801051141    1
201801051145    0
201801051147    0

It has three different parts (a run of zeros, a run of ones, and another run of zeros), and I want to calculate the time length of each part. For example, the first run of zeros lasts 5 minutes (11:27 to 11:32). If I simply group by id (0 and 1), the first run of zeros is merged with the third, which is not what I want. How can I calculate the length of each of the three parts with SQL? The MySQL code I tried is as follows:

SET @id_label := 0;
SELECT id_label, id, TIMESTAMPDIFF(MINUTE, MIN(DATE1), MAX(DATE1)) FROM
(SELECT id, DATE1, id_label FROM (
SELECT id, str_to_date(TIME, '%Y%m%d%H%i') DATE1,
@id_label := IF(@id = id, @id_label, @id_label + 1) id_label,
@id := id
FROM test.t
ORDER BY str_to_date(TIME, '%Y%m%d%H%i')
) a) b
GROUP BY id_label, id;

I don't know how to convert this into Hive code.

Solution

I would suggest two transformations:

  • add a flag indicating whether a row is the first one in its group, i.e. its run of consecutive equal id values (1 if so, null otherwise)
  • count the flags up to and including each row to get its group number

Then you can just group by that new group number.
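
To make the two steps concrete, here is a minimal Python sketch (not part of the original answer) that traces them on the sample rows from the question. The names `rows`, `label_groups`, and `group_lengths` are hypothetical and chosen only for illustration; the SQL versions that follow do the same work with `lag` and a running `count`.

```python
from datetime import datetime

# Sample rows from the question: (time string, id)
rows = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]


def label_groups(rows):
    """Return (time, id, grp_id) tuples, numbering each run of equal ids."""
    labelled = []
    grp_id = 0
    prev_id = None
    # Fixed-width timestamp strings sort chronologically as plain strings
    for time_str, id_ in sorted(rows):
        # Step 1: flag a row whose id differs from the previous row's id
        first_in_group = prev_id is None or id_ != prev_id
        # Step 2: the running count of flags so far is the group number
        if first_in_group:
            grp_id += 1
        labelled.append((time_str, id_, grp_id))
        prev_id = id_
    return labelled


def group_lengths(rows):
    """Return (id, minutes) for each run, ordered by group number."""
    groups = {}
    for time_str, id_, grp_id in label_groups(rows):
        ts = datetime.strptime(time_str, "%Y%m%d%H%M")
        groups.setdefault(grp_id, (id_, []))[1].append(ts)
    return [
        (id_, (max(times) - min(times)).total_seconds() / 60)
        for _, (id_, times) in sorted(groups.items())
    ]


print(group_lengths(rows))  # [(0, 5.0), (1, 6.0), (0, 2.0)]
```

The group numbers come out as 1, 1, 1, 2, 2, 3, 3, so the first and third runs of zeros stay separate even though they share the same id.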

Oracle version (original question)

with q1 as (
    select to_date(time, 'YYYYMMDDHH24MI') time, id, 
           case id when lag(id) over(order by time) then null else 1 end first_in_group 
    from t
), q2 as (
    select time, id, count(first_in_group) over (order by time) grp_id
    from   q1
)
select   min(id) id, (max(time) - min(time)) * 24 * 60 minutes
from     q2
group by grp_id
order by grp_id

Hive version

Different database engines use different functions to handle date/time values. In Hive, use unix_timestamp, which returns a number of seconds, and divide by 60 to work in minutes:

with q1 as (
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id, 
           case id when lag(id) over(order by time) then null else 1 end first_in_group 
    from t
), q2 as (
    select time, id, count(first_in_group) over (order by time) grp_id
    from   q1
)
select   min(id) id, max(time) - min(time) minutes
from     q2
group by grp_id
order by grp_id
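
The arithmetic behind the Hive version can be checked outside Hive. The sketch below is Python rather than HiveQL, and `to_epoch_minutes` is a hypothetical helper standing in for `unix_timestamp(time, 'yyyyMMddHHmm')/60`. It assumes UTC, whereas Hive applies its own configured time zone, so absolute values may differ while differences within a run should not (barring a daylight-saving change inside the run).

```python
from datetime import datetime, timezone


def to_epoch_minutes(time_str):
    """Parse a 'yyyyMMddHHmm' string as UTC and return minutes since the Unix epoch."""
    parsed = datetime.strptime(time_str, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
    return parsed.timestamp() / 60


# First run of zeros in the sample data: 11:27 to 11:32
first_run = ["201801051127", "201801051130", "201801051132"]
minutes = [to_epoch_minutes(t) for t in first_run]
print(max(minutes) - min(minutes))  # 5.0
```

Because the division by 60 happens before the aggregation, `max(time) - min(time)` is already expressed in minutes, which is why the Hive query needs no further conversion.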
