滞后函数获取最后一个不同的值(redshift) [英] lag function to get the last different value(redshift)

查看:124
本文介绍了滞后函数获取最后一个不同的值(redshift)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下示例数据,想要获得所需的o/p,请帮助我一些想法.

I have a sample data as below and wanting to get a desired o/p, please help me with some idea.

我希望第三,第四行的prev_diff_value的o/p为 2015-01-01 00:00:00 ,而不是 2015-01-02 00:00: 00.

I want the o/p of prev_diff_value of the 3rd,4th row to be 2015-01-01 00:00:00 instead of 2015-01-02 00:00:00.

with dat as (
            select 1 as id,'20150101 02:02:50'::timestamp as dt union all
            select 1,'20150101 03:02:50'::timestamp union all
            select 1,'20150101 04:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:51'::timestamp union all
            select 1,'20150103 02:02:50'::timestamp union all
            select 2,'20150101 02:02:50'::timestamp union all
            select 2,'20150101 03:02:50'::timestamp union all
            select 2,'20150101 04:02:50'::timestamp union all
            select 2,'20150102 02:02:50'::timestamp union all
            select 1,'20150104 02:02:50'::timestamp
            )-- select * from dat
   select id , dt , lag(trunc(dt)) over(partition by id order by dt asc) prev_diff_value
   from dat
  order by id,dt desc
O/P : 
   id   dt                    prev_diff_value
   1    2015-01-04 02:02:50   2015-01-03 00:00:00
   1    2015-01-03 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:51   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-01 00:00:00

推荐答案

据我了解,您希望获取ID分区中每个时间戳的先前不同日期.然后,我将lag应用于iddate的唯一组合,并像这样重新加入原始数据集:

As I understand you want to get the previous different date for each timestamp within id partition. I would then apply lag against the unique combination of id and date and join back to the original dataset like this:

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
,dat_unique_lag as (
    select *, lag(date) over(partition by id order by date asc) prev_diff_value
    from (
        select distinct id,trunc(dt) as date
        from dat
    )
)
select *
from dat
join dat_unique_lag
using (id)
where trunc(dat.dt)=dat_unique_lag.date
order by id,dt desc;

但是,这不是超级性能.如果数据的性质是同一天的时间戳数量有限,则可以使用如下条件语句来延长滞后时间:

However, this is not super performant. If nature of your data is that you have a limited number of timestamps for the same day you might just extend your lag with a conditional statement like this:

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select id, dt,
case 
    when lag(trunc(dt)) over(partition by id order by dt asc)=trunc(dt)
    then case 
        when lag(trunc(dt),2) over(partition by id order by dt asc)=trunc(dt)
        then case
            when lag(trunc(dt),3) over(partition by id order by dt asc)=trunc(dt)
            then lag(trunc(dt),4) over(partition by id order by dt asc)
            else lag(trunc(dt),3) over(partition by id order by dt asc)
            end
        else lag(trunc(dt),2) over(partition by id order by dt asc)
        end
    else lag(trunc(dt)) over(partition by id order by dt asc)
end as prev_diff_value
from dat
order by id,dt desc;

基本上,您查看上一个记录,如果它不适合您,那么您将返回到该记录之前的记录,依此类推,直到找到正确的记录或用尽语句深度为止.在这里看起来一直到第4条记录为止.

Basically, you look at the previous record and if it doesn't suit you then you look back to the record before that one and so on until you find the correct record or run out of your statement depth. Here it looks until the 4th record back.

这篇关于滞后函数获取最后一个不同的值(redshift)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆