Teradata 分区查询...动态跟踪行 [英] Teradata partitioned query ... following rows dynamically

查看:49
本文介绍了Teradata 分区查询...动态跟踪行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含以下列和数据的表格.数据描述了特定的客户活动时段

I have a table with the following columns and data. Data describes certain customer activity periods

cust_id    s_date       e_date
11111    01.03.2014   31.03.2014
11111    10.04.2014   30.04.2014
11111    01.05.2014   10.05.2014
11111    15.06.2014   31.07.2014
22222    01.04.2014   31.05.2014
22222    01.06.2014   30.06.2014
22222    01.07.2014   15.07.2014

我想写一个查询来给出这个结果:

And I want to write a query which gives this result:

cust_id    s_date       e_date
11111    01.03.2014   10.05.2014
11111    15.06.2014   31.07.2014
22222    01.04.2014   15.07.2014

查询结果的目的是在客户活动期小于 15 天时将行合并"为一行.我可以处理前 1 行",但如果需要合并 3 行或更多行,则它不起作用.我想不出如何编写此查询.

The query result purpose is to "merge" rows into one row when customer IN-activity period is less than 15 days. I can handle with "1 row preceding" but if needed to merge 3 or more rows then it does not work. I run out of ideas how to write this query.

我在查询前的半"1 行:

My "half" 1 row preceding query:

SELECT cust_id
     , start_date     as current_period_start_date
     , end_date       as current_period_end_date
     , end_date+15    as current_period_expired_date
     , coalesce(
            min(current_period_expire_date)
           over(partition by cust_id
                    order by start_date
                     rows between 1 preceding and 1 preceding)
               , cast('1900-01-01' as date)) as previous_period_expire_date
     , case 
         when current_period_start_date <= previous_period_expire_date
         then min(current_period_start_date)
             over(partition by cust_id
                      order by start_date
                       rows between 1 preceding and current row)
         else current_period_start_date
       end as new_current_period_start_date

  FROM MY_DB.my_table
     . . .

另外,有没有可能像这样把前面改成动态的方式?

Also, is it possible to change preceding into dynamical way like this?

... over(partition by ... order by ... rows between X preceding and current row)

推荐答案

Gordon 的答案可以修改,因为基本的 LAG 语法很容易重写:

Gordon's answer can be modified as the basic LAG syntax is easily rewritten:

LAG(col, n) OVER (ORDER BY c) 

MIN(col) OVER (ORDER BY c ROWS BETWEEN n PRECEDING AND n PRECEDING)

可以使用 COALESCE(LAG...., default value) 将可能的默认值作为第三个参数来完成,只有 IGNORE NULLS 选项是一个非常困难的选项.

The possible default value as a 3rd parameter can be done using a COALESCE(LAG...., default value), only the IGNORE NULLS option is a really tough one.

结果:

SELECT cust_id, MIN(s_date) AS s_date, MAX(e_date) AS e_date
FROM (SELECT t.*, SUM(GroupStartFlag) OVER (PARTITION BY cust_id ORDER BY s_date ROWS UNBOUNDED PRECEDING) AS grpid
      FROM (SELECT cust_id, s_date, e_date,
                   (CASE WHEN s_date <= MIN(e_date) 
                                        OVER (PARTITION BY cust_id 
                                              ORDER BY s_date
                                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 15
                         THEN 0
                         ELSE 1
                    END) AS GroupStartFlag
            FROM  vt
           ) t
     ) t
GROUP BY cust_id, grpid;

如果您不需要任何其他列(仅 cust_id 和日期),您还可以使用特定的 TD 13.10 表函数来规范化期间.要包括 15 天的差异,您可以简单地减去/增加 15 天:

If you don't need any additional columns (just the cust_id and the dates) you can also utilize a specific TD 13.10 table function for normalizing periods. To include the 15 day difference you might simply subtract/add 15 days:

WITH cte (cust_id, pd)
AS 
 ( SELECT cust_id, PERIOD(s_date-15, e_date) AS pd
   FROM vt
 )
SELECT cust_id,
   BEGIN(pd)+15,
   END(pd),
   cnt
FROM TABLE (TD_NORMALIZE_OVERLAP_MEET
            (NEW VARIANT_TYPE(cte.cust_id)
                ,cte.pd)
        RETURNS (cust_id INTEGER
                ,pd PERIOD(DATE)
                ,cnt INTEGER) --optional: number of rows normalized in one result row
        HASH BY cust_id
        LOCAL ORDER BY cust_id, pd
        ) AS t;

在 TD 14.10 中还有一个非常好的语法用于规范化周期:

In TD 14.10 there's also a real nice syntax for normalizing periods:

SELECT cust_id, BEGIN (pd)+15, END(pd) 
FROM
 (
   SELECT NORMALIZE
      cust_id, PERIOD(s_date-15, e_date) AS pd
   FROM vt
 ) AS dt

顺便说一句,周期定义为包含开始但不包含结束(即对于前一个周期的无间隙周期结束和下一个周期的开始具有相同的值),因此您可能需要将 15 更改为 16 以获得所需的结果.

Btw, periods are defined with an inclusive start but exclusive end (i.e. for a gapless period end of the previous period and start of the next got the same value), so you might have to change 15 to 16 to get the desired result.

这篇关于Teradata 分区查询...动态跟踪行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆