Teradata 分区查询...动态跟踪行 [英] Teradata partitioned query ... following rows dynamically
问题描述
我有一个包含以下列和数据的表格.数据描述了特定的客户活动时段
I have a table with the following columns and data. Data describes certain customer activity periods
cust_id s_date e_date
11111 01.03.2014 31.03.2014
11111 10.04.2014 30.04.2014
11111 01.05.2014 10.05.2014
11111 15.06.2014 31.07.2014
22222 01.04.2014 31.05.2014
22222 01.06.2014 30.06.2014
22222 01.07.2014 15.07.2014
我想写一个查询来给出这个结果:
And I want to write a query which gives this result:
cust_id s_date e_date
11111 01.03.2014 10.05.2014
11111 15.06.2014 31.07.2014
22222 01.04.2014 15.07.2014
查询结果的目的是在客户活动期小于 15 天时将行合并"为一行.我可以处理前 1 行",但如果需要合并 3 行或更多行,则它不起作用.我想不出如何编写此查询.
The query result purpose is to "merge" rows into one row when customer IN-activity period is less than 15 days. I can handle with "1 row preceding" but if needed to merge 3 or more rows then it does not work. I run out of ideas how to write this query.
我在查询前的半"1 行:
My "half" 1 row preceding query:
SELECT cust_id
, start_date as current_period_start_date
, end_date as current_period_end_date
, end_date+15 as current_period_expired_date
, coalesce(
min(current_period_expire_date)
over(partition by cust_id
order by start_date
rows between 1 preceding and 1 preceding)
, cast('1900-01-01' as date)) as previous_period_expire_date
, case
when current_period_start_date <= previous_period_expire_date
then min(current_period_start_date)
over(partition by cust_id
order by start_date
rows between 1 preceding and current row)
else current_period_start_date
end as new_current_period_start_date
FROM MY_DB.my_table
. . .
另外,有没有可能像这样把前面改成动态的方式?
Also, is it possible to change preceding into dynamical way like this?
... over(partition by ... order by ... rows between X preceding and current row)
推荐答案
Gordon 的答案可以修改,因为基本的 LAG 语法很容易重写:
Gordon's answer can be modified as the basic LAG syntax is easily rewritten:
LAG(col, n) OVER (ORDER BY c)
与
MIN(col) OVER (ORDER BY c ROWS BETWEEN n PRECEDING AND n PRECEDING)
可以使用 COALESCE(LAG...., default value) 将可能的默认值作为第三个参数来完成,只有 IGNORE NULLS 选项是一个非常困难的选项.
The possible default value as a 3rd parameter can be done using a COALESCE(LAG...., default value), only the IGNORE NULLS option is a really tough one.
结果:
SELECT cust_id, MIN(s_date) AS s_date, MAX(e_date) AS e_date
FROM (SELECT t.*, SUM(GroupStartFlag) OVER (PARTITION BY cust_id ORDER BY s_date ROWS UNBOUNDED PRECEDING) AS grpid
FROM (SELECT cust_id, s_date, e_date,
(CASE WHEN s_date <= MIN(e_date)
OVER (PARTITION BY cust_id
ORDER BY s_date
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 15
THEN 0
ELSE 1
END) AS GroupStartFlag
FROM vt
) t
) t
GROUP BY cust_id, grpid;
如果您不需要任何其他列(仅 cust_id 和日期),您还可以使用特定的 TD 13.10 表函数来规范化期间.要包括 15 天的差异,您可以简单地减去/增加 15 天:
If you don't need any additional columns (just the cust_id and the dates) you can also utilize a specific TD 13.10 table function for normalizing periods. To include the 15 day difference you might simply subtract/add 15 days:
WITH cte (cust_id, pd)
AS
( SELECT cust_id, PERIOD(s_date-15, e_date) AS pd
FROM vt
)
SELECT cust_id,
BEGIN(pd)+15,
END(pd),
cnt
FROM TABLE (TD_NORMALIZE_OVERLAP_MEET
(NEW VARIANT_TYPE(cte.cust_id)
,cte.pd)
RETURNS (cust_id INTEGER
,pd PERIOD(DATE)
,cnt INTEGER) --optional: number of rows normalized in one result row
HASH BY cust_id
LOCAL ORDER BY cust_id, pd
) AS t;
在 TD 14.10 中还有一个非常好的语法用于规范化周期:
In TD 14.10 there's also a real nice syntax for normalizing periods:
SELECT cust_id, BEGIN (pd)+15, END(pd)
FROM
(
SELECT NORMALIZE
cust_id, PERIOD(s_date-15, e_date) AS pd
FROM vt
) AS dt
顺便说一句,周期定义为包含开始但不包含结束(即对于前一个周期的无间隙周期结束和下一个周期的开始具有相同的值),因此您可能需要将 15 更改为 16 以获得所需的结果.
Btw, periods are defined with an inclusive start but exclusive end (i.e. for a gapless period end of the previous period and start of the next got the same value), so you might have to change 15 to 16 to get the desired result.
这篇关于Teradata 分区查询...动态跟踪行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!