Sessionize a column of numbers into groups of 30 once a threshold is met in Teradata
Problem description
Consider a column that represents "time between events":
(5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2)
I would like to group these into buckets of 30, but with buckets that reset. Desired outcome:
(0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
This is because once we have accumulated to 30, we "reset" and begin counting again. So, since 5 + 40 > 30, we drop back to zero and begin cumulatively adding until we reach 30 again (3 + 6 + 0 ...), which happens when we reach the 10th element == 18.
This can be implemented via a Reduce function (see this answer), but I cannot figure out how to implement it in Teradata. It's like I need to be able to refer to the same OVER(PARTITION BY ... within the same call.
Breaking down the logic, here's an example in Excel:
Where B2 has the formula =IF(B1<30, B1+A2, A2) and is dragged down accordingly. Column C then checks whether column B is >= 30, and column D is a simple cumsum of column C.
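The spreadsheet logic can be sketched in a few lines of Python to check it against the sample data. This is only an illustration, not Teradata code: the function name sessionize and its threshold parameter are made up for this sketch, and column D is taken as the running total of column C, since that is what reproduces the desired outcome.

```python
from itertools import accumulate


def sessionize(gaps, threshold=30):
    """Return a bucket id per element, mirroring spreadsheet columns B, C and D."""
    # Column B: running sum that restarts once the previous value reached the threshold.
    running = list(
        accumulate(gaps, lambda prev, x: prev + x if prev < threshold else x)
    )
    # Column C: 1 where the running sum has reached the threshold, else 0.
    hit = [1 if r >= threshold else 0 for r in running]
    # Column D: running total of column C, which is the bucket id.
    return list(accumulate(hit))


gaps = [5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2]
buckets = sessionize(gaps)
print(buckets)  # [0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

The running sum depends on its own previous value, which is why a plain windowed SUM cannot express it and some form of fold or recursion is needed.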
Recommended answer
The only way I know of to do this in Teradata is to use a recursive CTE. Because I'm lazy, let's simplify this so that the running sum resets once it reaches 5 instead of 30. Creating and populating a really simple volatile table for this:
CREATE VOLATILE TABLE vt1
(
foo VARCHAR(10)
, counter INTEGER
, bar INTEGER
)
ON COMMIT PRESERVE ROWS;
INSERT INTO vt1 VALUES ('a', 1, 1);
INSERT INTO vt1 VALUES ('a', 2, 2);
INSERT INTO vt1 VALUES ('a', 3, 2);
INSERT INTO vt1 VALUES ('a', 4, 4);
INSERT INTO vt1 VALUES ('a', 5, 1);
INSERT INTO vt1 VALUES ('b', 1, 3);
INSERT INTO vt1 VALUES ('b', 2, 1);
INSERT INTO vt1 VALUES ('b', 3, 1);
INSERT INTO vt1 VALUES ('b', 4, 2);
Here's the actual select:
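WITH RECURSIVE cte (foo, counter, bar, rsum) AS
(
    -- Seed: the first row of each foo partition starts the running sum.
    SELECT
        foo
      , counter
      , bar
      , bar AS rsum
    FROM
        vt1
    QUALIFY ROW_NUMBER() OVER (PARTITION BY foo ORDER BY counter) = 1
    UNION ALL
    -- Recursive step: join each row to its predecessor (assumes counter has no gaps).
    SELECT
        t.foo
      , t.counter
      , t.bar
        -- Keep accumulating while the previous sum is below 5; otherwise restart from this row.
      , CASE WHEN cte.rsum < 5 THEN t.bar + cte.rsum ELSE t.bar END
    FROM
        vt1 t JOIN cte ON t.foo = cte.foo AND t.counter = cte.counter + 1
)
SELECT
    cte.*
    -- Flag the rows where the running sum has reached the threshold.
  , CASE WHEN rsum < 5 THEN 0 ELSE 1 END AS tester
FROM
    cte
ORDER BY
    foo
  , counter
;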
Which ultimately gives us:
╔═════╦═════════╦═════╦══════╦════════╗
║ foo ║ counter ║ bar ║ rsum ║ tester ║
╠═════╬═════════╬═════╬══════╬════════╣
║ a ║ 1 ║ 1 ║ 1 ║ 0 ║
║ a ║ 2 ║ 2 ║ 3 ║ 0 ║
║ a ║ 3 ║ 2 ║ 5 ║ 1 ║
║ a ║ 4 ║ 4 ║ 4 ║ 0 ║
║ a ║ 5 ║ 1 ║ 5 ║ 1 ║
║ b ║ 1 ║ 3 ║ 3 ║ 0 ║
║ b ║ 2 ║ 1 ║ 4 ║ 0 ║
║ b ║ 3 ║ 1 ║ 5 ║ 1 ║
║ b ║ 4 ║ 2 ║ 2 ║ 0 ║
╚═════╩═════════╩═════╩══════╩════════╝
The CASE expressions handle the reset for us.
It's kind of ugly, but I've never been able to get this to work any other way.
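As a cross-check, the rsum and tester columns of the result table can be reproduced outside the database with a small Python sketch. It assumes a reset threshold of 5, which is the value the table and the tester expression reflect. The extra grp column is not part of the answer: it is an assumption about the last step the question asks for, a per-partition running total of tester (in SQL, presumably a windowed running SUM of tester partitioned by foo and ordered by counter).

```python
from itertools import accumulate, groupby

# Rows of vt1 as (foo, counter, bar), already ordered by foo, counter.
rows = [
    ("a", 1, 1), ("a", 2, 2), ("a", 3, 2), ("a", 4, 4), ("a", 5, 1),
    ("b", 1, 3), ("b", 2, 1), ("b", 3, 1), ("b", 4, 2),
]
THRESHOLD = 5  # assumption: read off the result table

result = []
for foo, part in groupby(rows, key=lambda r: r[0]):
    part = list(part)
    bars = [bar for _, _, bar in part]
    # Same rule as the recursive CASE: add while the previous sum is below the threshold.
    rsums = list(
        accumulate(bars, lambda prev, bar: prev + bar if prev < THRESHOLD else bar)
    )
    testers = [0 if r < THRESHOLD else 1 for r in rsums]
    # Hypothetical extra column: running total of tester within the partition.
    grps = list(accumulate(testers))
    for (f, counter, bar), rsum, tester, grp in zip(part, rsums, testers, grps):
        result.append((f, counter, bar, rsum, tester, grp))

for row in result:
    print(row)
```

For partition a this yields grp values 0, 0, 1, 1, 2, so every row at or after a threshold hit lands in the next bucket, matching the numbering in the question's desired outcome.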