Sessionize a column of numbers into groups of 30 once a threshold is met in Teradata

Question

Consider a column that represents "time between events":

(5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2)

I would like to group these into buckets of 30, but buckets that reset. Desired outcome:

(0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

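This is because once the running total reaches 30 we "reset" and start counting again. Since 5 + 40 > 30, the total drops back to zero and accumulation restarts (3 + 6 + 0 + ...) until it reaches 30 again, which happens at the 10th element (18).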

This can be implemented via a Reduce function (see this answer), but I cannot figure out how to implement it in Teradata. It's as if I need to be able to refer to the same OVER(PARTITION BY ... within the same call.

Breaking down the logic, here's an example in Excel, with the values in column A:

B2 has the formula =IF(B1<30, B1+A2, A2), dragged down accordingly. Column C then checks whether column B is >= 30, and column D is a simple cumulative sum of column C.
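The spreadsheet logic can be checked outside Teradata. Below is a minimal Python sketch of columns B, C and D applied to the sample values; the function name `bucket_ids` and its `threshold` parameter are made up for illustration, and it assumes the first running total simply equals the first value.

```python
from itertools import accumulate


def bucket_ids(gaps, threshold=30):
    """Return a bucket number per element, mirroring Excel columns B, C and D."""
    # Column B: running total that restarts once the previous total reached the threshold.
    running = list(
        accumulate(gaps, lambda prev, gap: prev + gap if prev < threshold else gap)
    )
    # Column C: 1 where the running total has reached the threshold, else 0.
    hit = [1 if total >= threshold else 0 for total in running]
    # Column D: cumulative sum of column C, which is the bucket number.
    return list(accumulate(hit))


gaps = [5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2]
print(bucket_ids(gaps))
```

For the sample column this reproduces the desired outcome listed above, which also shows that the bucket number is the running count of rows where the threshold was hit.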

Recommended answer

The only way I know of to do this in Teradata is to use a recursive CTE. Because I'm lazy, let's simplify this down and say you want to reset once your running sum reaches 5 (the threshold reflected in the output further down). Creating and populating a really simple volatile table for this:

CREATE VOLATILE TABLE vt1
(
    foo VARCHAR(10)
    , counter INTEGER
    , bar INTEGER
)
ON COMMIT PRESERVE ROWS;

INSERT INTO vt1 VALUES ('a', 1, 1);
INSERT INTO vt1 VALUES ('a', 2, 2);
INSERT INTO vt1 VALUES ('a', 3, 2);
INSERT INTO vt1 VALUES ('a', 4, 4);
INSERT INTO vt1 VALUES ('a', 5, 1);
INSERT INTO vt1 VALUES ('b', 1, 3);
INSERT INTO vt1 VALUES ('b', 2, 1);
INSERT INTO vt1 VALUES ('b', 3, 1);
INSERT INTO vt1 VALUES ('b', 4, 2);

Here's the actual select:

WITH RECURSIVE cte (foo, counter, bar, rsum) AS
(
SELECT
    foo
  , counter
  , bar
  , bar AS rsum
FROM 
    vt1
QUALIFY ROW_NUMBER() OVER (PARTITION BY foo ORDER BY counter) = 1

UNION ALL

SELECT
    t.foo
  , t.counter
  , t.bar
    -- Keep accumulating while the previous total is below 5; otherwise restart from this row.
  , CASE WHEN cte.rsum < 5 THEN t.bar + cte.rsum ELSE t.bar END
FROM
    vt1 t JOIN cte ON t.foo = cte.foo AND t.counter = cte.counter + 1
)

SELECT 
    cte.*
  , CASE WHEN rsum < 5 THEN 0 ELSE 1 END AS tester
FROM 
    cte
ORDER BY 
    foo
    , counter
;

Which ends up giving us:

╔═════╦═════════╦═════╦══════╦════════╗
║ foo ║ counter ║ bar ║ rsum ║ tester ║
╠═════╬═════════╬═════╬══════╬════════╣
║ a   ║       1 ║   1 ║    1 ║      0 ║
║ a   ║       2 ║   2 ║    3 ║      0 ║
║ a   ║       3 ║   2 ║    5 ║      1 ║
║ a   ║       4 ║   4 ║    4 ║      0 ║
║ a   ║       5 ║   1 ║    5 ║      1 ║
║ b   ║       1 ║   3 ║    3 ║      0 ║
║ b   ║       2 ║   1 ║    4 ║      0 ║
║ b   ║       3 ║   1 ║    5 ║      1 ║
║ b   ║       4 ║   2 ║    2 ║      0 ║
╚═════╩═════════╩═════╩══════╩════════╝

The CASE expression handles the reset for us.

It's kind of ugly, but I've never been able to get this to work any other way.
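As a sanity check on the recursion, here is a small Python sketch that walks the same rows in (foo, counter) order and applies the same carry-or-reset rule. It assumes a reset threshold of 5, which is what the rsum and tester columns in the output table reflect; the function name `running_sum_with_reset` is hypothetical and not part of the answer.

```python
from itertools import groupby

# Same rows as the volatile table vt1: (foo, counter, bar).
rows = [
    ("a", 1, 1), ("a", 2, 2), ("a", 3, 2), ("a", 4, 4), ("a", 5, 1),
    ("b", 1, 3), ("b", 2, 1), ("b", 3, 1), ("b", 4, 2),
]


def running_sum_with_reset(rows, threshold=5):
    """Return (foo, counter, bar, rsum, tester) tuples, one per input row."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for foo, group in groupby(ordered, key=lambda r: r[0]):
        rsum = None  # no previous row yet, like the seed SELECT of the CTE
        for _, counter, bar in group:
            if rsum is None or rsum >= threshold:
                rsum = bar          # first row, or previous total hit the threshold: restart
            else:
                rsum = rsum + bar   # otherwise keep accumulating
            tester = 0 if rsum < threshold else 1
            out.append((foo, counter, bar, rsum, tester))
    return out


for row in running_sum_with_reset(rows):
    print(row)
```

Each step depends on the previous row's computed total, which is why a plain windowed SUM cannot express it and a recursive (row-by-row) formulation is used instead.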
