Grouping based on sequence of rows

Question

I have a table of orders with a column denoting whether each order is a buy or a sell, with the rows typically ordered by timestamp. What I'd like to do is operate on groups of consecutive buys plus their sell, e.g. B B S B S B B S -> (B B S) (B S) (B B S).

Example:

order_action |      timestamp      
-------------+---------------------
buy          | 2013-10-03 13:03:02
buy          | 2013-10-08 13:03:02
sell         | 2013-10-10 15:58:02
buy          | 2013-11-01 09:30:02
buy          | 2013-11-01 14:03:02
sell         | 2013-11-07 10:34:02
buy          | 2013-12-03 15:46:02
sell         | 2013-12-09 16:00:03
buy          | 2013-12-11 13:02:02
sell         | 2013-12-18 15:59:03

I'll be running an aggregate function in the end (the groups are there so that I can exclude an entire group based on its sell order), so GROUP BY or partitioned windows seemed like the right way to go, but I can't figure out how to get this specific grouping.

Recommended answer

This can be surprisingly simple with count() as a window aggregate function:

SELECT *
      ,count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM   orders;

Using ts instead of timestamp as the column name. Avoid reserved words as identifiers.
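To try the query above on the sample data from the question, here is a minimal sketch. It assumes SQLite 3.25 or later (for window functions) driven through Python's standard sqlite3 module, which is not necessarily the database the answer was written for. The outer ORDER BY is an addition for stable output only.

```python
import sqlite3

# Sample rows from the question, with the timestamp column named ts.
ROWS = [
    ("buy",  "2013-10-03 13:03:02"),
    ("buy",  "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy",  "2013-11-01 09:30:02"),
    ("buy",  "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy",  "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy",  "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

# Same window expression as in the answer. The outer ORDER BY is added
# here only so that the rows print in chronological order.
QUERY = """
SELECT order_action, ts,
       count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM   orders
ORDER  BY ts
"""

desc_groups = conn.execute(QUERY).fetchall()
for order_action, ts, grp in desc_groups:
    print(f"{order_action:<4} | {ts} | {grp}")
# grp column, top to bottom: 4 4 4 3 3 3 2 2 1 1
```

Each run of buys shares a group number with the sell that closes it, and the numbers decrease over time because the window is ordered descending.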

count() only counts non-null values. The expression order_action = 'sell' OR NULL evaluates to TRUE for 'sell' and NULL otherwise. With the default frame definition, count() returns a running count from the start of the partition (the whole table in this case, since there is no PARTITION BY) up to the last peer of the current row. This running count of sells groups your rows as requested.

I am ordering descending in the OVER clause so that each group ends at a trailing "sell" instead of starting with a leading "sell". This results in descending group numbers, but that should not matter, since you just need group numbers.

Duplicate timestamps would be a problem (in any case!).
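The duplicate-timestamp caveat can be illustrated with a small sketch, again assuming SQLite 3.25+ via Python's sqlite3 module. The id column and the tiebreaker in the second query are hypothetical additions for illustration; they are not part of the original table or answer. With the default frame, rows that share a ts are peers and receive the same count, so a buy with the same timestamp as a sell cannot be placed on one side of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical insertion-order column used only as a tiebreaker.
conn.execute("CREATE TABLE orders (id INTEGER, order_action TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "buy",  "2013-10-03 13:03:02"),
        (2, "sell", "2013-10-10 15:58:02"),
        (3, "buy",  "2013-10-10 15:58:02"),  # same ts as the sell above
        (4, "sell", "2013-10-18 15:59:03"),
    ],
)

# Ordering by ts alone: rows 2 and 3 are peers and get the same count.
without_tiebreak = [
    grp for (grp,) in conn.execute(
        """
        SELECT count(order_action = 'sell' OR NULL)
               OVER (ORDER BY ts DESC) AS grp
        FROM   orders
        ORDER  BY id
        """
    )
]

# Adding a unique tiebreaker makes every row its own peer group.
with_tiebreak = [
    grp for (grp,) in conn.execute(
        """
        SELECT count(order_action = 'sell' OR NULL)
               OVER (ORDER BY ts DESC, id DESC) AS grp
        FROM   orders
        ORDER  BY id
        """
    )
]

print(without_tiebreak)  # [2, 2, 2, 1]
print(with_tiebreak)     # [2, 2, 1, 1]
```

Without the tiebreaker, the second buy is lumped in with the earlier sell. Whether a unique second sort key exists depends on the actual table.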

One way to get ascending group numbers: use a custom frame definition for the window function:

SELECT *
      ,count(order_action = 'sell' OR NULL)
       OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders;
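The ascending variant can be checked against the question's sample data in the same way. This is a sketch assuming SQLite 3.25+ via Python's sqlite3 module; the outer ORDER BY is added only for stable output. Because the frame stops one row before the current row, each sell is counted only from the following row onward, which makes it the last member of its group.

```python
import sqlite3

ROWS = [
    ("buy",  "2013-10-03 13:03:02"),
    ("buy",  "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy",  "2013-11-01 09:30:02"),
    ("buy",  "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy",  "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy",  "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

# Count the sells strictly before the current row (frame ends at 1 PRECEDING).
QUERY = """
SELECT order_action, ts,
       count(order_action = 'sell' OR NULL)
       OVER (ORDER BY ts
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders
ORDER  BY ts
"""

asc_groups = conn.execute(QUERY).fetchall()
for order_action, ts, grp in asc_groups:
    print(f"{order_action:<4} | {ts} | {grp}")
# grp column, top to bottom: 0 0 0 1 1 1 2 2 3 3
```

The first group is numbered 0 because the frame is empty for the first row and no sell precedes the first three rows.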

SQL Fiddle demonstrating both.
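The question's end goal, aggregating per group and excluding whole groups based on their sell order, could then be sketched as follows. This assumes SQLite 3.25+ via Python's sqlite3 module, and the exclusion rule (keep only groups sold on or after 2013-12-01) is a made-up example, not something from the original question.

```python
import sqlite3

ROWS = [
    ("buy",  "2013-10-03 13:03:02"),
    ("buy",  "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy",  "2013-11-01 09:30:02"),
    ("buy",  "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy",  "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy",  "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

# Step 1 (CTE): assign ascending group numbers as in the answer.
# Step 2: aggregate per group; HAVING drops entire groups based on the
# timestamp of their sell row (hypothetical cutoff date).
QUERY = """
WITH grouped AS (
    SELECT order_action, ts,
           count(order_action = 'sell' OR NULL)
           OVER (ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
    FROM   orders
)
SELECT grp,
       count(order_action = 'buy' OR NULL)                AS buys,
       max(CASE WHEN order_action = 'sell' THEN ts END)   AS sold_at
FROM   grouped
GROUP  BY grp
HAVING max(CASE WHEN order_action = 'sell' THEN ts END) >= '2013-12-01'
ORDER  BY grp
"""

kept = conn.execute(QUERY).fetchall()
for row in kept:
    print(row)
# (2, 1, '2013-12-09 16:00:03')
# (3, 1, '2013-12-18 15:59:03')
```

The comparison on sold_at works as plain text here because the timestamps are stored in ISO format; a real timestamp column would be compared natively.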
