How to add a running count to rows in a 'streak' of consecutive days
Question
Thanks to Mike for the suggestion to add the create/insert statements.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');
I want to add a new column that is 'days in current streak' so the result would look like:
pid | date       | in_streak
----|------------|----------
1   | 2014-10-1  | 1
1   | 2014-10-2  | 2
1   | 2014-10-3  | 3
1   | 2014-10-5  | 1
1   | 2014-10-7  | 1
2   | 2014-10-1  | 1
2   | 2014-10-2  | 2
2   | 2014-10-3  | 3
2   | 2014-10-5  | 1
2   | 2014-10-7  | 1
I've been trying to use the approaches from:
- PostgreSQL: find number of consecutive days up until now
- Return rows of the latest 'streak' of data
but I can't work out how to use the dense_rank() trick with other window functions to get the right result.
Recommended answer
Building on this table (not using the SQL keyword "date" as a column name):
CREATE TABLE tbl(
pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);
Query:
SELECT pid, the_date
, row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
SELECT *
, the_date - '2000-01-01'::date
- row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM tbl
) sub
ORDER BY pid, the_date;
Subtracting a date from another date yields an integer (the number of days between them). Since you are looking for consecutive days, each next row in a streak is greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to number the rows within each group.
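To make the two steps concrete, here is a minimal pure-Python sketch of the same logic. It is only an illustration (PostgreSQL does all of this inside the query), it assumes the sample rows from the question's INSERT, and the helper name `add_streak_counts` is hypothetical. It reuses the arbitrary reference date 2000-01-01 from the query; any fixed date would do, because it shifts every grp value by the same amount.

```python
from datetime import date
from itertools import groupby

# Sample rows mirroring the question's INSERT statement: (pid, the_date).
rows = [(pid, date(2014, 10, day)) for pid in (1, 2) for day in (1, 2, 3, 5, 7)]

EPOCH = date(2000, 1, 1)  # same arbitrary reference date as in the query

def add_streak_counts(rows):
    """Hypothetical helper: return (pid, the_date, in_streak), like the SQL query."""
    result = []
    # Inner query: row_number() OVER (PARTITION BY pid ORDER BY the_date)
    for pid, pid_rows in groupby(sorted(rows), key=lambda r: r[0]):
        with_grp = []
        for rn, (_, d) in enumerate(pid_rows, start=1):
            # the_date - '2000-01-01'::date - row_number()
            with_grp.append(((d - EPOCH).days - rn, d))
        # Outer query: row_number() OVER (PARTITION BY pid, grp ORDER BY the_date).
        # grp never decreases within a pid, so equal grp values are adjacent
        # and itertools.groupby (which only groups adjacent keys) is enough.
        for _, streak in groupby(with_grp, key=lambda x: x[0]):
            for in_streak, (_, d) in enumerate(streak, start=1):
                result.append((pid, d, in_streak))
    return result

for row in add_streak_counts(rows):
    print(row)
```

For the sample data, each pid gets in_streak values 1, 2, 3, 1, 1.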
grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:
the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp
One multiplication, one subtraction. String concatenation and casting are more expensive. Test with EXPLAIN ANALYZE.
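A quick pure-Python check (an illustration only; the helper names are hypothetical) that the interval-based alternative defines the same groups. Subtracting rn days from the date lands on the reference date shifted by the integer grp, so both keys change at exactly the same rows. Python's date arithmetic stands in for PostgreSQL's here, and this says nothing about relative speed; that is what EXPLAIN ANALYZE is for.

```python
from datetime import date, timedelta

EPOCH = date(2000, 1, 1)  # arbitrary reference date, as in the main query

def grp_integer(d, rn):
    # Mirrors: the_date - '2000-01-01'::date - row_number()  (an integer)
    return (d - EPOCH).days - rn

def grp_interval(d, rn):
    # Mirrors: the_date - row_number() * interval '1d'
    # (a timestamp in PostgreSQL; a plain date in Python)
    return d - timedelta(days=rn)

def change_points(keys):
    """Indexes where the group key differs from the previous row's key."""
    return [i for i in range(1, len(keys)) if keys[i] != keys[i - 1]]

# One pid's dates from the question, already sorted; rn plays row_number().
dates = [date(2014, 10, d) for d in (1, 2, 3, 5, 7)]
int_keys = [grp_integer(d, rn) for rn, d in enumerate(dates, start=1)]
date_keys = [grp_interval(d, rn) for rn, d in enumerate(dates, start=1)]

# Both keys start a new streak at the same rows, so they define the same groups.
print(change_points(int_keys), change_points(date_keys))
```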
Don't forget to additionally partition by pid in both steps, or you'll inadvertently mix groups that should be separated.
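To see what mixing groups looks like, this pure-Python sketch (hypothetical helpers, sample data from the question) computes grp per pid and then runs the outer numbering step once with and once without pid in the partition. Both pids have the same dates and therefore identical grp values, so dropping pid merges their streaks.

```python
from collections import defaultdict
from datetime import date

EPOCH = date(2000, 1, 1)  # arbitrary reference date, as in the main query

rows = [(pid, date(2014, 10, day)) for pid in (1, 2) for day in (1, 2, 3, 5, 7)]

def inner_grp(rows):
    """Inner step, partitioned by pid: returns (pid, the_date, grp)."""
    out = []
    rn_per_pid = defaultdict(int)  # row_number() restarts for each pid
    for pid, d in sorted(rows):
        rn_per_pid[pid] += 1
        out.append((pid, d, (d - EPOCH).days - rn_per_pid[pid]))
    return out

def outer_count(rows_with_grp, partition_by_pid):
    """Outer step: number rows per group, optionally ignoring pid."""
    counters = defaultdict(int)
    result = []
    # ORDER BY the_date within each partition; ties are broken by pid here.
    for pid, d, grp in sorted(rows_with_grp, key=lambda r: (r[1], r[0])):
        key = (pid, grp) if partition_by_pid else grp
        counters[key] += 1
        result.append((pid, d, counters[key]))
    return sorted(result)

with_grp = inner_grp(rows)
correct = outer_count(with_grp, partition_by_pid=True)
mixed = outer_count(with_grp, partition_by_pid=False)
print([r[2] for r in correct])
print([r[2] for r in mixed])
```

With pid in the partition, each pid gets 1, 2, 3, 1, 1. Without it, pid 1 gets 1, 3, 5, 1, 1 and pid 2 gets 2, 4, 6, 2, 2. In SQL the order among rows that tie on the_date is not guaranteed, so the wrong numbers could even vary between runs.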
I'm using a subquery, since that is typically faster than a CTE (PostgreSQL 12 and later can inline simple CTEs, which narrows the gap). There is nothing here that a plain subquery couldn't do.
And since you mentioned it: dense_rank() is not necessary here. The primary key on (pid, the_date) rules out duplicate dates per pid, so basic row_number() does the job.
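The distinction only matters if a pid could have the same date twice. The following pure-Python sketch uses hypothetical helpers `row_number` and `dense_rank` that mimic the window functions on pre-sorted input. It shows that the two agree on unique dates, and that with a duplicate date only dense_rank() would keep a streak in one group.

```python
from datetime import date

EPOCH = date(2000, 1, 1)  # arbitrary reference date, as in the main query

def row_number(sorted_values):
    """Like row_number(): 1, 2, 3, ... even when values tie."""
    return list(range(1, len(sorted_values) + 1))

def dense_rank(sorted_values):
    """Like dense_rank(): tied values share a rank, with no gaps."""
    ranks, rank, prev = [], 0, None
    for v in sorted_values:
        if rank == 0 or v != prev:
            rank += 1
            prev = v
        ranks.append(rank)
    return ranks

def grp_keys(sorted_dates, numbering):
    # the_date - '2000-01-01'::date - <numbering>, as in the answer's inner query
    return [(d - EPOCH).days - n for d, n in zip(sorted_dates, numbering(sorted_dates))]

unique_dates = [date(2014, 10, d) for d in (1, 2, 3, 5, 7)]  # what the PK guarantees
dup_dates = [date(2014, 10, d) for d in (1, 2, 2, 3)]        # hypothetical: no PK

print(row_number(unique_dates), dense_rank(unique_dates))
print(len(set(grp_keys(dup_dates, row_number))), len(set(grp_keys(dup_dates, dense_rank))))
```

With the duplicate, row_number() splits the single streak from 2014-10-1 to 2014-10-3 into two groups, while dense_rank() keeps it together. Under the question's primary key that case cannot arise.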