Count rows in partition with Order By
I was trying to understand PARTITION BY in postgres by writing a few sample queries. I have a test table on which I run my query.
id integer | num integer
___________|_____________
1 | 4
2 | 4
3 | 5
4 | 6
When I run the following query, I get the output I expected:
SELECT id, COUNT(id) OVER(PARTITION BY num) from test;
id | count
___________|_____________
1 | 2
2 | 2
3 | 1
4 | 1
But when I add ORDER BY to the window definition, I get:
SELECT id, COUNT(id) OVER(PARTITION BY num ORDER BY id) from test;
id | count
___________|_____________
1 | 1
2 | 2
3 | 1
4 | 1
My understanding is that COUNT is computed across all rows that fall into a partition. Here, I have partitioned the rows by num. The number of rows in the partition is the same, with or without an ORDER BY clause. Why is there a difference in the outputs?
When you add an order by to an aggregate used as a window function, that aggregate turns into a "running count" (or a running version of whatever aggregate you use).
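The two queries from the question can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption, so it runs without a database server). SQLite 3.25 and later support window functions and, for these queries, follow the same default frame rules as PostgreSQL. The outer `ORDER BY id` is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL "test" table
# (assumption: SQLite 3.25+ so that window functions are available).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, 4), (2, 4), (3, 5), (4, 6)])

# No ORDER BY inside OVER: the frame is the whole partition,
# so every row of a partition sees the same total.
whole_partition = conn.execute(
    "SELECT id, COUNT(id) OVER (PARTITION BY num) FROM test ORDER BY id"
).fetchall()

# ORDER BY inside OVER: the frame ends at the current row,
# so the result is a running count within each partition.
running = conn.execute(
    "SELECT id, COUNT(id) OVER (PARTITION BY num ORDER BY id) FROM test ORDER BY id"
).fetchall()

print(whole_partition)
print(running)
```

Only the partition with more than one row (`num = 4`) shows a difference: row 1 is counted before row 2 exists in its frame.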
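The count(*) will return the number of rows from the start of the partition up to the "current one", based on the order specified. More precisely, once an order by is present the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie with the current row on the order by columns (its peers) are counted as well.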
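The difference between the default frame and an explicit one matters when the ordering columns contain duplicates, and an explicit frame is also how you get the full partition count back while keeping an order by. A minimal sketch, again using Python's sqlite3 as a stand-in for PostgreSQL (assuming SQLite 3.25+, whose default frame is also RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW); the column aliases are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, 4), (2, 4), (3, 5), (4, 6)])

query = """
SELECT id,
       -- default frame (RANGE ... CURRENT ROW): ids 1 and 2 tie on num = 4,
       -- so both count each other as peers
       COUNT(*) OVER (ORDER BY num) AS range_upto,
       -- ROWS frame stops at the physical current row; id breaks the tie
       COUNT(*) OVER (ORDER BY num, id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_upto,
       -- explicit full frame: the ORDER BY no longer shrinks the count
       COUNT(*) OVER (PARTITION BY num ORDER BY id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS per_num
FROM test
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

`range_upto` starts at 2 because the first two rows are peers, `rows_upto` is a strict 1, 2, 3, 4, and `per_num` matches the query from the question that had no order by.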
The following query compares aggregates used with and without an order by. In my opinion the effect is a bit easier to see with sum() than with count().
with test (id, num, x) as (
values
(1, 4, 1),
(2, 4, 1),
(3, 5, 2),
(4, 6, 2)
)
select id,
num,
x,
count(*) over () as total_rows,
count(*) over (order by id) as rows_upto,
count(*) over (partition by x order by id) as rows_per_x,
sum(num) over (partition by x) as total_for_x,
sum(num) over (order by id) as sum_upto,
sum(num) over (partition by x order by id) as sum_for_x_upto
from test;
will result in:
id | num | x | total_rows | rows_upto | rows_per_x | total_for_x | sum_upto | sum_for_x_upto
---+-----+---+------------+-----------+------------+-------------+----------+---------------
1 | 4 | 1 | 4 | 1 | 1 | 8 | 4 | 4
2 | 4 | 1 | 4 | 2 | 2 | 8 | 8 | 8
3 | 5 | 2 | 4 | 3 | 1 | 11 | 13 | 5
4 | 6 | 2 | 4 | 4 | 2 | 11 | 19 | 11
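The query and result table above can be checked with a minimal sketch that runs the same statement through Python's sqlite3 module (an assumption standing in for PostgreSQL; SQLite 3.25+ accepts the same CTE, VALUES list and window definitions). A final `ORDER BY id` is added because neither database guarantees output order otherwise.

```python
import sqlite3

QUERY = """
WITH test (id, num, x) AS (
  VALUES (1, 4, 1), (2, 4, 1), (3, 5, 2), (4, 6, 2)
)
SELECT id, num, x,
       COUNT(*) OVER ()                           AS total_rows,     -- whole table
       COUNT(*) OVER (ORDER BY id)                AS rows_upto,      -- running count
       COUNT(*) OVER (PARTITION BY x ORDER BY id) AS rows_per_x,     -- running count per x
       SUM(num) OVER (PARTITION BY x)             AS total_for_x,    -- full sum per x
       SUM(num) OVER (ORDER BY id)                AS sum_upto,       -- running sum
       SUM(num) OVER (PARTITION BY x ORDER BY id) AS sum_for_x_upto  -- running sum per x
FROM test
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```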
There are more examples in the Postgres manual.