Partition By with Order By Clause in PostgreSQL

Problem description

I have a table with these values:

user_id ts                  val
uid1    19.05.2019 01:49:50  0
uid1    19.05.2019 01:50:15  0
uid1    19.05.2019 01:50:20  0
uid1    19.05.2019 01:59:50  1
uid1    19.05.2019 02:20:10  1
uid1    19.05.2019 02:20:15  0
uid1    19.05.2019 02:20:19  0
uid1    19.05.2019 02:30:53  1
uid1    19.05.2019 11:10:25  1
uid1    19.05.2019 11:13:40  0
uid1    19.05.2019 11:13:50  0
uid1    19.05.2019 11:20:19  1
uid2    19.05.2019 15:01:44  0
uid2    19.05.2019 15:05:55  0
uid2    19.05.2019 17:19:35  1
uid2    19.05.2019 17:20:01  0
uid2    19.05.2019 17:20:35  0
uid2    19.05.2019 19:15:50  1

When I query this table with only a PARTITION BY clause, the result looks like this:

Query: select *, sum(val) over (partition by user_id) as res from example_table;

user_id ts                  val res
uid1    19.05.2019 01:49:50  0  5
uid1    19.05.2019 01:50:15  0  5
uid1    19.05.2019 01:50:20  0  5
uid1    19.05.2019 01:59:50  1  5
uid1    19.05.2019 02:20:10  1  5
uid1    19.05.2019 02:20:15  0  5
uid1    19.05.2019 02:20:19  0  5
uid1    19.05.2019 02:30:53  1  5
uid1    19.05.2019 11:10:25  1  5
uid1    19.05.2019 11:13:40  0  5
uid1    19.05.2019 11:13:50  0  5
uid1    19.05.2019 11:20:19  1  5
uid2    19.05.2019 15:01:44  0  2
uid2    19.05.2019 15:05:55  0  2
uid2    19.05.2019 17:19:35  1  2
uid2    19.05.2019 17:20:01  0  2
uid2    19.05.2019 17:20:35  0  2
uid2    19.05.2019 19:15:50  1  2

In the above results, the res column holds the total sum of the val column for each partition. But if I query the table with both PARTITION BY and ORDER BY, I get these results:

Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;

user_id ts                  val res
uid1    19.05.2019 01:49:50  0  0
uid1    19.05.2019 01:50:15  0  0
uid1    19.05.2019 01:50:20  0  0
uid1    19.05.2019 01:59:50  1  1
uid1    19.05.2019 02:20:10  1  2
uid1    19.05.2019 02:20:15  0  2
uid1    19.05.2019 02:20:19  0  2
uid1    19.05.2019 02:30:53  1  3
uid1    19.05.2019 11:10:25  1  4
uid1    19.05.2019 11:13:40  0  4
uid1    19.05.2019 11:13:50  0  4
uid1    19.05.2019 11:20:19  1  5
uid2    19.05.2019 15:01:44  0  0
uid2    19.05.2019 15:05:55  0  0
uid2    19.05.2019 17:19:35  1  1
uid2    19.05.2019 17:20:01  0  1
uid2    19.05.2019 17:20:35  0  1
uid2    19.05.2019 19:15:50  1  2

With the ORDER BY clause, the res column holds the cumulative sum of the val column for each row within each partition.

Why? I don't understand.

Recommended answer

Update

This behavior is documented here:

4.2.8. Window Function Calls

[..] The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.

That means:

In the absence of a frame_clause, RANGE UNBOUNDED PRECEDING is used by default. That includes:

  • All rows "preceding" the current row according to the ORDER BY clause
  • The current row
  • All rows which have the same values in the ORDER BY columns as the current row

In the absence of an ORDER BY clause, ORDER BY NULL is assumed (though I'm guessing again). Thus the frame will include all rows from the partition, because the value in the ORDER BY column(s) is the same (always NULL) in every row, which makes every row a peer of the current row.
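
The role of peers is easiest to see when the ORDER BY column contains duplicates. The following is a minimal sketch, not taken from the question: the inline VALUES rows and the column aliases default_frame and rows_frame are made up for illustration. Two rows share ts = 2, so they are peers of each other.

```sql
-- Hypothetical inline data: the two rows with ts = 2 are ORDER BY peers.
select ts
     , val
       -- No frame_clause: defaults to RANGE UNBOUNDED PRECEDING,
       -- so the frame runs through the current row's last peer.
     , sum(val) over (order by ts) as default_frame
       -- Explicit ROWS frame: stops at the current row itself,
       -- ignoring any peers that happen to sort after it.
     , sum(val) over (
         order by ts
         rows between unbounded preceding and current row
       ) as rows_frame
from (values (1, 10), (2, 20), (2, 30), (3, 40)) as t(ts, val)
order by ts;
```

With the default frame, both ts = 2 rows get 60 (10 + 20 + 30), because each one includes its peer; the full default_frame column is 10, 60, 60, 100. With the ROWS frame the two peers get different running totals, and which peer comes first is not defined, since ts alone does not order them.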

Disclaimer: The following is more a guess than a qualified answer. I didn't find any documentation that confirms what I write. At the same time, I don't think the answers given so far explain the behavior correctly.

The reason for the difference in the results is not directly the ORDER BY clause, since a + b + c is the same as c + b + a. The reason is (and that is my guess) that the ORDER BY clause implicitly defines the frame_clause as

rows between unbounded preceding and current row

Try the following query:

select *
, sum(val) over (partition by user_id) as res
, sum(val) over (partition by user_id order by ts) as res_order_by
, sum(val) over (
    partition by user_id
    order by ts
    rows between unbounded preceding and current row
  ) as res_order_by_unbounded_preceding
, sum(val) over (
    partition by user_id
    -- order by ts
    rows between unbounded preceding and current row
  ) as res_preceding
, sum(val) over (
    partition by user_id
    -- order by ts
    rows between current row and unbounded following
  ) as res_following
, sum(val) over (
    partition by user_id
    order by ts
    rows between unbounded preceding and unbounded following
  ) as res_orderby_preceding_following

from example_table;

db<>fiddle

You will see that you can get a cumulative sum without an ORDER BY clause, as well as a "full" sum with the ORDER BY clause.
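
To reproduce this locally, here is a setup sketch. The column types are an assumption, since the question does not state them, and the timestamps are rewritten in ISO format so that they parse regardless of the session's DateStyle setting. The aliases running_sum and full_sum are illustrative names.

```sql
-- Assumed schema: the question does not give the column types.
create table example_table (
    user_id text      not null,
    ts      timestamp not null,
    val     integer   not null
);

-- Rows copied from the question, timestamps converted to ISO format.
insert into example_table (user_id, ts, val) values
  ('uid1', '2019-05-19 01:49:50', 0),
  ('uid1', '2019-05-19 01:50:15', 0),
  ('uid1', '2019-05-19 01:50:20', 0),
  ('uid1', '2019-05-19 01:59:50', 1),
  ('uid1', '2019-05-19 02:20:10', 1),
  ('uid1', '2019-05-19 02:20:15', 0),
  ('uid1', '2019-05-19 02:20:19', 0),
  ('uid1', '2019-05-19 02:30:53', 1),
  ('uid1', '2019-05-19 11:10:25', 1),
  ('uid1', '2019-05-19 11:13:40', 0),
  ('uid1', '2019-05-19 11:13:50', 0),
  ('uid1', '2019-05-19 11:20:19', 1),
  ('uid2', '2019-05-19 15:01:44', 0),
  ('uid2', '2019-05-19 15:05:55', 0),
  ('uid2', '2019-05-19 17:19:35', 1),
  ('uid2', '2019-05-19 17:20:01', 0),
  ('uid2', '2019-05-19 17:20:35', 0),
  ('uid2', '2019-05-19 19:15:50', 1);

-- Spell out the frame instead of relying on the default.
select user_id
     , ts
     , val
     , sum(val) over (
         partition by user_id
         order by ts
         range between unbounded preceding and current row
       ) as running_sum
     , sum(val) over (
         partition by user_id
         order by ts
         rows between unbounded preceding and unbounded following
       ) as full_sum
from example_table
order by user_id, ts;
```

Because every ts is unique within a user in this data, running_sum should match the cumulative res column from the question, and full_sum should be 5 for every uid1 row and 2 for every uid2 row. Writing the frame explicitly makes the intent visible to the reader and avoids depending on the default.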
