PostgreSQL get relative average with group by

Question

I have a table as follows. The rows are in a specific order.

id    |      value
------+---------------------
 1    |        2
 1    |        4     
 1    |        3
 2    |        2
 2    |        2
 2    |        5

I want to group the rows by the column `id` and, for each row, show the average of `value` over that row and all previous rows of the same group (as worked out in brackets in the following example):

id    |      value  |    RelativeAverage    
------+-------------+--------------------
 1    |        2    |        (2)/1 = 2
 1    |        4    |        (2+4)/2 = 3
 1    |        3    |        (2+4+3)/3 = 3
 2    |        2    |        (2)/1 = 2
 2    |        2    |        (2+2)/2 = 2
 2    |        5    |        (2+2+5)/3 = 3

Is there an approach with which I can achieve this?

Thanks in advance.
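The sketch below pins down the target numbers. It is a minimal pure-Python version of the per-group running average, applied to the question's rows in the order they are listed. It only illustrates the arithmetic and is not a database solution. The function name `relative_averages` is hypothetical. The code assumes that rows with the same `id` are adjacent, which `itertools.groupby` requires.

```python
from itertools import groupby

# Rows from the question, in the listed order: (id, value).
rows = [(1, 2), (1, 4), (1, 3), (2, 2), (2, 2), (2, 5)]

def relative_averages(rows):
    """Return (id, value, running average) per row, restarting for each id."""
    out = []
    # groupby merges only *adjacent* rows with the same id, matching the
    # assumption that each id's rows appear together and in order.
    for key, group in groupby(rows, key=lambda r: r[0]):
        total = 0
        for n, (_, value) in enumerate(group, start=1):
            total += value          # running sum within the group
            out.append((key, value, total / n))
    return out

for row in relative_averages(rows):
    print(row)
```

For the last row this gives (2 + 2 + 5) / 3 = 3.0.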

Answer

Wrong query:

select 
  id, value, 

  sum(value) over(arrangement), rank() over(arrangement),

  sum(value) over(arrangement)::numeric / rank() over(arrangement) 
  as relative_average
from tbl
window arrangement as (partition by id order by id);

Output (wrong):

| id | value | sum | rank | relative_average |
|----|-------|-----|------|------------------|
|  1 |     2 |   9 |    1 |                9 |
|  1 |     4 |   9 |    1 |                9 |
|  1 |     3 |   9 |    1 |                9 |
|  2 |     1 |   8 |    1 |                8 |
|  2 |     2 |   8 |    1 |                8 |
|  2 |     5 |   8 |    1 |                8 |

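You need a sort key that matches the actual arrangement of your data for `sum` and `rank` to work. Ordering by `id` inside `partition by id` makes every row of a partition a peer of the others. The default frame (`range between unbounded preceding and current row`) therefore covers the whole group: `sum` returns the group total and `rank` is 1 on every row. A table has no inherent row order, so the order has to come from a column. You can use the table row's hidden `ctid` column, but that is PostgreSQL-specific. Also, `ctid` is the row's physical location, so it changes when the row is updated or moved by `VACUUM FULL`.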
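The collapse can be reproduced locally with a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. This assumes the bundled SQLite is 3.25 or newer, the first release with window functions. The data mirrors the answer's example table, and the window is the same `partition by id order by id`.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumes SQLite >= 3.25).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
con.executemany("INSERT INTO tbl (id, value) VALUES (?, ?)",
                [(1, 2), (1, 4), (1, 3), (2, 1), (2, 2), (2, 5)])

# Same window as the wrong query: every row of a partition is a peer,
# so each running sum is the group total and each rank is 1.
rows = con.execute("""
    SELECT id, value,
           sum(value) OVER arrangement AS running_sum,
           rank()     OVER arrangement AS rnk
    FROM tbl
    WINDOW arrangement AS (PARTITION BY id ORDER BY id)
""").fetchall()

for row in rows:
    print(row)
```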

Correct query:

select 
    id, value, 

    sum(value) over(arrangement), rank() over(arrangement),

    sum(value) over(arrangement)::numeric / rank() over(arrangement) 
    as relative_average
from tbl
window arrangement as (partition by id order by tbl.ctid);

Output (correct):

| id | value | sum | rank |   relative_average |
|----|-------|-----|------|--------------------|
|  1 |     2 |   2 |    1 |                  2 |
|  1 |     4 |   6 |    2 |                  3 |
|  1 |     3 |   9 |    3 |                  3 |
|  2 |     1 |   1 |    1 |                  1 |
|  2 |     2 |   3 |    2 |                1.5 |
|  2 |     5 |   8 |    3 | 2.6666666666666665 |

The best approach is to introduce a serial primary key. The running total (`sum() over (...)`) can then follow the actual arrangement of your data:

CREATE TABLE tbl
    (ordered_pk serial primary key, "id" int, "value" int)
;

INSERT INTO tbl
    ("id", "value")
VALUES
    (1, 2),
    (1, 4),
    (1, 3),
    (2, 1),
    (2, 2),
    (2, 5)
;

Correct query:

select 
    id, value, 

    sum(value) over(arrangement), rank() over(arrangement),

    sum(value) over(arrangement)::numeric / rank() over(arrangement) 
    as relative_average
from tbl
window arrangement as (partition by id order by ordered_pk);

Output (correct):

| id | value | sum | rank |   relative_average |
|----|-------|-----|------|--------------------|
|  1 |     2 |   2 |    1 |                  2 |
|  1 |     4 |   6 |    2 |                  3 |
|  1 |     3 |   9 |    3 |                  3 |
|  2 |     1 |   1 |    1 |                  1 |
|  2 |     2 |   3 |    2 |                1.5 |
|  2 |     5 |   8 |    3 | 2.6666666666666665 |

Live test: http://sqlfiddle.com/#!17/f18276/1
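Because `ordered_pk` is unique, no two rows of a partition are peers. The default frame therefore contains exactly the current row and the rows before it. The built-in `avg` window function should then give the same numbers as dividing the running sum by `rank()`. In PostgreSQL that would be something like `avg(value) over arrangement`, where `avg` of an integer column returns `numeric`.

The sketch below checks this with Python's `sqlite3` module standing in for PostgreSQL. It assumes SQLite 3.25+ and uses `INTEGER PRIMARY KEY` in place of `serial`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY stands in for serial: SQLite assigns increasing
# values to new rows when no key is supplied.
con.execute("CREATE TABLE tbl (ordered_pk INTEGER PRIMARY KEY, id INTEGER, value INTEGER)")
con.executemany("INSERT INTO tbl (id, value) VALUES (?, ?)",
                [(1, 2), (1, 4), (1, 3), (2, 1), (2, 2), (2, 5)])

rows = con.execute("""
    SELECT id, value,
           -- the answer's approach: running sum divided by the row's rank
           CAST(sum(value) OVER arrangement AS REAL) / rank() OVER arrangement
               AS relative_average,
           -- built-in running average over the same unique ordering
           avg(value) OVER arrangement AS running_avg
    FROM tbl
    WINDOW arrangement AS (PARTITION BY id ORDER BY ordered_pk)
    ORDER BY ordered_pk
""").fetchall()

for row in rows:
    print(row)
```

Using `avg` avoids the explicit division and the cast.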

You can order by `value` instead, but that yields a different result. It is not necessarily wrong, just different, because the values are accumulated in sorted order rather than in their original arrangement.

You then also need `row_number` instead of `rank`/`dense_rank`, because duplicate values share the same rank. Duplicates also need an explicit `rows between unbounded preceding and current row` frame: under the default `range` frame, rows with equal `value` are peers, so each one's running sum already includes the others.

Since the frame lives in the named window, the functions use `over arrangement` rather than `over (arrangement)`. PostgreSQL rejects the parenthesized form when the referenced window has a frame clause. Here is an example with duplicate values.

Correct query:

select 
    id, value, 

    sum(value) over arrangement,

    row_number() over arrangement,
    rank() over arrangement,        -- rank/dense_rank ignore the frame
    dense_rank() over arrangement,

    (sum(value) over arrangement)::numeric / row_number() over arrangement 
    as relative_average
from tbl
window arrangement as (
    partition by id order by value
    rows between unbounded preceding and current row  -- duplicates are not peers
);

Output:

| id | value | sum | row_number | rank | dense_rank |   relative_average |
|----|-------|-----|------------|------|------------|--------------------|
|  1 |     2 |   2 |          1 |    1 |          1 |                  2 |
|  1 |     3 |   5 |          2 |    2 |          2 |                2.5 |
|  1 |     4 |   9 |          3 |    3 |          3 |                  3 |
|  2 |     1 |   1 |          1 |    1 |          1 |                  1 |
|  2 |     2 |   3 |          2 |    2 |          2 |                1.5 |
|  2 |     2 |   5 |          3 |    2 |          2 | 1.6666666666666667 |
|  2 |     5 |  10 |          4 |    4 |          3 |                2.5 |

Live test:
http://sqlfiddle.com/#!17/2b5aac/1
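The next sketch shows the duplicate-value pitfall. It again assumes Python's `sqlite3` (SQLite 3.25+) as a stand-in for PostgreSQL. It places the default `range` frame next to an explicit `rows` frame over the same `partition by id order by value` ordering:

- With `range`, both rows holding the duplicate 2 get the same running sum of 5.
- With `rows`, they get 3 and 5.
- `avg` over the `rows` frame yields the running average directly.

```python
import sqlite3

# Assumes the bundled SQLite is 3.25+ (window functions); stands in for PostgreSQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
con.executemany(
    "INSERT INTO tbl (id, value) VALUES (?, ?)",
    [(1, 2), (1, 4), (1, 3), (2, 1), (2, 2), (2, 2), (2, 5)],  # id 2 repeats value 2
)

rows = con.execute("""
    SELECT id, value,
           sum(value) OVER by_range AS range_sum,   -- default frame: peers included
           sum(value) OVER by_rows  AS rows_sum,    -- only rows up to the current one
           avg(value) OVER by_rows  AS running_avg
    FROM tbl
    WINDOW by_range AS (PARTITION BY id ORDER BY value),
           by_rows  AS (PARTITION BY id ORDER BY value
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ORDER BY id, rows_sum
""").fetchall()

for row in rows:
    print(row)
```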
