Postgres - Calculating sums based on the result of the sum of another column
Problem description
I'm using PostgreSQL 9.6 and am trying to figure out how to write a query that calculates the sum of one column based on the discrete sum of a different column (both in the same table). For example, I want the sum of counts for each set of rows until the accumulated volume is >= 100, and then to start calculating the sum of counts over again.
Sample data:
id count volume
1 5 12
2 6 14
3 2 11
4 10 9
5 5 14
6 17 19
7 0 8
8 12 4
9 18 6
10 12 14
11 10 10
12 15 7
13 8 12
14 2 17
15 5 30
16 9 24
17 2 16.5
Hypothetical intermediate results (just to show what I expect):
id sum(count) discrete volume
1 5 12
2 11 26
3 13 37
4 23 46
5 28 60
6 45 79
7 45 87
8 57 91
9 75 97
10 87 111
11 10 10 (reset since volume >= 100)
12 25 17
13 33 29
14 35 46
15 40 76
16 49 100
17 2 16.5 (reset since volume >= 100)
Expected final result:
sum(count) discrete volume
87 111
49 100
2 16.5 (partial result, which is desired)
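For reference, the grouping rule described above can be written as a single pass over the rows. The following is only a minimal Python sketch of that rule, not the SQL being asked for; the function name `discrete_sums`, the `threshold` parameter, and the `ROWS` list are illustrative names introduced here, and the rows are assumed to arrive already ordered by id.

```python
# Sample data from the question as (id, count, volume) tuples, ordered by id.
ROWS = [
    (1, 5, 12), (2, 6, 14), (3, 2, 11), (4, 10, 9), (5, 5, 14),
    (6, 17, 19), (7, 0, 8), (8, 12, 4), (9, 18, 6), (10, 12, 14),
    (11, 10, 10), (12, 15, 7), (13, 8, 12), (14, 2, 17), (15, 5, 30),
    (16, 9, 24), (17, 2, 16.5),
]


def discrete_sums(rows, threshold=100):
    """Return one (count_sum, volume_sum) pair per group of rows.

    A group is closed as soon as its accumulated volume reaches the
    threshold; a trailing group that never reaches it is still returned.
    """
    groups = []
    count_sum = 0
    volume_sum = 0
    group_open = False
    for _row_id, count, volume in rows:
        count_sum += count
        volume_sum += volume
        group_open = True
        if volume_sum >= threshold:
            # Threshold reached: emit the group and start over.
            groups.append((count_sum, volume_sum))
            count_sum, volume_sum, group_open = 0, 0, False
    if group_open:
        # Partial final group, which the question explicitly wants to keep.
        groups.append((count_sum, volume_sum))
    return groups


if __name__ == "__main__":
    for count_sum, volume_sum in discrete_sums(ROWS):
        print(count_sum, volume_sum)
```

Run against the sample data, this yields the three groups listed in the expected final result, including the partial last one.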
What I have tried so far: SQL Fiddle (http://sqlfiddle.com/#!17/6182a/11/0)
I got the idea for the current query shown in the SQL Fiddle from another StackOverflow answer, and it obviously isn't working correctly here (although it works quite nicely for a different problem I was trying to solve). I saw from a different StackOverflow answer that I probably want to use a RECURSIVE query (see the PostgreSQL documentation), but I can't figure out how to write the query so that it works :(
I can certainly do this in Java code (and already have), but I want to do it in SQL in the hope that it is faster than reading all the rows and calculating the results in the application. I could most likely also write a stored procedure to make this work, but I would rather avoid that: I'm using JPA (in this case with a native query) and want to keep all the code in the same place (the Java codebase). I also want to be able to include or exclude columns dynamically, so I would like to construct the query in Java code (the actual table has many more columns than my example).
Thanks for any help you can give.
Edit:
Thanks to @klin's comment and the referenced StackOverflow question, I'm getting closer. Here is what I have (it generates the intermediate results):
WITH RECURSIVE WorkTable(id, count_sum, volume_sum) AS
(
SELECT
id,
count AS count_sum,
volume AS volume_sum
FROM measurements
WHERE id = 1
UNION ALL
SELECT
measurements.id,
CASE WHEN WorkTable.volume_sum >= 100
THEN measurements.count
ELSE WorkTable.count_sum + measurements.count
END AS count_sum,
CASE
WHEN WorkTable.volume_sum >= 100
THEN measurements.volume
ELSE WorkTable.volume_sum + measurements.volume
END AS discrete_sum_volume
FROM measurements
JOIN WorkTable
ON measurements.id = WorkTable.id + 1
)
SELECT *
FROM WorkTable
ORDER BY id
However, what I'm still missing is how to get the final result. If I use WHERE volume_sum >= 100, I don't get the final (partial) result. And I can't use OR id = MAX(id), because Postgres doesn't allow an aggregate function in a WHERE clause.
Edit: BTW, after all the work and time that @klin and @JorgeCampos spent looking at and answering my question, I found that this query doesn't perform well on a table with millions of rows. I have created a stored procedure instead. I didn't want to go there, but there seems to be no other well-performing alternative. For a large table, the stored procedure outperforms the RECURSIVE query by many orders of magnitude.
Recommended answer
I have used row_number() because consecutive row numbers are needed. You should not rely on the primary key, because it may contain gaps.
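To illustrate why gaps matter, here is a small Python sketch (not part of the original answer) that mimics the two ways of stepping from one row to the next. The function names `walk_by_id_plus_one` and `walk_by_row_number` are hypothetical, and the first one starts at the smallest id for simplicity, whereas the query in the question starts at id = 1.

```python
def walk_by_id_plus_one(ids):
    """Follow id -> id + 1 links, like a join on `m.id = w.id + 1`.

    The walk stops at the first missing id, so later rows are never reached.
    """
    present = set(ids)
    current = min(present)
    visited = [current]
    while current + 1 in present:
        current += 1
        visited.append(current)
    return visited


def walk_by_row_number(ids):
    """Number the rows 1..n in id order, like row_number() over (order by id),
    then step through those consecutive numbers instead of the ids."""
    numbered = {n: row_id for n, row_id in enumerate(sorted(ids), start=1)}
    return [numbered[n] for n in range(1, len(numbered) + 1)]


if __name__ == "__main__":
    ids_with_gap = [1, 2, 3, 5, 6]  # id 4 has been deleted
    print(walk_by_id_plus_one(ids_with_gap))  # stops before the gap
    print(walk_by_row_number(ids_with_gap))   # reaches every row
```

With a gap at id 4, stepping by id + 1 visits only the first three rows, while stepping by the generated row number visits all five.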
with recursive cte as (
select *, row_number() over (order by id)
from measurements
),
work_table as (
select
id,
count as count_sum,
volume as volume_sum,
row_number
from cte
where row_number = 1
union all
select
c.id,
case when w.volume_sum >= 100
then c.count
else w.count_sum + c.count
end as count_sum,
case
when w.volume_sum >= 100
then c.volume
else w.volume_sum + c.volume
end as discrete_sum_volume,
c.row_number
from cte c
join work_table w
on c.row_number = w.row_number + 1
)
select count_sum, volume_sum
from work_table
where volume_sum >= 100
or id = (select max(id) from work_table)
order by id
Result:
count_sum | volume_sum
-----------+------------
87 | 111
49 | 100
2 | 16.5
(3 rows)