Postgres aggregate sum conditional on row comparison


Question

I have data that looks something like this:

User_Object | filesize | created_date | deleted_date
row 1       | 40       | May 10       | Aug 20
row 2       | 10       | June 3       | Null
row 3       | 20       | Nov 8        | Null

I'm building statistics that record user data usage so it can be graphed from time-based data points. However, I'm having difficulty writing a query that, for each row, sums all rows before it, but only those rows that still existed when that row was created. Before trying to account for deleted rows, I had a simple naive query like this:

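-- Running total of filesize in creation order (ignores deletions).
-- "user" is quoted because USER is a reserved word in PostgreSQL.
SELECT User_Object.id, User_Object.created,
       SUM(filesize) OVER (ORDER BY User_Object.created) AS sum_data_used
    FROM User_Object
    JOIN "user" ON User_Object.user_id = "user".id
    WHERE "user".id = $1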
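
To make the gap concrete, here is a minimal Python sketch of what a plain running total computes on the sample rows. The year 2016 and the tuple layout are assumptions added for illustration; the question gives only month and day.

```python
from itertools import accumulate

# Assumed sample rows: (id, filesize, created, deleted). The year is hypothetical.
rows = [
    (1, 40, "2016-05-10", "2016-08-20"),
    (2, 10, "2016-06-03", None),
    (3, 20, "2016-11-08", None),
]

# Equivalent of SUM(filesize) OVER (ORDER BY created): sort, then accumulate.
ordered = sorted(rows, key=lambda r: r[2])
running = list(accumulate(r[1] for r in ordered))

for (row_id, _, created, _), total in zip(ordered, running):
    print(row_id, created, total)
```

The last total is 70, because row 1 is still counted even though it was deleted before row 3 was created. The desired value for row 3 is 30.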

However, I want to alter this so that the window function is conditional: it should sum only rows created before this User Object, excluding any row whose deleted date is also before this User Object.

This incorrect syntax illustrates what I want to do:

SELECT User_Object.id, User_Object.created, 
        SUM(CASE WHEN NOT window_function_row.deleted
            OR window_function_row.deleted > User_Object.created
            THEN filesize ELSE 0)
        OVER (ORDER BY User_Object.created) AS sum_data_used
    FROM User_Object
    JOIN user ON User_Object.user_id = user.id
    WHERE user.id = $1

When this query runs on the data that I have, it should output something like:

id | created | sum_data_used
1  | May 10  | 40
2  | June 3  | 50
3  | Nov 8   | 30

Recommended answer

Something along these lines may work for you:

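-- Self-join: a is the row being reported, b ranges over candidate rows.
-- (user_id serves as the row id in the example table.)
SELECT a.user_id
      ,MIN(a.created_date) AS created_date
      ,SUM(b.filesize) AS sum_data_used
  FROM user_object a
  -- keep b if its id is not higher than a's and it was not deleted before a was created
  JOIN user_object b ON (b.user_id <= a.user_id
                    AND COALESCE(b.deleted_date, a.created_date) >= a.created_date)
  GROUP BY a.user_id
  ORDER BY a.user_id;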

For each row, the query self-joins the table, matching rows whose id is lower or equal and whose lifetime overlaps the row's creation date (a NULL deleted_date counts as still present). This relies on ids increasing with creation date. It will be expensive, because each row has to look through the entire table to compute its file size total; no cumulative operation takes place here. But I'm not sure there is a way to do that.
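
The matching rule described above can be sketched in plain Python as a nested loop, which also makes the quadratic cost visible. This is an illustration only, not how Postgres executes the join; the function name `usage_at_creation` is hypothetical, and the rows are the sample data from this answer.

```python
from datetime import date

# Sample rows from the answer: (user_id, filesize, created_date, deleted_date)
rows = [
    (1, 40, date(2016, 5, 10), date(2016, 8, 29)),
    (2, 10, date(2016, 6, 3), None),
    (3, 20, date(2016, 11, 8), None),
]

def usage_at_creation(rows):
    """For each row a, sum filesize over the rows b that the join condition keeps."""
    result = []
    for a_id, _, a_created, _ in rows:
        total = 0
        for b_id, b_size, _, b_deleted in rows:
            # Mirrors COALESCE(b.deleted_date, a.created_date) >= a.created_date
            still_present = (b_deleted or a_created) >= a_created
            if b_id <= a_id and still_present:
                total += b_size
        result.append((a_id, a_created, total))
    return result

for row in usage_at_creation(rows):
    print(row)
```

Every outer row scans every inner row, so the work grows with the square of the row count. On the sample data the totals come out as 40, 50, and 30.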

Example table definition:

create table user_object(user_id int, filesize int, created_date date, deleted_date date);

Data:

1;40;2016-05-10;2016-08-29
2;10;2016-06-03;<NULL>
3;20;2016-11-08;<NULL>

Result:

1;2016-05-10;40
2;2016-06-03;50
3;2016-11-08;30
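
If ids are not guaranteed to follow creation order, one possible variant compares created_date directly instead of the id. The sketch below is not the answer's query: it swaps the self-join for a correlated subquery, and it runs against an in-memory SQLite database through Python's `sqlite3` module purely as a stand-in for Postgres, with dates stored as ISO text (both are assumptions made so the example is self-contained).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO-8601 text here, so string comparison orders them correctly.
conn.execute(
    "CREATE TABLE user_object "
    "(user_id INTEGER, filesize INTEGER, created_date TEXT, deleted_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_object VALUES (?, ?, ?, ?)",
    [
        (1, 40, "2016-05-10", "2016-08-29"),
        (2, 10, "2016-06-03", None),
        (3, 20, "2016-11-08", None),
    ],
)

# Variant keyed on created_date: for each row a, sum the rows b created on or
# before a that were not deleted before a was created.
query = """
SELECT a.user_id,
       a.created_date,
       (SELECT SUM(b.filesize)
          FROM user_object b
         WHERE b.created_date <= a.created_date
           AND (b.deleted_date IS NULL OR b.deleted_date >= a.created_date)
       ) AS sum_data_used
  FROM user_object a
 ORDER BY a.created_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Like the answer's query, this treats a row deleted on the same day as still present (`>=`); use `>` for a strict comparison. It is still quadratic in the number of rows.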
