使用窗口函数求和所有分区 [英] Sum across partitions with window functions

查看:91
本文介绍了使用窗口函数求和所有分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下问题...

    Time |  A   |  B |  C  --  Sum should be
      1     a1     b1   c1     a1 + b1 + c1
      2     a2     b2   x      a2 + b1 + c1 
      3     a3     x    x      a3 + b2 + c1
      4     x      b3   c2     a3 + b3 + c2

本质上,总和必须是最新值 及时排三列.每个数据列都不 必须具有当前时间的值.

Essentially, the sum needs to be across the most recent value in time for each of the three rows. Each data column doesn't necessarily have a value for the current time.

我尝试了几种使用窗口函数的方法, 一直没有成功.我写了一个存储过程 可以满足我的需求,但是速度很慢.

I have tried several approaches using window functions and have been unsuccessful. I have written a stored procedure that does what I need, but it is SLOW.

CREATE OR REPLACE FUNCTION timeseries.combine_series(id int[], startTime timestamp, endTime timestamp) 
RETURNS setof RECORD AS $$
DECLARE
    retval double precision = 0;
    row_data timeseries.total_active_energy%ROWTYPE;
    maxCount integer = 0;
    sz integer = 0;
lastVal double precision[];
v_rec RECORD;
BEGIN   
    SELECT INTO sz array_length($1,1);

    FOR row_data IN SELECT * FROM timeseries.total_active_energy  WHERE time >= startTime AND time < endTime AND device_id = ANY($1) ORDER BY time
       LOOP
    retval = row_data.active_power;
    for i IN 1..sz LOOP
        IF $1[i]=row_data.device_id THEN
            lastVal[i] = row_data.active_power;
        ELSE
            retval = retVal + COALESCE(lastVal[i],0);
        END IF;
    END LOOP;

    SELECT row_data.time, retval into v_rec;

    return next v_rec;
     END LOOP;

      return ;
  END;
$$ LANGUAGE plpgsql;

致电:

select * from timeseries.combine_series('{552,553,554}'::int[], '2013-05-01'::timestamp, '2013-05-02'::timestamp) 
    AS (t timestamp with time zone, val double precision);

推荐答案

SELECT ts, a, b, c
       , COALESCE(max(a) OVER (PARTITION BY grp_a), 0)
       + COALESCE(max(b) OVER (PARTITION BY grp_b), 0)
       + COALESCE(max(c) OVER (PARTITION BY grp_c), 0) AS special_sum
FROM  (
   SELECT *
         ,count(a) OVER w AS grp_a
         ,count(b) OVER w AS grp_b
         ,count(c) OVER w AS grp_c
   FROM   t
   WINDOW w AS (ORDER BY ts)
   ) sub
ORDER  BY ts;

首先,使用汇总窗口函数count()将实际值和紧随其后的NULL值放在一个组中:它不会随NULL值而增加.

First, put actual values and following NULL values in a group with the aggregate window function count(): it does not increment with NULL values.

然后从每个组中提取max(),以找到您想要的内容.此时,您最好也使用min()sum(),因为每个组只有一个非空值.

Then take max() from every group, arriving at what you are looking for. At this point you could just as well use min() or sum(), since there is only one non-null value per group.

COALESCE()会捕获NULL值.

请注意我是如何选择ts作为列名的,因为我没有使用像time这样的基本类型名称作为标识符.

Note how I picked ts as column name, since I don't use base type names like time as identifiers.

这也是每个人都应该首先提供示例数据的方式!

That's also how you everyone should provide sample data in the first place!

CREATE TEMP TABLE t (ts int, a int, b int, c int);

INSERT INTO t VALUES
  (1, 11,   21,   NULL)
 ,(2, 12,   22,   NULL)
 ,(3, 13,   NULL, NULL)
 ,(4, NULL, 23,   32);

这篇关于使用窗口函数求和所有分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆