递归SQL语句(PostgreSQL 9.1.4) [英] Recursive SQL statement (PostgreSQL 9.1.4)

查看:99
本文介绍了递归SQL语句(PostgreSQL 9.1.4)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

PostgreSQL 9.1

PostgreSQL 9.1

业务状况

每个月都有一个分配给特定流程的新一批帐户。每批都可以按月份,帐户数和帐户总余额来描述。该过程的目标是从客户那里收回一些余额。
然后每月分别跟踪每个批次(自从批次转移到流程以来,每个月回收的金额)。

Every month, there is a new batch of accounts given to a specific process. Every batch can be described by month, number of accounts and total balance of accounts. The goal of the process is to recover some of the balance back from customers. Each batch is than tracked separately on a monthly basis (amount recovered on each month since batch was transferred to the process).

目标

我的目标是预测将来将回收多少。

My goal is to predict what amount will be recovered in the future.

数据定义

create table vintage_data (
    granularity date,       /* Month when account entered process*/
    distance_in_months integer, /* Distance in months from date when accounts entered process*/
    entry_accounts integer,     /* Number of accounts that entered process in a given month*/
    entry_amount numeric,       /* Total amount for account that entered process in a given month*/
    recovery_amount numeric     /* Amount recovered in Nth month on accounts that entered process in a given month */
);

样本数据

insert into vintage_data values('2012-01-31',1,200,100000,1000);
insert into vintage_data values('2012-01-31',2,200,100000,2000);
insert into vintage_data values('2012-01-31',3,200,100000,3000);
insert into vintage_data values('2012-01-31',4,200,100000,3500);
insert into vintage_data values('2012-01-31',5,200,100000,3400);
insert into vintage_data values('2012-01-31',6,200,100000,3300);
insert into vintage_data values('2012-02-28',1,250,150000,1200);
insert into vintage_data values('2012-02-28',2,250,150000,1600);
insert into vintage_data values('2012-02-28',3,250,150000,1800);
insert into vintage_data values('2012-02-28',4,250,150000,1200);
insert into vintage_data values('2012-02-28',5,250,150000,1600);
insert into vintage_data values('2012-03-31',1,200,90000,1300);
insert into vintage_data values('2012-03-31',2,200,90000,1200);
insert into vintage_data values('2012-03-31',3,200,90000,1400);
insert into vintage_data values('2012-03-31',4,200,90000,1000);
insert into vintage_data values('2012-04-30',1,300,180000,1600);
insert into vintage_data values('2012-04-30',2,300,180000,1500);
insert into vintage_data values('2012-04-30',3,300,180000,4000);
insert into vintage_data values('2012-05-31',1,400,225000,2200);
insert into vintage_data values('2012-05-31',2,400,225000,6000);
insert into vintage_data values('2012-06-30',1,100,60000,1000);

计算过程

您可以将数据想象成一个三角矩阵(X值将被预测):

You can imagine the data as a triangular matrix (X values are to be forecasted):

distance_in_months                       1      2     3       4      5      6
granularity entry_accounts  entry_amount
2012-01-31  200             100000       1000   2000   3000   3500   3400   3300
2012-02-28  250             150000       1200   1600   1800   1200   1600   (X-1)
2012-03-31  200              90000       1300   1200   1400   1000   (X0)   (X4)
2012-04-30  300             180000       1600   1500   4000   (X1)   (X5)   (X8)
2012-05-31  400             225000       2200   6000   (X2)   (X6)   (X9)   (X11)
2012-06-30  100              60000       1000   (X3)   (X7)   (X10)  (X12   (X13)

算法

我的目标是预测所有遗漏的点(未来)。过程,这是对点X1的计算

The goal I have is to forecast all the missing points (future). To illustrate the process, this is the calculation for the point X1

1)使用距离最大为4来获取前三个月的行总计:

1) Get row totals for previous three months using distance up to 4:

2012-01-31  1000+2000+3000+3500=9500 (d4m3)
2012-02-28  1200+1600+1800+1200=5800 (d4m2)
2012-03-31  1300+1200+1400+1000=4900 (d4m1)

2)使用距离最大为3的距离获取前三个月的行总计:

2) Get row totals for previous three months using distance up to 3:

2012-01-31  1000+2000+3000=6000 (d3m3)
2012-02-28  1200+1600+1800=4600 (d3m2)
2012-03-31  1300+1200+1400=3800 (d3m1)

3)计算距离3和距离4的加权平均运行速度(由entry_amount加权):

3) Calculate weighted average running rate for distance 3 and distance 4 (weighted by entry_amount):

(d4m3+d4m2+d4m1)/(100000+150000+90000) = (9500+5800+4900)/(100000+150000+90000) = 20200/340000 = 0.0594
(d3m3+d3m2+d3m1)/(100000+150000+90000) = (6000+4600+3800)/(100000+150000+90000) = 14400/340000 = 0.0424

4)计算变更距离3和距离4

4) Calculate the change between distance 3 and distance 4

((d4m3+d4m2+d4m1)/(100000+150000+90000))/((d3m3+d3m2+d3m1)/(100000+150000+90000)) =
= (20200/340000)/(14400/340000) =
= 0.0594/0.0424 = 1.403 (PredictionRateForX1)

5)使用不超过3的距离计算预测月份的行总数。

5) Calculate row totals for predicted month using distance up to 3:

2012-04-30  1600+1500+4000=7100

6)使用entry_amount预测月份的费率

6) Calculate rate using entry_amount for predicted month

7100/180000 = 0.0394

7)计算X1的预测速率

7) Calculate rate predicted for X1

0.0394 * PredictionRateForX1 = 0.05534

8)计算X1的金额

(0.05534-0.0394)*180000 = 2869.2

问题

问题是如何使用SQL语句计算矩阵的其余部分(从x-1到x13)。显然,这将需要某种递归算法。

The problem is how to calculate the rest of the matrix (from x-1 to x13) using SQL statement. It is obvious that this will require some sort of recursive algorithm.

推荐答案

这是一项艰巨的任务,将其拆分以使其更复杂。可管理的。我会把它放在带有 RETURN TABLE 的plpgsql函数中:

It's a big task, split it up to make it more manageable. I would put that in a plpgsql function with RETURN TABLE:


  1. 使用交叉表查询
    为计算过程矩阵创建临时表。您需要 tablefunc 模块。运行(每个数据库一次):

  1. Create a temporary table for your "Calculation Process" matrix using a crosstab query You need the tablefunc module installed for that. Run (once per database):

CREATE EXTENSION tablefunc;


  • 按字段更新临时表。

  • Update the temp table field by field.

    以下演示功能完整,并已在PostgreSQL 9.1.4中进行了测试。在问题中提供的表定义的基础上:

    The following demo is fully functional and tested with PostgreSQL 9.1.4. Building on the table definition provided in the question:

    -- DROP FUNCTION f_forcast();
    
    CREATE OR REPLACE FUNCTION f_forcast()
      RETURNS TABLE (
      granularity date
     ,entry_accounts numeric
     ,entry_amount numeric
     ,d1 numeric
     ,d2 numeric
     ,d3 numeric
     ,d4 numeric
     ,d5 numeric
     ,d6 numeric) AS
    $BODY$
    BEGIN
    
    --== Create temp table with result of crosstab() ==--
    
    CREATE TEMP TABLE matrix ON COMMIT DROP AS
    SELECT *
    FROM   crosstab (
            'SELECT granularity, entry_accounts, entry_amount
                   ,distance_in_months, recovery_amount
             FROM   vintage_data
             ORDER  BY 1, 2',
    
            'SELECT DISTINCT distance_in_months
             FROM   vintage_data
             ORDER  BY 1')
    AS tbl (
      granularity date
     ,entry_accounts numeric
     ,entry_amount numeric
     ,d1 numeric
     ,d2 numeric
     ,d3 numeric
     ,d4 numeric
     ,d5 numeric
     ,d6 numeric
     );
    
    ANALYZE matrix; -- update statistics to help calculations
    
    
    --== Calculations ==--
    
    -- I implemented the first calculation for X1 and leave the rest to you.
    -- Can probably be generalized in a loop or even a single statement.
    
    UPDATE matrix m
    SET    d4 = (
        SELECT (sum(x.d1) + sum(x.d2) + sum(x.d3) + sum(x.d4))
                /(sum(x.d1) + sum(x.d2) + sum(x.d3)) - 1
                -- removed redundant sum(entry_amount) from equation
        FROM  (
            SELECT *
            FROM   matrix a
            WHERE  a.granularity < m.granularity
            ORDER  BY a.granularity DESC
            LIMIT  3
            ) x
        ) * (m.d1 + m.d2 + m.d3)
    WHERE m.granularity = '2012-04-30';
    
    --- Next update X2 ..
    
    
    --== Return results ==--
    
    RETURN QUERY
    TABLE  matrix
    ORDER  BY 1;
    
    END;
    $BODY$ LANGUAGE plpgsql;
    

    致电:

    SELECT * FROM f_forcast();
    

    我已经简化了很多,删除了计算中的一些多余步骤。

    该解决方案采用了多种先进技术。您需要了解使用PostgreSQL的方式。

    I have simplified quite a bit, removing some redundant steps in the calculation.
    The solution employs a variety of advanced techniques. You need to know your way around PostgreSQL to work with this.

    这篇关于递归SQL语句(PostgreSQL 9.1.4)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆