计算运行总额/运行余额 [英] Calculate running total / running balance

查看:48
本文介绍了计算运行总额/运行余额的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张桌子:

create table Transactions(Tid int,amt int)

5行:

insert into Transactions values(1, 100)
insert into Transactions values(2, -50)
insert into Transactions values(3, 100)
insert into Transactions values(4, -100)
insert into Transactions values(5, 200)

所需的输出:

TID  amt  balance
--- ----- -------
1    100   100
2    -50    50
3    100   150
4   -100    50
5    200   250

基本上,第一个记录的余额与 amt 相同,第二个以后的余额将是先前余额+当前 amt 的加法.我正在寻找一种最佳方法.我可以考虑使用函数或相关子查询,但不确定确切如何做.

Basically for first record balance will be same as amt, 2nd onwards balance would be addition of previous balance + current amt. I am looking for an optimal approach. I could think about using function or correlated subquery but not sure exactly how to do it.

推荐答案

对于不使用SQL Server 2012或更高版本的用户,游标可能是最有效的 supported guaranteed > CLR之外的方法.还有其他方法,例如快速更新".这可能会稍快一些,但不能保证将来能使用,当然,随着表的变大,具有双曲线性能配置文件的基于集的方法以及经常需要直接#tempdb I/O或导致产生溢出的递归CTE方法大致相同的影响.

For those not using SQL Server 2012 or above, a cursor is likely the most efficient supported and guaranteed method outside of CLR. There are other approaches such as the "quirky update" which can be marginally faster but not guaranteed to work in the future, and of course set-based approaches with hyperbolic performance profiles as the table gets larger, and recursive CTE methods that often require direct #tempdb I/O or result in spills that yield roughly the same impact.

基于集合的缓慢方法的形式为:

The slow, set-based approach is of the form:

SELECT t1.TID, t1.amt, RunningTotal = SUM(t2.amt)
FROM dbo.Transactions AS t1
INNER JOIN dbo.Transactions AS t2
  ON t1.TID >= t2.TID
GROUP BY t1.TID, t1.amt
ORDER BY t1.TID;

这是缓慢的原因吗?随着表的变大,每个增量行都需要读取表中的n-1行.这是指数级的,并且会导致失败,超时或只是生气的用户.

The reason this is slow? As the table gets larger, each incremental row requires reading n-1 rows in the table. This is exponential and bound for failures, timeouts, or just angry users.

出于类似痛苦的原因,子查询表单也同样痛苦.

The subquery form is similarly painful for similarly painful reasons.

SELECT TID, amt, RunningTotal = amt + COALESCE(
(
  SELECT SUM(amt)
    FROM dbo.Transactions AS i
    WHERE i.TID < o.TID), 0
)
FROM dbo.Transactions AS o
ORDER BY TID;


古怪的更新-风险自负:

古怪的更新"该方法比上面的方法更有效,但是没有记录行为,没有关于顺序的保证,并且该行为可能在今天起作用,但在将来可能会中断.我之所以将其包含在内,是因为它是一种流行的方法,而且效率很高,但这并不意味着我认可它.我什至没有回答这个问题而不是重复回答这个问题的主要原因是因为


递归CTE

第一个依靠TID是连续的,没有间隙:


Recursive CTEs

This first one relies on TID to be contiguous, no gaps:

;WITH x AS
(
  SELECT TID, amt, RunningTotal = amt
    FROM dbo.Transactions
    WHERE TID = 1
  UNION ALL
  SELECT y.TID, y.amt, x.RunningTotal + y.amt
   FROM x 
   INNER JOIN dbo.Transactions AS y
   ON y.TID = x.TID + 1
)
SELECT TID, amt, RunningTotal
  FROM x
  ORDER BY TID
  OPTION (MAXRECURSION 10000);

如果您不能依赖于此,则可以使用此变体,该变体仅使用 ROW_NUMBER()构建一个连续序列:

If you can't rely on this, then you can use this variation, which simply builds a contiguous sequence using ROW_NUMBER():

;WITH y AS 
(
  SELECT TID, amt, rn = ROW_NUMBER() OVER (ORDER BY TID)
    FROM dbo.Transactions
), x AS
(
    SELECT TID, rn, amt, rt = amt
      FROM y
      WHERE rn = 1
    UNION ALL
    SELECT y.TID, y.rn, y.amt, x.rt + y.amt
      FROM x INNER JOIN y
      ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
  FROM x
  ORDER BY x.rn
  OPTION (MAXRECURSION 10000);

根据数据的大小(例如,我们不知道的列),您可能会发现更好的整体性能,方法是先将相关列仅填充在#temp表中,然后再针对该表而不是基表进行处理:

Depending on the size of the data (e.g. columns we don't know about), you may find better overall performance by stuffing the relevant columns only in a #temp table first, and processing against that instead of the base table:

CREATE TABLE #x
(
  rn  INT PRIMARY KEY,
  TID INT,
  amt INT
);

INSERT INTO #x (rn, TID, amt)
SELECT ROW_NUMBER() OVER (ORDER BY TID),
  TID, amt
FROM dbo.Transactions;

;WITH x AS
(
  SELECT TID, rn, amt, rt = amt
    FROM #x
    WHERE rn = 1
  UNION ALL
  SELECT y.TID, y.rn, y.amt, x.rt + y.amt
    FROM x INNER JOIN #x AS y
    ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
  FROM x
  ORDER BY TID
  OPTION (MAXRECURSION 10000);

DROP TABLE #x;

只有第一种CTE方法才能提供与古怪的更新相媲美的性能,但是它对数据的性质(没有间隙)做出了很大的假设.其他两种方法会退回,在这种情况下,您也可以使用游标(如果无法使用CLR并且尚未使用SQL Server 2012或更高版本).

Only the first CTE method will provide performance rivaling the quirky update, but it makes a big assumption about the nature of the data (no gaps). The other two methods will fall back and in those cases you may as well use a cursor (if you can't use CLR and you're not yet on SQL Server 2012 or above).

每个人都被告知,游标是邪恶的,应该不惜一切代价避免使用游标,但这实际上击败了大多数其他受支持方法的性能,并且比古怪的更新更安全.与游标解决方案相比,我唯一喜欢的是2012和CLR方法(如下):

Everybody is told that cursors are evil, and that they should be avoided at all costs, but this actually beats the performance of most other supported methods, and is safer than the quirky update. The only ones I prefer over the cursor solution are the 2012 and CLR methods (below):

CREATE TABLE #x
(
  TID INT PRIMARY KEY, 
  amt INT, 
  rt INT
);

INSERT #x(TID, amt) 
  SELECT TID, amt
  FROM dbo.Transactions
  ORDER BY TID;

DECLARE @rt INT, @tid INT, @amt INT;
SET @rt = 0;

DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
  FOR SELECT TID, amt FROM #x ORDER BY TID;

OPEN c;

FETCH c INTO @tid, @amt;

WHILE @@FETCH_STATUS = 0
BEGIN
  SET @rt = @rt + @amt;
  UPDATE #x SET rt = @rt WHERE TID = @tid;
  FETCH c INTO @tid, @amt;
END

CLOSE c; DEALLOCATE c;

SELECT TID, amt, RunningTotal = rt 
  FROM #x 
  ORDER BY TID;

DROP TABLE #x;


SQL Server 2012或更高版本

SQL Server 2012中引入的新窗口函数使此任务容易得多(并且比所有上述方法的执行效果也更好):


SQL Server 2012 or above

New window functions introduced in SQL Server 2012 make this task a lot easier (and it performs better than all of the above methods as well):

SELECT TID, amt, 
  RunningTotal = SUM(amt) OVER (ORDER BY TID ROWS UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;

请注意,在较大的数据集上,由于RANGE使用磁盘上的假脱机(默认情况下使用RANGE),因此与以下两个选项之一相比,上述方法的性能要好得多.但是,还必须注意,行为和结果可能会有所不同,因此在基于此差异决定它们之间之前,请确保它们都返回正确的结果.

Note that on larger data sets, you'll find that the above performs much better than either of the following two options, since RANGE uses an on-disk spool (and the default uses RANGE). However it is also important to note that the behavior and results can differ, so be sure they both return correct results before deciding between them based on this difference.

SELECT TID, amt, 
  RunningTotal = SUM(amt) OVER (ORDER BY TID)
FROM dbo.Transactions
ORDER BY TID;

SELECT TID, amt, 
  RunningTotal = SUM(amt) OVER (ORDER BY TID RANGE UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;


CLR

为完整起见,我提供了Pavel Pawlowski的CLR方法的链接,到目前为止,这是SQL Server 2012之前版本的首选方法(但显然不是2000).


CLR

For completeness, I'm offering a link to Pavel Pawlowski's CLR method, which is by far the preferable method on versions prior to SQL Server 2012 (but not 2000 obviously).

http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/

如果您使用的是SQL Server 2012或更高版本,则选择很明显-使用新的 SUM()OVER()构造(将 ROWS 范围).对于早期版本,您将需要比较方案,数据上替代方法的性能,并且-考虑到与性能无关的因素-确定哪种方法最适合您.很可能是CLR方法.以下是我的建议,按优先顺序排列:

If you are on SQL Server 2012 or above, the choice is obvious - use the new SUM() OVER() construct (with ROWS vs. RANGE). For earlier versions, you'll want to compare the performance of the alternative approaches on your schema, data and - taking non-performance-related factors in mind - determine which approach is right for you. It very well may be the CLR approach. Here are my recommendations, in order of preference:

  1. SUM()OVER()... ROWS (如果在2012年或以上)
  2. CLR方法(如果可能)
  3. 可能的话,采用第一种递归CTE方法
  4. 光标
  5. 其他递归CTE方法
  6. 古怪的更新
  7. 加入和/或相关子查询
  1. SUM() OVER() ... ROWS, if on 2012 or above
  2. CLR method, if possible
  3. First recursive CTE method, if possible
  4. Cursor
  5. The other recursive CTE methods
  6. Quirky update
  7. Join and/or correlated subquery


有关这些方法的性能比较的更多信息,请参见 http://dba.stackexchange.com 上的问题:

https://dba.stackexchange.com/questions/19507/running-total-with-计数

我还在此处通过博客发布了有关这些比较的更多详细信息:

I've also blogged more details about these comparisons here:

http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals

有关分组/分区运行总计,请参见以下帖子:

Also for grouped/partitioned running totals, see the following posts:

http://sqlperformance.com/2014/01/t-sql-queries/grouped-running-totals

分区会导致运行总计查询

使用Group By的多个运行总计

这篇关于计算运行总额/运行余额的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆