计算SQL Server中的运行总计 [英] Calculate a Running Total in SQL Server
问题描述
想象一下下表(称为 TestTable
):
Imagine the following table (called TestTable
):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
我想要一个按日期顺序返回运行总计的查询,例如:
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
我知道有我对这种使用aggregating-set-statement技巧的方法特别感兴趣:
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
...这是非常有效的,但是我听说周围存在一些问题,因为您不一定能保证 UPDATE
语句将以正确的顺序处理行.也许我们可以获得有关该问题的明确答案.
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE
statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
但是也许人们可以提出其他建议?
But maybe there are other ways that people can suggest?
edit:现在使用 SqlFiddle ,其中包含设置和更新技巧"示例以上
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
推荐答案
更新,如果您正在运行SQL Server 2012,请参见:
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle(和ANSI-SQL)允许您执行以下操作:
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server无法为您解决此问题.我的直觉告诉我,这是极少数情况下光标最快的情况之一,尽管我必须对大结果进行一些基准测试.
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
更新技巧很方便,但我觉得它相当脆弱.看来,如果您要更新一个完整的表,它将按照主键的顺序进行.因此,如果将日期设置为升序的主键,则可能
是安全的.但是,您所依赖的是未公开的SQL Server实现细节(同样,如果查询最终由两个proc执行,我想知道会发生什么,请参阅:MAXDOP):
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably
be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):
完整的工作示例:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
您要求提供基准,这是最低要求.
You asked for a benchmark this is the lowdown.
最快的SAFE方式是游标,它比交叉联接的相关子查询快一个数量级.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
绝对最快的方法是UPDATE技巧.我唯一关心的是,我不确定在所有情况下更新都会以线性方式进行.查询中没有明确说明的内容.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
最底线,对于生产代码,我将使用光标.
Bottom line, for production code I would go with the cursor.
测试数据:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit
测试1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
测试2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
测试3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0
FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM @TotalTable
-- CPU 359, Reads 30392, Duration 496
测试4:
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
这篇关于计算SQL Server中的运行总计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!