如何做滚动总和,每行需要包括前几行的总和 [英] How to do a rolling sum, each row need to include the sum of previous rows
问题描述
我有桌子[访问].对于order_number为null的情况,例如,对于用户[2875636],我将获得按user_id分组且具有visit_duration_seconds之和的行,我将得到:61 + 151 + 33 + 13.每行应包括其前面各行的总和.
请在下面的预期结果中引用RESULT列
I have table [visit]. I need to get the rows grouped by user_id with sum of visit_duration_seconds for when order_number is null, for example, for user [2875636] i'll get: 61+151+33+13. Each row should include the sum of the rows before it.
Plz also refer RESULT column in expected result below
user_id starttime visit_duration_seconds order_number
2875636 2013-01-16 18:03:50 61
2875636 2013-01-16 18:08:18 151
2875636 2013-01-16 18:15:43 33
2875636 2013-01-16 18:16:37 13
2875636 2013-01-16 18:18:01 2011 10177888
2875636 2013-01-16 18:24:35 1172 10177884
2875636 2013-01-16 18:32:03 4731
2875636 2013-01-16 18:33:27 407
2875636 2013-01-16 18:37:29 74
2875636 2013-01-16 18:48:55 80
2875636 2013-01-16 19:05:00 1955
2875636 2013-01-16 19:14:12 326
2875636 2013-01-16 19:23:39 972
2875636 2013-01-16 19:33:05 5440
2875636 2013-01-16 19:35:48 43
2875636 2013-01-16 19:41:10 66
2875636 2013-01-16 19:42:03 100
2875636 2013-01-16 19:42:12 2414 10177940
2875636 2013-01-16 19:49:05 432 10177925
2875636 2013-01-16 19:50:19 183
2875636 2013-01-16 19:52:46 2061
2875636 2013-01-16 19:52:53 400
2875636 2013-01-16 20:00:47 338
2875636 2013-01-16 20:08:58 216
2875636 2013-01-16 20:14:21 58
2875636 2013-01-16 20:14:26 196
2875636 2013-01-16 20:19:14 2189
2875636 2013-01-16 20:21:29 424
2875636 2013-01-16 20:24:42 999
2875636 2013-01-16 21:01:39 1810
2875636 2013-01-16 21:02:54 525
2875636 2013-01-16 21:10:06 27
2875636 2013-01-16 21:12:08 282
2875636 2013-01-16 21:51:02 6
2875636 2013-01-16 22:18:34 173
2875636 2013-01-16 23:02:58 318
2875636 2013-01-16 23:45:37 207
3018868 2013-01-16 16:01:45 18
3018868 2013-01-16 16:16:45 39
3018868 2013-01-16 16:22:55 656
3018868 2013-01-16 16:25:54 1852
3018868 2013-01-16 16:29:23 688
3018868 2013-01-16 16:47:26 2258 10177846
3018868 2013-01-16 16:57:41 572
3018868 2013-01-16 17:06:47 1431
3018868 2013-01-16 17:18:32 29
3018868 2013-01-16 17:21:57 45
3018868 2013-01-16 17:29:23 16
3018868 2013-01-16 17:36:47 490
预期结果
user_id starttime visit_duration_seconds order_number RESULT
2875636 2013-01-16 18:03:50 61 61
2875636 2013-01-16 18:08:18 151 212
2875636 2013-01-16 18:15:43 33 245
2875636 2013-01-16 18:16:37 13 258
2875636 2013-01-16 18:18:01 2011 10177888 0
2875636 2013-01-16 18:24:35 1172 10177884 0
2875636 2013-01-16 18:32:03 4731 4731
2875636 2013-01-16 18:33:27 407 5138
2875636 2013-01-16 18:37:29 74 5212
2875636 2013-01-16 18:48:55 80 ...
2875636 2013-01-16 19:05:00 1955 ...
2875636 2013-01-16 19:14:12 326 ...
2875636 2013-01-16 19:23:39 972
2875636 2013-01-16 19:33:05 5440
2875636 2013-01-16 19:35:48 43
2875636 2013-01-16 19:41:10 66
2875636 2013-01-16 19:42:03 100
2875636 2013-01-16 19:42:12 2414 10177940
2875636 2013-01-16 19:49:05 432 10177925
2875636 2013-01-16 19:50:19 183
2875636 2013-01-16 19:52:46 2061
2875636 2013-01-16 19:52:53 400
2875636 2013-01-16 20:00:47 338
2875636 2013-01-16 20:08:58 216
2875636 2013-01-16 20:14:21 58
2875636 2013-01-16 20:14:26 196
2875636 2013-01-16 20:19:14 2189
2875636 2013-01-16 20:21:29 424
2875636 2013-01-16 20:24:42 999
2875636 2013-01-16 21:01:39 1810
2875636 2013-01-16 21:02:54 525
2875636 2013-01-16 21:10:06 27
2875636 2013-01-16 21:12:08 282
2875636 2013-01-16 21:51:02 6
2875636 2013-01-16 22:18:34 173
2875636 2013-01-16 23:02:58 318
2875636 2013-01-16 23:45:37 207
3018868 2013-01-16 16:01:45 18
3018868 2013-01-16 16:16:45 39
3018868 2013-01-16 16:22:55 656
3018868 2013-01-16 16:25:54 1852
3018868 2013-01-16 16:29:23 688
3018868 2013-01-16 16:47:26 2258 10177846
3018868 2013-01-16 16:57:41 572
3018868 2013-01-16 17:06:47 1431
3018868 2013-01-16 17:18:32 29
3018868 2013-01-16 17:21:57 45
3018868 2013-01-16 17:29:23 16
3018868 2013-01-16 17:36:47 490
推荐答案
您可以使用MySQL用户变量来模拟分析函数. (还有其他一些方法,例如使用半联接或使用相关子查询.如果您觉得它们更合适,我也可以提供解决方案.)
You can make use of MySQL user variables to emulate analytic functions. (There are some other approaches as well, like using a semi-join or using a correlated subquery. I can provide solutions for those as well, if you feel they may be more appropriate.)
要模拟运行总计"分析功能,请尝试如下操作:
For emulating a "running total" analytic function, try something like this:
SELECT t.user_id
, t.starttime
, t.order_number
, IF(t.order_number IS NOT NULL,
@tot_dur := 0,
@tot_dur := @tot_dur + t.visit_duration_seconds) AS tot_dur
FROM visit t
JOIN (SELECT @tot_dur := 0) d
ORDER BY t.user_id, t.start_time
此处的技巧"是使用IF函数测试order_number
是否为空.如果为空,则将持续时间值添加到变量中,否则,将变量设置为零.
The "trick" here is to use an IF function to test whether or not order_number
is null. When it's null, we add the duration value to the variable, otherwise, we set the variable to zero.
我们使用内联视图(别名为d
,以确保将@tot_dur变量初始化为零.
We use an inline view (aliased as d
, to ensure that the @tot_dur variable is initialized to zero.
注意:请注意使用这样的MySQL用户变量.在上面的SELECT语句中,SELECT列表中变量的分配发生在ORDER BY之后,因此我们可以获得确定性行为.
NOTE: Take care with using MySQL user variables like this. In the SELECT statement as above, the assignment of the variables in the SELECT list happens after the ORDER BY, so we can get deterministic behavior.
该查询不处理user_id中的中断".为此,我们需要上一行的user_id值.我们可以将其保留在另一个用户变量中.操作的顺序是确定的,在覆盖上一行的user_id之前,我们需要小心进行累加.
That query does not handle "breaks" in user_id. To get that, we are going to need the value of user_id from the previous row. We can preserve that in another user variable. The order of the operations is deterministic, and we need to take care to do the accumulation BEFORE we overwrite the user_id from the previous row.
我们要么需要对列进行重新排序,以便使user_id出现在tot_dur之后(或者包括user_id列的第二个副本)
We either need to reorder the columns so that user_id appears after tot_dur (or include a second copy of the user_id column)
SELECT t.user_id
, t.starttime
, t.order_number
, IF(t.order_number IS NULL,
@tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
@tot_dur := 0
) AS tot_dur
, @prev_user_id := t.user_id AS prev_user_id
FROM visit t
JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
ORDER BY t.user_id, t.start_time
user_id
和prev_user_id
列中返回的值相同.可以删除该额外"列,也可以通过将查询(作为内联视图)包装在另一个查询中来对这些列进行重新排序,尽管这样做会降低性能:
The values returned in the user_id
and prev_user_id
columns is identical. That "extra" column could be removed, or the columns could be reordered by wrapping the query (as an inline view) in another query, although this comes at a performance cost:
SELECT v.user_id
, v.starttime
, v.order_number
, v.tot_dur
FROM (SELECT t.starttime
, t.order_number
, IF(t.order_number IS NULL,
@tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
@tot_dur := 0
) AS tot_dur
, @prev_user_id := t.user_id AS user_id
FROM visit t
JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
ORDER BY t.user_id, t.start_time
) v
该查询表明MySQL可以返回指定的结果集.但是为了获得最佳性能,我们只想在内联视图中运行查询(别名为v
),并在客户端处理行时对列的重新排序(将user_id列放在第一位)被检索.
That query demonstrates that it is possible for MySQL to return the specified resultset. But for optimum performance, we'd want to run just the query in the inline view (aliased as v
), and handle the re-ordering of the columns (putting the user_id column first) on the client side, when the rows are retrieved.
其他两种常见方法是使用半联接和相关子查询,尽管这些方法在处理大型集合时可能会占用更多资源.
The other two common approaches are using a semi-join, and using a correlated subquery, although these approaches can be more resource intensive when processing large sets.
这篇关于如何做滚动总和,每行需要包括前几行的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!