如何做滚动总和,每行需要包括前几行的总和 [英] How to do a rolling sum, each row need to include the sum of previous rows

查看:82
本文介绍了如何做滚动总和,每行需要包括前几行的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有桌子[访问].对于order_number为null的情况,例如,对于用户[2875636],我将获得按user_id分组且具有visit_duration_seconds之和的行,我将得到:61 + 151 + 33 + 13.每行应包括其前面各行的总和.
请在下面的预期结果中引用RESULT列

I have table [visit]. I need to get the rows grouped by user_id with sum of visit_duration_seconds for when order_number is null, for example, for user [2875636] i'll get: 61+151+33+13. Each row should include the sum of the rows before it.
Plz also refer RESULT column in expected result below

user_id   starttime           visit_duration_seconds  order_number
2875636   2013-01-16 18:03:50 61  
2875636   2013-01-16 18:08:18 151 
2875636   2013-01-16 18:15:43 33  
2875636   2013-01-16 18:16:37 13  
2875636   2013-01-16 18:18:01 2011                     10177888
2875636   2013-01-16 18:24:35 1172                     10177884
2875636   2013-01-16 18:32:03 4731    
2875636   2013-01-16 18:33:27 407 
2875636   2013-01-16 18:37:29 74  
2875636   2013-01-16 18:48:55 80  
2875636   2013-01-16 19:05:00 1955    
2875636   2013-01-16 19:14:12 326 
2875636   2013-01-16 19:23:39 972 
2875636   2013-01-16 19:33:05 5440    
2875636   2013-01-16 19:35:48 43  
2875636   2013-01-16 19:41:10 66  
2875636   2013-01-16 19:42:03 100 
2875636   2013-01-16 19:42:12 2414                     10177940
2875636   2013-01-16 19:49:05 432                  10177925
2875636   2013-01-16 19:50:19 183 
2875636   2013-01-16 19:52:46 2061    
2875636   2013-01-16 19:52:53 400 
2875636   2013-01-16 20:00:47 338 
2875636   2013-01-16 20:08:58 216 
2875636   2013-01-16 20:14:21 58  
2875636   2013-01-16 20:14:26 196 
2875636   2013-01-16 20:19:14 2189    
2875636   2013-01-16 20:21:29 424 
2875636   2013-01-16 20:24:42 999 
2875636   2013-01-16 21:01:39 1810    
2875636   2013-01-16 21:02:54 525 
2875636   2013-01-16 21:10:06 27  
2875636   2013-01-16 21:12:08 282 
2875636   2013-01-16 21:51:02 6   
2875636   2013-01-16 22:18:34 173 
2875636   2013-01-16 23:02:58 318 
2875636   2013-01-16 23:45:37 207 
3018868   2013-01-16 16:01:45 18  
3018868   2013-01-16 16:16:45 39  
3018868   2013-01-16 16:22:55 656 
3018868   2013-01-16 16:25:54 1852    
3018868   2013-01-16 16:29:23 688 
3018868   2013-01-16 16:47:26 2258                       10177846
3018868   2013-01-16 16:57:41 572 
3018868   2013-01-16 17:06:47 1431    
3018868   2013-01-16 17:18:32 29  
3018868   2013-01-16 17:21:57 45  
3018868   2013-01-16 17:29:23 16  
3018868   2013-01-16 17:36:47 490

预期结果

user_id starttime           visit_duration_seconds  order_number        RESULT
2875636 2013-01-16 18:03:50 61                                      61
2875636 2013-01-16 18:08:18 151                                     212
2875636 2013-01-16 18:15:43 33                                      245
2875636 2013-01-16 18:16:37 13                                      258
2875636 2013-01-16 18:18:01 2011                     10177888           0
2875636 2013-01-16 18:24:35 1172                     10177884           0
2875636 2013-01-16 18:32:03 4731                                        4731
2875636 2013-01-16 18:33:27 407                                     5138
2875636 2013-01-16 18:37:29 74                                      5212
2875636 2013-01-16 18:48:55 80                                      ...
2875636 2013-01-16 19:05:00 1955                                        ...
2875636 2013-01-16 19:14:12 326                                     ...
2875636 2013-01-16 19:23:39 972 
2875636 2013-01-16 19:33:05 5440    
2875636 2013-01-16 19:35:48 43  
2875636 2013-01-16 19:41:10 66  
2875636 2013-01-16 19:42:03 100 
2875636 2013-01-16 19:42:12 2414                     10177940
2875636 2013-01-16 19:49:05 432                  10177925
2875636 2013-01-16 19:50:19 183 
2875636 2013-01-16 19:52:46 2061    
2875636 2013-01-16 19:52:53 400 
2875636 2013-01-16 20:00:47 338 
2875636 2013-01-16 20:08:58 216 
2875636 2013-01-16 20:14:21 58  
2875636 2013-01-16 20:14:26 196 
2875636 2013-01-16 20:19:14 2189    
2875636 2013-01-16 20:21:29 424 
2875636 2013-01-16 20:24:42 999 
2875636 2013-01-16 21:01:39 1810    
2875636 2013-01-16 21:02:54 525 
2875636 2013-01-16 21:10:06 27  
2875636 2013-01-16 21:12:08 282 
2875636 2013-01-16 21:51:02 6   
2875636 2013-01-16 22:18:34 173 
2875636 2013-01-16 23:02:58 318 
2875636 2013-01-16 23:45:37 207 
3018868 2013-01-16 16:01:45 18  
3018868 2013-01-16 16:16:45 39  
3018868 2013-01-16 16:22:55 656 
3018868 2013-01-16 16:25:54 1852    
3018868 2013-01-16 16:29:23 688 
3018868 2013-01-16 16:47:26 2258                       10177846
3018868 2013-01-16 16:57:41 572 
3018868 2013-01-16 17:06:47 1431    
3018868 2013-01-16 17:18:32 29  
3018868 2013-01-16 17:21:57 45  
3018868 2013-01-16 17:29:23 16  
3018868 2013-01-16 17:36:47 490 

推荐答案

您可以使用MySQL用户变量来模拟分析函数. (还有其他一些方法,例如使用半联接或使用相关子查询.如果您觉得它们更合适,我也可以提供解决方案.)

You can make use of MySQL user variables to emulate analytic functions. (There are some other approaches as well, like using a semi-join or using a correlated subquery. I can provide solutions for those as well, if you feel they may be more appropriate.)

要模拟运行总计"分析功能,请尝试如下操作:

For emulating a "running total" analytic function, try something like this:

SELECT t.user_id
     , t.starttime
     , t.order_number
     , IF(t.order_number IS NOT NULL,
         @tot_dur := 0,
         @tot_dur := @tot_dur + t.visit_duration_seconds) AS tot_dur
  FROM visit t
  JOIN (SELECT @tot_dur := 0) d
 ORDER BY t.user_id, t.start_time

此处的技巧"是使用IF函数测试order_number是否为空.如果为空,则将持续时间值添加到变量中,否则,将变量设置为零.

The "trick" here is to use an IF function to test whether or not order_number is null. When it's null, we add the duration value to the variable, otherwise, we set the variable to zero.

我们使用内联视图(别名为d,以确保将@tot_dur变量初始化为零.

We use an inline view (aliased as d, to ensure that the @tot_dur variable is initialized to zero.

注意:请注意使用这样的MySQL用户变量.在上面的SELECT语句中,SELECT列表中变量的分配发生在ORDER BY之后,因此我们可以获得确定性行为.

NOTE: Take care with using MySQL user variables like this. In the SELECT statement as above, the assignment of the variables in the SELECT list happens after the ORDER BY, so we can get deterministic behavior.

该查询不处理user_id中的中断".为此,我们需要上一行的user_id值.我们可以将其保留在另一个用户变量中.操作的顺序是确定的,在覆盖上一行的user_id之前,我们需要小心进行累加.

That query does not handle "breaks" in user_id. To get that, we are going to need the value of user_id from the previous row. We can preserve that in another user variable. The order of the operations is deterministic, and we need to take care to do the accumulation BEFORE we overwrite the user_id from the previous row.

我们要么需要对列进行重新排序,以便使user_id出现在tot_dur之后(或者包括user_id列的第二个副本)

We either need to reorder the columns so that user_id appears after tot_dur (or include a second copy of the user_id column)

SELECT t.user_id
     , t.starttime
     , t.order_number
     , IF(t.order_number IS NULL,
         @tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
         @tot_dur := 0
       ) AS tot_dur
     , @prev_user_id := t.user_id AS prev_user_id
  FROM visit t
  JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
 ORDER BY t.user_id, t.start_time

user_idprev_user_id列中返回的值相同.可以删除该额外"列,也可以通过将查询(作为内联视图)包装在另一个查询中来对这些列进行重新排序,尽管这样做会降低性能:

The values returned in the user_id and prev_user_id columns is identical. That "extra" column could be removed, or the columns could be reordered by wrapping the query (as an inline view) in another query, although this comes at a performance cost:

SELECT v.user_id
     , v.starttime
     , v.order_number
     , v.tot_dur
  FROM (SELECT t.starttime
             , t.order_number
             , IF(t.order_number IS NULL,
                 @tot_dur := IF(@prev_user_id = t.user_id,@tot_dur,0) + t.visit_duration_seconds,
                 @tot_dur := 0
               ) AS tot_dur
             , @prev_user_id := t.user_id AS user_id
          FROM visit t
          JOIN (SELECT @tot_dur := 0, @prev_user_id := NULL) d
         ORDER BY t.user_id, t.start_time
       ) v

该查询表明MySQL可以返回指定的结果集.但是为了获得最佳性能,我们只想在内联视图中运行查询(别名为v),并在客户端处理行时对列的重新排序(将user_id列放在第一位)被检索.

That query demonstrates that it is possible for MySQL to return the specified resultset. But for optimum performance, we'd want to run just the query in the inline view (aliased as v), and handle the re-ordering of the columns (putting the user_id column first) on the client side, when the rows are retrieved.

其他两种常见方法是使用半联接和相关子查询,尽管这些方法在处理大型集合时可能会占用更多资源.

The other two common approaches are using a semi-join, and using a correlated subquery, although these approaches can be more resource intensive when processing large sets.

这篇关于如何做滚动总和,每行需要包括前几行的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆