我如何跟踪过去 6 个月从账户发送的总交易金额? [英] How can I keep track of total transaction amount sent from an account each last 6 month?

查看:28
本文介绍了我如何跟踪过去 6 个月从账户发送的总交易金额?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的交易数据

data 

id          from    to          date        amount  
<int>       <fctr>  <fctr>      <date>      <dbl>
19521       6644    6934        2005-01-01  700.0
19524       6753    8456        2005-01-01  600.0
19523       9242    9333        2005-01-01  1000.0
…           …       …           …           …
1055597     9866    9736        2010-12-31  278.9
1053519     9868    8644        2010-12-31  242.8
1052790     9869    8399        2010-12-31  372.2

现在,对于 from 列中的每个不同帐户,我想跟踪他们在过去 6 个月内在进行交易时发送的交易金额,因此我想根据到进行特定交易的交易日期.

Now for each distinct account in from column, I want to keep track of how much transaction amount they sent over last 6 month at the time the transaction was made and so I want to do it according to the transaction date at which the particular transaction was made.

为了更好地查看它,我将在这里只考虑帐户 5370.那么,让我们考虑以下数据:

To see it better I will only consider the account 5370 here. So, then let's consider the following data:

id          from    to          date        amount  
<int>       <fctr>  <fctr>      <date>      <dbl>
18529       5370    9356        2005-05-31  24.4
13742       5370    5605        2005-08-05  7618.0
9913        5370    8567        2005-09-12  21971.0
2557        5370    5636        2005-11-12  2921.0
18669       5370    8933        2005-11-30  169.2
35900       5370    8483        2006-01-31  71.5
51341       5370    7626        2006-04-11  4214.0
83324       5370    9676        2006-08-31  261.1
100277      5370    9105        2006-10-31  182.0
103444      5370    9772        2006-11-08  16927.0

第一笔交易 5370 发生在 2005-05-31.所以在此之前没有任何记录.这就是为什么这是 5370 的起始日期点(因此,每个不同的帐户将根据他们进行第一次交易的日期有自己的起始日期点).因此,5370在过去6个月内发送的总交易金额仅为24.4.转到 5370 的下一笔交易,出现在 2005-08-05 上的第二笔交易.当时,5370在过去6个月发送的总交易金额为24.4 + 7618.0 = 7642.4.因此,输出应如下所示:

The very first transaction 5370 made was on 2005-05-31. So there's no any record before that. That's why this is the starting date point for 5370(So, each distinct account will have their own starting date point based on which date they made their first transaction). Thus, total transaction amount sent by 5370 in last 6 month at that time was just 24.4. Going to the next transaction of 5370, there comes the second transaction made on 2005-08-05. At that time, total transaction amount sent by 5370 in last 6 month was 24.4 + 7618.0 = 7642.4. So, the output should be as follows:

id          from    to          date        amount     total_trx_amount_sent_in_last_6month_by_from
<int>       <fctr>  <fctr>      <date>      <dbl>      <dbl>
18529       5370    9356        2005-05-31  24.4       24.4 
13742       5370    5605        2005-08-05  7618.0     (24.4+7618.0)=7642.4
9913        5370    8567        2005-09-12  21971.0    (24.4+7618.0+21971.0)=29613.4
2557        5370    5636        2005-11-12  2921.0     (24.4+7618.0+21971.0+2921.0)=32534.4
18669       5370    8933        2005-11-30  169.2      (7618.0+21971.0+2921.0+169.2)=32679.2
35900       5370    8483        2006-01-31  71.5       (7618.0+21971.0+2921.0+169.2+71.5)=32750.7
51341       5370    7626        2006-04-11  4214.0     (2921.0+169.2+71.5+4214.0)=7375.7
83324       5370    9676        2006-08-31  261.1      (4214.0+261.1)=4475.1
100277      5370    9105        2006-10-31  182.0      (261.1+182.0)=443.1
103444      5370    9772        2006-11-08  16927.0    (261.1+182.0+16927.0)=17370.1

为了计算,我从每行的交易日期中减去 180 天(大约 6 个月).这就是我选择应汇总的金额的方式.

For the calculations, I subtracted 180 days(approx. 6 months) from the transaction date on each line. That's how I chose which amounts should be summed up.

那么,考虑到所有不同的帐户,我如何针对整个数据实现这一点?

So, how can I achieve this for the whole data, considering all the distinct accounts?

PS:我的数据有 100 万行,因此该解决方案也应该在大型数据集上运行得更快.

PS: My data has 1 million rows so the solution also should run faster on a large dataset.

推荐答案

使用 dplyr 的方法可能是:

A way using dplyr could be :

library(dplyr)
df %>%
  group_by(from) %>%
  mutate(total_trx = purrr::map_dbl(date, 
                     ~sum(amount[between(date, .x - 180, .x)])))

#      id  from    to date        amount total_trx
#    <int> <int> <int> <date>       <dbl>     <dbl>
# 1  18529  5370  9356 2005-05-31    24.4      24.4
# 2  13742  5370  5605 2005-08-05  7618      7642. 
# 3   9913  5370  8567 2005-09-12 21971     29613. 
# 4   2557  5370  5636 2005-11-12  2921     32534. 
# 5  18669  5370  8933 2005-11-30   169.    32679. 
# 6  35900  5370  8483 2006-01-31    71.5   32751. 
# 7  51341  5370  7626 2006-04-11  4214      7376. 
# 8  83324  5370  9676 2006-08-31   261.     4475. 
# 9 100277  5370  9105 2006-10-31   182       443. 
#10 103444  5370  9772 2006-11-08 16927     17370. 


如果您的数据量很大,您可以在 data.table 中使用上述方法,这可能会很有效.


If you are data is huge you can use the above approach in data.table which might be efficient.

library(data.table)
setDT(df)[, total_trx := sapply(date, function(x) 
                         sum(amount[between(date, x - 180, x)])), from]

这篇关于我如何跟踪过去 6 个月从账户发送的总交易金额?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆