如何跟踪最近6个月每个帐户发送的交易总额? [英] How can I keep track of total transaction amount sent from an account each last 6 month?
问题描述
这是我的交易数据
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1055597 9866 9736 2010-12-31 278.9
1053519 9868 8644 2010-12-31 242.8
1052790 9869 8399 2010-12-31 372.2
现在,对于 from
列中的每个不同帐户,我想跟踪他们在进行交易时在过去6个月中发送了多少交易金额,因此我想根据到进行特定交易的交易日期.
Now for each distinct account in from
column, I want to keep track of how much transaction amount they sent over last 6 month at the time the transaction was made and so I want to do it according to the transaction date at which the particular transaction was made.
为了更好看,我将在这里只考虑帐户 5370
.因此,让我们考虑以下数据:
To see it better I will only consider the account 5370
here. So, then let's consider the following data:
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
18529 5370 9356 2005-05-31 24.4
13742 5370 5605 2005-08-05 7618.0
9913 5370 8567 2005-09-12 21971.0
2557 5370 5636 2005-11-12 2921.0
18669 5370 8933 2005-11-30 169.2
35900 5370 8483 2006-01-31 71.5
51341 5370 7626 2006-04-11 4214.0
83324 5370 9676 2006-08-31 261.1
100277 5370 9105 2006-10-31 182.0
103444 5370 9772 2006-11-08 16927.0
第一个交易 5370
是在 2005-05-31
上进行的.因此,在此之前没有任何记录.这就是为什么这是 5370
的开始日期点的原因(因此,每个不同的帐户都将基于进行首次交易的日期而拥有自己的开始日期点).因此,当时 5370
在过去6个月发送的总交易金额仅为24.4.转到 5370
的下一个事务,在 2005-08-05
上进行第二次事务.当时, 5370
在过去6个月中发送的总交易金额为 24.4 + 7618.0 = 7642.4
.因此,输出应如下所示:
The very first transaction 5370
made was on 2005-05-31
. So there's no any record before that. That's why this is the starting date point for 5370
(So, each distinct account will have their own starting date point based on which date they made their first transaction). Thus, total transaction amount sent by 5370
in last 6 month at that time was just 24.4. Going to the next transaction of 5370
, there comes the second transaction made on 2005-08-05
. At that time, total transaction amount sent by 5370
in last 6 month was 24.4 + 7618.0 = 7642.4
. So, the output should be as follows:
id from to date amount total_trx_amount_sent_in_last_6month_by_from
<int> <fctr> <fctr> <date> <dbl> <dbl>
18529 5370 9356 2005-05-31 24.4 24.4
13742 5370 5605 2005-08-05 7618.0 (24.4+7618.0)=7642.4
9913 5370 8567 2005-09-12 21971.0 (24.4+7618.0+21971.0)=29613.4
2557 5370 5636 2005-11-12 2921.0 (24.4+7618.0+21971.0+2921.0)=32534.4
18669 5370 8933 2005-11-30 169.2 (7618.0+21971.0+2921.0+169.2)=32679.2
35900 5370 8483 2006-01-31 71.5 (7618.0+21971.0+2921.0+169.2+71.5)=32750.7
51341 5370 7626 2006-04-11 4214.0 (2921.0+169.2+71.5+4214.0)=7375.7
83324 5370 9676 2006-08-31 261.1 (4214.0+261.1)=4475.1
100277 5370 9105 2006-10-31 182.0 (261.1+182.0)=443.1
103444 5370 9772 2006-11-08 16927.0 (261.1+182.0+16927.0)=17370.1
为了进行计算,我从每一行的交易日期中减去了180天(约6个月).这就是我选择应该汇总的金额的方式.
For the calculations, I subtracted 180 days(approx. 6 months) from the transaction date on each line. That's how I chose which amounts should be summed up.
那么,考虑到所有不同的帐户,我如何才能针对整个数据实现这一目标?
So, how can I achieve this for the whole data, considering all the distinct accounts?
PS:我的数据有100万行,因此该解决方案在大型数据集上的运行速度也应更快.
PS: My data has 1 million rows so the solution also should run faster on a large dataset.
推荐答案
使用 dplyr
的方法可能是:
library(dplyr)
df %>%
group_by(from) %>%
mutate(total_trx = purrr::map_dbl(date,
~sum(amount[between(date, .x - 180, .x)])))
# id from to date amount total_trx
# <int> <int> <int> <date> <dbl> <dbl>
# 1 18529 5370 9356 2005-05-31 24.4 24.4
# 2 13742 5370 5605 2005-08-05 7618 7642.
# 3 9913 5370 8567 2005-09-12 21971 29613.
# 4 2557 5370 5636 2005-11-12 2921 32534.
# 5 18669 5370 8933 2005-11-30 169. 32679.
# 6 35900 5370 8483 2006-01-31 71.5 32751.
# 7 51341 5370 7626 2006-04-11 4214 7376.
# 8 83324 5370 9676 2006-08-31 261. 4475.
# 9 100277 5370 9105 2006-10-31 182 443.
#10 103444 5370 9772 2006-11-08 16927 17370.
如果您的数据量巨大,则可以在 data.table
中使用上述方法,这可能是有效的.
If you are data is huge you can use the above approach in data.table
which might be efficient.
library(data.table)
setDT(df)[, total_trx := sapply(date, function(x)
sum(amount[between(date, x - 180, x)])), from]
这篇关于如何跟踪最近6个月每个帐户发送的交易总额?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!