我如何跟踪过去 6 个月从账户发送的总交易金额? [英] How can I keep track of total transaction amount sent from an account each last 6 month?
问题描述
这是我的交易数据
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1055597 9866 9736 2010-12-31 278.9
1053519 9868 8644 2010-12-31 242.8
1052790 9869 8399 2010-12-31 372.2
现在,对于 from
列中的每个不同帐户,我想跟踪他们在过去 6 个月内在进行交易时发送的交易金额,因此我想根据到进行特定交易的交易日期.
Now for each distinct account in from
column, I want to keep track of how much transaction amount they sent over last 6 month at the time the transaction was made and so I want to do it according to the transaction date at which the particular transaction was made.
为了更好地查看它,我将在这里只考虑帐户 5370
.那么,让我们考虑以下数据:
To see it better I will only consider the account 5370
here. So, then let's consider the following data:
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
18529 5370 9356 2005-05-31 24.4
13742 5370 5605 2005-08-05 7618.0
9913 5370 8567 2005-09-12 21971.0
2557 5370 5636 2005-11-12 2921.0
18669 5370 8933 2005-11-30 169.2
35900 5370 8483 2006-01-31 71.5
51341 5370 7626 2006-04-11 4214.0
83324 5370 9676 2006-08-31 261.1
100277 5370 9105 2006-10-31 182.0
103444 5370 9772 2006-11-08 16927.0
第一笔交易 5370
发生在 2005-05-31
.所以在此之前没有任何记录.这就是为什么这是 5370
的起始日期点(因此,每个不同的帐户将根据他们进行第一次交易的日期有自己的起始日期点).因此,5370
在过去6个月内发送的总交易金额仅为24.4.转到 5370
的下一笔交易,出现在 2005-08-05
上的第二笔交易.当时,5370
在过去6个月发送的总交易金额为24.4 + 7618.0 = 7642.4
.因此,输出应如下所示:
The very first transaction 5370
made was on 2005-05-31
. So there's no any record before that. That's why this is the starting date point for 5370
(So, each distinct account will have their own starting date point based on which date they made their first transaction). Thus, total transaction amount sent by 5370
in last 6 month at that time was just 24.4. Going to the next transaction of 5370
, there comes the second transaction made on 2005-08-05
. At that time, total transaction amount sent by 5370
in last 6 month was 24.4 + 7618.0 = 7642.4
. So, the output should be as follows:
id from to date amount total_trx_amount_sent_in_last_6month_by_from
<int> <fctr> <fctr> <date> <dbl> <dbl>
18529 5370 9356 2005-05-31 24.4 24.4
13742 5370 5605 2005-08-05 7618.0 (24.4+7618.0)=7642.4
9913 5370 8567 2005-09-12 21971.0 (24.4+7618.0+21971.0)=29613.4
2557 5370 5636 2005-11-12 2921.0 (24.4+7618.0+21971.0+2921.0)=32534.4
18669 5370 8933 2005-11-30 169.2 (7618.0+21971.0+2921.0+169.2)=32679.2
35900 5370 8483 2006-01-31 71.5 (7618.0+21971.0+2921.0+169.2+71.5)=32750.7
51341 5370 7626 2006-04-11 4214.0 (2921.0+169.2+71.5+4214.0)=7375.7
83324 5370 9676 2006-08-31 261.1 (4214.0+261.1)=4475.1
100277 5370 9105 2006-10-31 182.0 (261.1+182.0)=443.1
103444 5370 9772 2006-11-08 16927.0 (261.1+182.0+16927.0)=17370.1
为了计算,我从每行的交易日期中减去 180 天(大约 6 个月).这就是我选择应汇总的金额的方式.
For the calculations, I subtracted 180 days(approx. 6 months) from the transaction date on each line. That's how I chose which amounts should be summed up.
那么,考虑到所有不同的帐户,我如何针对整个数据实现这一点?
So, how can I achieve this for the whole data, considering all the distinct accounts?
PS:我的数据有 100 万行,因此该解决方案也应该在大型数据集上运行得更快.
PS: My data has 1 million rows so the solution also should run faster on a large dataset.
推荐答案
使用 dplyr
的方法可能是:
A way using dplyr
could be :
library(dplyr)
df %>%
group_by(from) %>%
mutate(total_trx = purrr::map_dbl(date,
~sum(amount[between(date, .x - 180, .x)])))
# id from to date amount total_trx
# <int> <int> <int> <date> <dbl> <dbl>
# 1 18529 5370 9356 2005-05-31 24.4 24.4
# 2 13742 5370 5605 2005-08-05 7618 7642.
# 3 9913 5370 8567 2005-09-12 21971 29613.
# 4 2557 5370 5636 2005-11-12 2921 32534.
# 5 18669 5370 8933 2005-11-30 169. 32679.
# 6 35900 5370 8483 2006-01-31 71.5 32751.
# 7 51341 5370 7626 2006-04-11 4214 7376.
# 8 83324 5370 9676 2006-08-31 261. 4475.
# 9 100277 5370 9105 2006-10-31 182 443.
#10 103444 5370 9772 2006-11-08 16927 17370.
如果您的数据量很大,您可以在 data.table
中使用上述方法,这可能会很有效.
If you are data is huge you can use the above approach in data.table
which might be efficient.
library(data.table)
setDT(df)[, total_trx := sapply(date, function(x)
sum(amount[between(date, x - 180, x)])), from]
这篇关于我如何跟踪过去 6 个月从账户发送的总交易金额?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!