Efficient way to mutate rowwise
Question
I have two data tables, dfUsers and purchases, generated using the code below:
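set.seed(1)
library(data.table)

# One row per user; draw one start date per user (5 users, so 5 dates)
dfUsers <- data.table(
  user = letters[1:5],
  startDate = sample(seq.Date(from = as.Date('2016-01-01'), to = Sys.Date(), by = '1 day'), 5)
)
# Every user gets the same window length, drawn once from 30 to 90 days
dfUsers$endDate <- dfUsers$startDate + sample(30:90, 1)

# 500 purchases with random users and dates; the 50 amounts are recycled across the 500 rows
purchases <- data.table(
  user = sample(letters[1:5], 500, replace = TRUE),
  purchaseDate = sample(seq.Date(from = as.Date('2016-01-01'), to = Sys.Date(), by = '1 day'), 500, replace = TRUE),
  amount = runif(50, 300, 500)
)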
For each user, I want to sum the amounts of all purchases made between that user's startDate and endDate.
My current approach is to call a function from dplyr's mutate on a rowwise table, but that becomes terribly slow as both tables grow.
I'm learning R, so I'm wondering whether there is a more efficient way to approach a problem of this nature.
Function:
# Sum the purchases of user u that fall within [startDate, endDate]
addPurchases <- function(u, startDate, endDate) {
  purchases[user == u & startDate <= purchaseDate & endDate >= purchaseDate, sum(amount)]
}
dplyr chain:
library(dplyr)
dfUsers %>%
  rowwise() %>%
  mutate(totalPurchase = addPurchases(user, startDate, endDate))
Recommended answer
Solution using data.table: merge the two tables on user, keep only the purchases that fall inside each user's date window, and calculate the sum by user:
library(data.table)
# Using the OP's data
merge(dfUsers,
      purchases,
      "user")[purchaseDate >= startDate & purchaseDate <= endDate,
              sum(amount),
              user]
# Output from the original post (exact values depend on Sys.Date())
#    user       V1
# 1:    a 6929.469
# 2:    b 6563.416
# 3:    c 3607.794
# 4:    d 5591.748
# 5:    e 5727.622
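The merge above first builds every user/purchase pair and only then filters by date. As a possible alternative that is not part of the original answer, data.table's non-equi joins can apply the date conditions during the join itself. The sketch below assumes a data.table version that supports non-equi joins (1.9.8 or later) and uses small hypothetical tables (users, buys) with fixed dates so the result is reproducible.

```r
library(data.table)

# Hypothetical sample data with fixed dates (not the OP's random data)
users <- data.table(
  user      = c("a", "b"),
  startDate = as.Date(c("2016-01-10", "2016-02-01")),
  endDate   = as.Date(c("2016-01-20", "2016-02-10"))
)
buys <- data.table(
  user         = c("a", "a", "a", "b"),
  purchaseDate = as.Date(c("2016-01-09", "2016-01-10", "2016-01-15", "2016-03-01")),
  amount       = c(100, 200, 300, 400)
)

# Non-equi join: for each row of users, match the purchases of the same user
# whose purchaseDate lies in [startDate, endDate]; by = .EACHI evaluates the
# sum once per row of users, in the order of users.
res <- buys[users,
            on = .(user, purchaseDate >= startDate, purchaseDate <= endDate),
            .(totalPurchase = sum(amount, na.rm = TRUE)),
            by = .EACHI]

# Attach the totals back to the users table by reference
users[, totalPurchase := res$totalPurchase]
print(users)
```

With this approach, a user who has no purchases inside the window still appears in the result with a total of 0 (thanks to na.rm = TRUE), whereas the merge-then-filter version drops such users from the output.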