Efficient way to mutate rowwise


Question

I have two data tables, dfUsers and purchases, generated with the code below:

set.seed(1)
library(data.table)

dfUsers <- data.table(user = letters[1:5],
                      # 5 start dates, one per user (current data.table errors on 5 vs 3 rows)
                      startDate = sample(seq.Date(from = as.Date('2016-01-01'), to = Sys.Date(), by = '1 day'), 5)
                      )

dfUsers$endDate <- dfUsers$startDate + sample(30:90,1)

purchases <- data.table(
  user = sample(letters[1:5], 500, replace = TRUE),
  purchaseDate = sample(seq.Date(from = as.Date('2016-01-01'), to = Sys.Date(), by = '1 day'), 500, replace = TRUE),
  amount = runif(500, 300, 500)  # one amount per purchase, not 50 recycled values
)

For each user, I want to sum the amounts of all purchases made between that user's startDate and endDate.
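The requirement is a join on user, a filter on the date window (both ends inclusive), and a per-user sum. A minimal base-R sketch on a tiny hypothetical dataset (the users, dates and amounts below are made up for illustration and are not the question's data) makes the expected result explicit, including 0 for a user with no purchases in the window:

```r
# Hypothetical toy data, not the question's random data
users <- data.frame(
  user      = c("a", "b", "c"),
  startDate = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  endDate   = as.Date(c("2016-01-31", "2016-02-29", "2016-03-31"))
)
purchases <- data.frame(
  user         = c("a", "a", "a", "b", "b"),
  purchaseDate = as.Date(c("2016-01-01", "2016-01-31", "2016-02-01",
                           "2016-02-15", "2016-05-01")),
  amount       = c(100, 50, 999, 20, 999)
)

# Attach each purchase to its user's window (one row per purchase here)
joined <- merge(users, purchases, by = "user")

# Keep only purchases inside the window; both bounds are inclusive
inWindow <- joined[joined$purchaseDate >= joined$startDate &
                   joined$purchaseDate <= joined$endDate, ]

# Sum per user, then look the sums up for every user; users with no
# purchases in their window get NA from match(), replaced by 0
sums <- aggregate(amount ~ user, data = inWindow, FUN = sum)
users$totalPurchase <- sums$amount[match(users$user, sums$user)]
users$totalPurchase[is.na(users$totalPurchase)] <- 0
print(users)
```

User a gets 100 + 50 (the 2016-02-01 purchase is outside the window), b gets 20, and c has no purchases at all.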

My current approach is to call a function row by row with dplyr's rowwise() and mutate(), but that becomes very slow as both tables grow.

I'm learning R, so I'd like to know whether there is a more efficient way to approach a problem of this kind.

Function:

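# Total amount of user u's purchases dated within [startDate, endDate];
# each call scans the whole purchases table
addPurchases <- function(u, startDate, endDate) {
  purchases[user == u & startDate <= purchaseDate & endDate >= purchaseDate, sum(amount)]
}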

dplyr

library(dplyr)
dfUsers %>% 
  rowwise() %>%
  mutate(totalPurchase = addPurchases(user, startDate, endDate))


Answer

Solution using data.table: merge the two tables on user, keep the purchases that fall inside each user's window, and sum the amounts by user:

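library(data.table)
# Using the OP's data: join on user, filter to each user's window, sum by user
merge(dfUsers, 
      purchases, 
      by = "user")[purchaseDate >= startDate & purchaseDate <= endDate, 
                   sum(amount), 
                   by = user]
# Output from the original run (yours will differ, since the data depend on Sys.Date()):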
#    user       V1
# 1:    a 6929.469
# 2:    b 6563.416
# 3:    c 3607.794
# 4:    d 5591.748
# 5:    e 5727.622
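A different data.table technique, not part of the answer above, is a non-equi join: the date-window condition goes into on=, so only purchases inside each window are matched, and by = .EACHI evaluates the sum once per row of the users table. A minimal sketch on hypothetical toy data (made-up values, not the question's data):

```r
library(data.table)

# Hypothetical toy data, not the question's random data
users <- data.table(
  user      = c("a", "b", "c"),
  startDate = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  endDate   = as.Date(c("2016-01-31", "2016-02-29", "2016-03-31"))
)
purchases <- data.table(
  user         = c("a", "a", "a", "b", "b"),
  purchaseDate = as.Date(c("2016-01-01", "2016-01-31", "2016-02-01",
                           "2016-02-15", "2016-05-01")),
  amount       = c(100, 50, 999, 20, 999)
)

# For each row of users, match purchases of the same user whose date lies
# in [startDate, endDate]; by = .EACHI evaluates j once per row of users.
# Rows without a match come back with NA amount, so na.rm = TRUE gives 0.
totals <- purchases[users,
                    on = .(user, purchaseDate >= startDate, purchaseDate <= endDate),
                    sum(amount, na.rm = TRUE),
                    by = .EACHI]

# Results are in the same row order as users, so assign them by reference
users[, totalPurchase := totals$V1]
print(users)
```

Unlike the merge-then-filter version, a user with no purchases in the window (c here) is kept with 0 instead of being dropped. Note that na.rm = TRUE would also silently skip amounts that are genuinely missing.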

