Vectorizing a loop over the rows of an R data frame while accessing multiple variables of the data frame

Question

Yet another apply question.

I've reviewed a lot of documentation on the apply family of functions in R (and use them quite a bit in my work). I've defined a function myfun below that I want to apply to every row of the data frame incdf. I think I need some variant of apply(incdf, 1, myfun). I've played around with it for a while but still can't quite get it. I've included a loop that does exactly what I want; it's just very slow on my real data, which is considerably larger than the sample data included here.

I expect it's a quick fix, but I can't quite put my finger on it. Maybe something with the special argument ... to apply?

In plain English, here is what the code below does: for each Submit.Date in incdf, I want to count the rows of chgdf with the same ID whose Submit.Date falls within some range of that date. The range is controlled by fdays and bdays in myfun.

chgdf <- data.frame(Submit.Date=as.Date(c("2013-09-27", "2013-09-04", "2013-08-01", "2013-06-24", "2013-05-29", "2013-08-20")), ID=c("001", "001", "001", "001", "001", "005"), stringsAsFactors=FALSE)
incdf <- data.frame(Submit.Date=as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")), ID=c("001", "001", "002", "006"), stringsAsFactors=FALSE)

The function I want to apply to every row of the data frame:

myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}

It works for a single row of the data frame:

aid <- '001'
# first Submit.Date recorded for this ID
tdate <- incdf[incdf$ID==aid, 'Submit.Date'][1]
myfun(tdate, aid, bdays=50, fdays=100)

This loop works, but it is slow on the full data set:

incdf$chgw <- 0
for(i in seq_len(nrow(incdf))){
  aid <- incdf$ID[i]
  tdate <- incdf$Submit.Date[i]
  incdf$chgw[i] <- myfun(tdate, aid, bdays=50, fdays=100)
}

Recommended answer

First, apply converts the data frame to a matrix before iterating, and because the ID column is character, every value (including the dates) is coerced to a string. You therefore need to convert tdate back to a Date before using it. Otherwise you're trying to add days to a string:

tdate <- as.Date(tdate)
fdays <- tdate+fdays
bdays <- tdate-bdays
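The coercion can be checked directly. The snippet below is a small sketch using the sample incdf from the question; the variable names row_classes, first_date and converted are made up for illustration.

```r
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# apply() iterates over as.matrix(incdf); a matrix holds a single type,
# so each row reaches the function as a character vector
row_classes <- apply(incdf, 1, function(row) class(row))
print(row_classes)

# The date is now text and has to be converted before doing date arithmetic
first_date <- as.matrix(incdf)[1, "Submit.Date"]
converted <- as.Date(first_date) + 30
print(converted)
```

Adding 30 to first_date without the as.Date call fails, because first_date is a string.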

Second, you call apply(inc, 1, myfun). In that case apply passes a single argument to myfun (the whole row, as a character vector), not the separate tdate and aid arguments that myfun expects.

Solution 1: Change your function to receive a whole row of the data frame, and call apply as you did:

myfun <- function(row, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  # apply() passes each row as a character vector: row[1] is Submit.Date, row[2] is ID
  tdate <- as.Date(row[1])
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  # filter the chg argument rather than the global chgdf
  chg2 <- chg[chg$ID==row[2] & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}
> apply(incdf, 1, myfun)
[1] 1 2 0 0
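On the question's point about the special ... argument: apply forwards any extra arguments to the function it calls, so a row-based function can also be given non-default fdays and bdays. The sketch below assumes the sample data from the question; count_row is a hypothetical variant of the row-based myfun, written with sum() instead of subsetting.

```r
chgdf <- data.frame(
  Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                          "2013-06-24", "2013-05-29", "2013-08-20")),
  ID = c("001", "001", "001", "001", "001", "005"),
  stringsAsFactors = FALSE
)
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# Hypothetical row-based counter: row is the character vector supplied by apply()
count_row <- function(row, chg = chgdf, fdays = 30, bdays = 30) {
  tdate <- as.Date(row[["Submit.Date"]])
  upper <- tdate + fdays
  lower <- tdate - bdays
  sum(chg$ID == row[["ID"]] & chg$Submit.Date < upper & chg$Submit.Date > lower)
}

# Defaults
default_counts <- apply(incdf, 1, count_row)
# Extra named arguments after FUN are passed on to count_row through ...
wide_counts <- apply(incdf, 1, count_row, fdays = 50, bdays = 50)

print(default_counts)
print(wide_counts)
```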

Solution 2: Keep the original signature of myfun and have apply call it through an anonymous function that unpacks the row into the individual arguments:

myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  fdays <- tdate+fdays
  bdays <- tdate-bdays
  # filter the chg argument rather than the global chgdf
  chg2 <- chg[chg$ID==aid & chg$Submit.Date<fdays & chg$Submit.Date>bdays, ]
  ret <- nrow(chg2)
  return(ret)
}
> apply(incdf, 1, function(row) myfun(as.Date(row[1]), row[2]))
[1] 1 2 0 0

I personally prefer the second solution, because it lets you override the default values of the other parameters of myfun:

> apply(incdf, 1, function(row) myfun(as.Date(row[1]), row[2], bdays=50, fdays=50))
[1] 2 3 0 0
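Both solutions still call the function once per row. If that remains too slow on a large data set, one possible alternative (not part of the answer above) is to drop the per-row calls and join the two data frames on ID with merge, then count the matches per row. The following is only a sketch under that assumption; count_changes and the inc_row helper column are hypothetical names.

```r
chgdf <- data.frame(
  Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                          "2013-06-24", "2013-05-29", "2013-08-20")),
  ID = c("001", "001", "001", "001", "001", "005"),
  stringsAsFactors = FALSE
)
incdf <- data.frame(
  Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
  ID = c("001", "001", "002", "006"),
  stringsAsFactors = FALSE
)

# Hypothetical vectorized counter: one join, one comparison, one tabulation
count_changes <- function(inc, chg, fdays = 30, bdays = 30) {
  # remember which row of inc each joined pair came from
  inc$inc_row <- seq_len(nrow(inc))
  # every (inc row, chg row) pair sharing an ID
  pairs <- merge(inc, chg, by = "ID", suffixes = c(".inc", ".chg"))
  # signed distance in days from the inc date to the chg date
  days <- as.numeric(pairs$Submit.Date.chg - pairs$Submit.Date.inc)
  # same strict window as myfun: tdate - bdays < chg date < tdate + fdays
  hit <- days < fdays & days > -bdays
  # count hits per inc row; rows without any match get 0
  tabulate(pairs$inc_row[hit], nbins = nrow(inc))
}

default_counts <- count_changes(incdf, chgdf)
wide_counts <- count_changes(incdf, chgdf, fdays = 50, bdays = 50)
print(default_counts)
print(wide_counts)
```

The join builds one row per matching ID pair, so memory use grows with the number of repeated IDs; whether this beats the apply versions on real data would need to be measured.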
