R ~ Vectorization of a user defined function


Question

I need to write a function that counts the number of working days (excluding weekends and a vector of other local bank holidays), but the problem I'm running into is more simply illustrated by just counting weekdays.

Here is a function that returns the number of weekdays between two dates:

removeWeekends <- function(end, start){

  range <- as.Date(start:end, "1970-01-01")

  range<- range[sapply(range, function(x){
                                if(!chron::is.weekend(x)){
                                  return(TRUE)
                                }else{
                                  return(FALSE)
                                }
                              })]

  return(NROW(range))

}

This works when each argument is a single date:

removeWeekends(as.Date("2018-05-08"), as.Date("2018-06-08"))
#[1] 24

But when it is given two vectors from a data frame, it fails:

one <- as.Date("2017-01-01"):as.Date("2017-01-08")
two <- as.Date("2018-06-08"):as.Date("2018-06-15")
df <- data.frame(one, two)
removeWeekends(df$two, df$one)
#[1] 375
#Warning messages:
#1: In start:end : numerical expression has 8 elements: only the first used
#2: In start:end : numerical expression has 8 elements: only the first used

I've also tried the following (which I guessed would not work, as the syntax seems off):

lapply(df, removeWeekends, df$two, df$one)
#Error in FUN(X[[i]], ...) : unused argument (17167:17174)

And:

lapply(df[,c("two", "one")], removeWeekends)
#Error in as.Date(start:end, "1970-01-01") :   argument "start" is missing,
# with no default 

I assume I'm misunderstanding the concept of vectorization.

The only other idea I have is to nest the function inside a conditional that checks whether the input is a vector and, if so, calls an apply function on it, although I'm not sure how I would structure that either.

Recommended answer

If you want to fully vectorize this, you will need to think outside the box. All that chron::is.weekend is being used for here is to count how many days in a given period fall on a Saturday or Sunday. We can calculate this ourselves in a vectorized way, because every full week contains exactly two weekend days; the only tricky part is the leftover days.
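
The week-and-remainder idea can be illustrated on a single pair of dates before it is wrapped into a function. The following is only a sketch: the variable names are chosen for illustration, and the day-of-week index assumes dates are held as integer day counts since 1970-01-01 (a Thursday), with 1 = Sunday and 7 = Saturday.

```r
# Dates as integer day counts since 1970-01-01 (illustrative values)
start <- as.integer(as.Date("2017-01-01"))
end   <- as.integer(as.Date("2018-06-08"))

diff_d <- end - start      # 523 days between the two dates
weeks  <- diff_d %/% 7L    # 74 full weeks, each with exactly 2 weekend days
days   <- diff_d %% 7L     # 5 days left over after the full weeks

# Day of week of `end`: 1 = Sunday, ..., 7 = Saturday
l_d <- (end + 4L) %% 7L + 1L   # 6, i.e. a Friday

# Weekend days contributed by the full weeks alone
weekend_days_in_full_weeks <- weeks * 2L   # 148
```

Only the leftover window (the last `days + 1` days ending on `end`) needs case-by-case handling, since it may or may not contain a Saturday or a Sunday. Every operation used here works element-wise, so the same lines apply unchanged to whole vectors of dates.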

I wrote the following function to achieve this, though I'm sure it could be improved:

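## `two` (end) and `one` (start) are numeric day counts since 1970-01-01,
## as produced by `:` on Dates in the question
frw <- function(two, one) {

  diff_d <- two - one ## difference in days
  l_d <- (two + 4L) %% 7L + 1L ## weekday of the last day (1 = Sunday, ..., 7 = Saturday)
  weeks <- diff_d %/% 7L ## number of full weeks between
  days <- diff_d %% 7L ## days left over after the full weeks

  ## total days minus weekend days: two per full week, plus any Saturday
  ## or Sunday that falls in the leftover days (range is inclusive, hence + 1)
  diff_d - 
    ((weeks * 2L) + ((l_d - days < 1) + ((l_d - days < 2) - (l_d == 1L))) +
    (l_d %in% c(1L, 7L))) + 1L

}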

You can run it as follows:

frw(two, one)
## [1] 375 375 374 374 374 374 374 375

It is far faster than the mapply version (almost instant). Here is a benchmark on a larger data set:

one <- as.Date("2017-01-01"):as.Date("2030-01-08")
two <- as.Date("2017-05-01"):as.Date("2030-05-08")
df <- data.frame(one, two)

system.time(res_mapply <- vremoveWeekends(df$two, df$one)) # taken from the other answer
#  user  system elapsed 
# 76.46    0.06   77.25 

system.time(res_vectorized <- frw(df$two, df$one))
# user  system elapsed 
#    0       0       0

identical(res_mapply, res_vectorized)
# [1] TRUE
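
The benchmark calls `vremoveWeekends`, which comes from another answer that is not reproduced here. As a rough idea of what such a wrapper might look like, here is a minimal sketch that assumes it simply applies a scalar weekday counter pairwise with `mapply`. Both the wrapper body and the use of base R's `as.POSIXlt()$wday` (0 = Sunday, 6 = Saturday) in place of `chron::is.weekend` are assumptions made so that the block runs without extra packages; `frw` is repeated from above only to keep the block self-contained.

```r
# Scalar version: weekdays between two single dates, inclusive.
# Uses base R's wday (0 = Sunday, 6 = Saturday) instead of chron::is.weekend.
removeWeekends <- function(end, start) {
  range <- as.Date(start:end, origin = "1970-01-01")
  sum(!(as.POSIXlt(range)$wday %in% c(0L, 6L)))
}

# Hypothetical mapply wrapper: calls the scalar function once per pair
vremoveWeekends <- function(end, start) {
  mapply(removeWeekends, end, start, USE.NAMES = FALSE)
}

# frw from the answer above, repeated so this block runs on its own
frw <- function(two, one) {
  diff_d <- two - one
  l_d <- (two + 4L) %% 7L + 1L
  weeks <- diff_d %/% 7L
  days <- diff_d %% 7L
  diff_d -
    ((weeks * 2L) + ((l_d - days < 1) + ((l_d - days < 2) - (l_d == 1L))) +
    (l_d %in% c(1L, 7L))) + 1L
}

# Same small inputs as in the question (integer day counts)
one <- as.Date("2017-01-01"):as.Date("2017-01-08")
two <- as.Date("2018-06-08"):as.Date("2018-06-15")

res_loop <- vremoveWeekends(two, one)
res_vec  <- frw(two, one)
all(res_loop == res_vec)
```

The wrapper still expands every date range and tests each day, so its cost grows with the length of the ranges, whereas `frw` does a fixed amount of arithmetic per pair. That difference is what the timings above reflect.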
