Why is `vapply` safer than `sapply`?

Question

The documentation says:

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use.

Could you please elaborate as to why it is generally safer, maybe providing examples?

P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.

Recommended answer

As has already been noted, vapply does two things:

  • A slight speed improvement
  • Improved consistency, by providing limited return type checks

The second point is the greater advantage, as it helps catch errors before they happen and leads to more robust code. This return value checking could be done separately by using sapply followed by stopifnot to make sure that the return values are consistent with what you expected, but vapply is a little easier (if more limited, since custom error checking code could check for values within bounds, etc.).
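To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The `scores` list and the 0 to 10 bounds are made up for illustration and are not part of the original answer.

```r
# Hypothetical data: one numeric summary is expected per list element.
scores <- list(a = c(1, 2, 3), b = c(4, 5), c = 6)

# sapply followed by a manual check on the type and length of the result.
res_sapply <- sapply(scores, mean)
stopifnot(is.numeric(res_sapply), length(res_sapply) == length(scores))

# vapply folds the same check into the call through FUN.VALUE.
res_vapply <- vapply(scores, mean, FUN.VALUE = numeric(1))

# A custom check that FUN.VALUE cannot express: values within bounds.
stopifnot(all(res_vapply >= 0 & res_vapply <= 10))
```

The `vapply` call is shorter and checks every element as it goes, but anything beyond type and length (such as the bounds check) still needs explicit code.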

Here's an example of vapply ensuring your result is as expected. This parallels something I was just working on while PDF scraping, where findD would use a regex to match a pattern in raw text data (e.g. I'd have a list that was split by entity, and a regex to match addresses within each entity. Occasionally the PDF had been converted out-of-order and there would be two addresses for an entity, which caused badness).

> input1 <- list( letters[1:5], letters[3:12], letters[c(5,2,4,7,1)] )
> input2 <- list( letters[1:5], letters[3:12], letters[c(2,5,4,7,15,4)] )
> findD <- function(x) x[x=="d"]
> sapply(input1, findD )
[1] "d" "d" "d"
> sapply(input2, findD )
[[1]]
[1] "d"

[[2]]
[1] "d"

[[3]]
[1] "d" "d"

> vapply(input1, findD, "" )
[1] "d" "d" "d"
> vapply(input2, findD, "" )
Error in vapply(input2, findD, "") : values must be length 1,
 but FUN(X[[3]]) result is length 2

As I tell my students, part of becoming a programmer is changing your mindset from "errors are annoying" to "errors are my friend."
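The example above trips the length check. As a rough sketch, the same mechanism also covers the type of each result; the `mixed` list below is a hypothetical input, not something from the original answer.

```r
# Hypothetical input: the second element is a character string by mistake.
mixed <- list(1, "a", 3)

# sapply silently coerces every element to character.
out_sapply <- sapply(mixed, identity)
class(out_sapply)

# vapply refuses, because FUN.VALUE = numeric(1) demands a double of length 1.
# tryCatch is used here only so the script keeps running after the error.
out_vapply <- tryCatch(
  vapply(mixed, identity, FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
out_vapply
```

With `sapply`, the numbers come back as strings and the problem surfaces later, if at all; with `vapply`, the call fails at the offending element.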

Zero length inputs
One related point is that if the input length is zero, sapply will always return an empty list, regardless of the input type. Compare:

sapply(1:5, identity)
## [1] 1 2 3 4 5
sapply(integer(), identity)
## list()
vapply(1:5, identity, integer(1))
## [1] 1 2 3 4 5
vapply(integer(), identity, integer(1))
## integer(0)

With vapply, you are guaranteed to have a particular type of output, so you don't need to write extra checks for zero length inputs.
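A small sketch of how this plays out in downstream code. The two helper functions are hypothetical and exist only to show the difference when the input happens to be empty.

```r
# Hypothetical helpers: total number of characters across a vector of strings.
total_chars_sapply <- function(x) sum(sapply(x, nchar))
total_chars_vapply <- function(x) sum(vapply(x, nchar, FUN.VALUE = integer(1)))

# Both agree on non-empty input.
total_chars_sapply(c("ab", "cde"))
total_chars_vapply(c("ab", "cde"))

# On empty input, vapply yields integer(0), so sum() returns 0.
total_chars_vapply(character())

# sapply yields an empty list, which sum() cannot handle.
empty_result <- tryCatch(
  total_chars_sapply(character()),
  error = function(e) "error"
)
empty_result
```

The `sapply` version would need an explicit `length(x) == 0` guard to behave sensibly here.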

Benchmarks

vapply can be a bit faster because it already knows what format it should be expecting the results in.

input1.long <- rep(input1,10000)

library(microbenchmark)
m <- microbenchmark(
  sapply(input1.long, findD ),
  vapply(input1.long, findD, "" )
)
library(ggplot2)
autoplot(m) # the autoplot method for microbenchmark objects now ships with the microbenchmark package
