Why is `vapply` safer than `sapply`?

Question

The documentation says:

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use.

Could you please elaborate as to why it is generally safer, maybe providing examples?

P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.

Recommended answer

As has already been noted, vapply does two things:

  • It is slightly faster.
  • It improves consistency by providing limited return type checks.

The second point is the greater advantage, as it helps catch errors before they happen and leads to more robust code. This return value checking could be done separately by using sapply followed by stopifnot to make sure that the return values are consistent with what you expected, but vapply is a little easier (if more limited, since custom error checking code could check for values within bounds, etc.).
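The manual alternative mentioned above (sapply followed by stopifnot) might look roughly like the sketch below. The helper name checked_sapply and the sample data are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: emulate vapply's "one string per element" check by hand.
checked_sapply <- function(x, fun) {
  res <- sapply(x, fun)
  # sapply falls back to a list when results differ in length,
  # and to a matrix when every result has the same length > 1.
  # Both cases fail one of the two checks below.
  stopifnot(is.character(res), length(res) == length(x))
  res
}

find_d <- function(x) x[x == "d"]

# Every element contains exactly one "d": the check passes.
ok <- checked_sapply(list(c("a", "d"), c("d", "e")), find_d)

# The second element contains two "d"s: the check fails.
bad <- tryCatch(
  checked_sapply(list(c("a", "d"), c("d", "d")), find_d),
  error = function(e) "check failed"
)
```

This works, but the expectation lives in a separate line that is easy to forget, whereas vapply states it in the call itself.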

Here's an example of vapply ensuring your result is as expected. This parallels something I was just working on while PDF scraping, where findD would use a regex to match a pattern in raw text data (e.g. I'd have a list that was split by entity, and a regex to match addresses within each entity. Occasionally the PDF had been converted out-of-order and there would be two addresses for an entity, which caused badness).

> input1 <- list( letters[1:5], letters[3:12], letters[c(5,2,4,7,1)] )
> input2 <- list( letters[1:5], letters[3:12], letters[c(2,5,4,7,15,4)] )
> findD <- function(x) x[x=="d"]
> sapply(input1, findD )
[1] "d" "d" "d"
> sapply(input2, findD )
[[1]]
[1] "d"

[[2]]
[1] "d"

[[3]]
[1] "d" "d"

> vapply(input1, findD, "" )
[1] "d" "d" "d"
> vapply(input2, findD, "" )
Error in vapply(input2, findD, "") : values must be length 1,
 but FUN(X[[3]]) result is length 2

Because there are two d's in the third element of input2, vapply produces an error. sapply, by contrast, silently changes the class of the output from a character vector to a list, which could break code downstream.
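As a small sketch of what "breaking downstream" can look like, the snippet below reuses input2 and findD from the example and assumes a later step that expects exactly one match per entity. The variable names res_s and res_v are only illustrative.

```r
input2 <- list(letters[1:5], letters[3:12], letters[c(2, 5, 4, 7, 15, 4)])
findD <- function(x) x[x == "d"]

# sapply succeeds, but the result is a list rather than a character vector.
res_s <- sapply(input2, findD)
is.character(res_s)      # FALSE
# Flattening it yields 4 values for 3 entities, so results no longer
# line up one-to-one with the inputs.
length(unlist(res_s))

# vapply stops at the source; here the error message is captured as a string.
res_v <- tryCatch(
  vapply(input2, findD, character(1)),
  error = function(e) conditionMessage(e)
)
```

With sapply the mismatch only surfaces later, possibly far from its cause; with vapply the error points at the element that violated the expectation.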

As I tell my students, part of becoming a programmer is changing your mindset from "errors are annoying" to "errors are my friend."

Zero length inputs
One related point is that if the input length is zero, sapply will always return an empty list, regardless of the input type. Compare:

sapply(1:5, identity)
## [1] 1 2 3 4 5
sapply(integer(), identity)
## list()    
vapply(1:5, identity, integer(1))
## [1] 1 2 3 4 5
vapply(integer(), identity, integer(1))
## integer(0)

With vapply, you are guaranteed to have a particular type of output, so you don't need to write extra checks for zero length inputs.
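To illustrate why this matters in practice, here is a sketch with two hypothetical helpers (total_chars_s and total_chars_v, not from the original answer) that add up string lengths. Only the vapply version copes with an empty input.

```r
# Hypothetical helpers: total number of characters in a character vector.
total_chars_s <- function(x) sum(sapply(x, nchar))
total_chars_v <- function(x) sum(vapply(x, nchar, integer(1)))

n_full  <- total_chars_v(c("ab", "cde"))
n_empty <- total_chars_v(character())

# For empty input sapply returns a list, which sum() cannot handle.
s_empty <- tryCatch(
  total_chars_s(character()),
  error = function(e) "error"
)
```

The sapply version would need an explicit `if (length(x) == 0)` guard to behave the same way.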

Benchmarks

vapply can be a bit faster because it already knows what format it should be expecting the results in.

input1.long <- rep(input1,10000)

library(microbenchmark)
m <- microbenchmark(
  sapply(input1.long, findD ),
  vapply(input1.long, findD, "" )
)
library(ggplot2)
library(taRifx) # autoplot.microbenchmark is moving to the microbenchmark package in the next release so this should be unnecessary soon
autoplot(m)
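If the extra packages are not available, a rougher comparison can be sketched with base R's system.time. The repetition count of 20 is an arbitrary choice for illustration, and timings will vary by machine and R version, so no particular speedup is claimed here.

```r
input1 <- list(letters[1:5], letters[3:12], letters[c(5, 2, 4, 7, 1)])
findD <- function(x) x[x == "d"]
input1.long <- rep(input1, 10000)

# Repeat each call a few times so the elapsed time is measurable.
t_sapply <- system.time(for (i in 1:20) r_s <- sapply(input1.long, findD))
t_vapply <- system.time(for (i in 1:20) r_v <- vapply(input1.long, findD, ""))

# Both calls produce the same character vector; only the timing differs.
identical(r_s, r_v)
c(sapply = t_sapply[["elapsed"]], vapply = t_vapply[["elapsed"]])
```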
