Why is `[` better than `subset`?


Question

When I need to filter a data.frame, i.e., extract the rows that meet certain conditions, I prefer to use the subset function:

subset(airquality, Month == 8 & Temp > 90)

rather than the [ function:

airquality[airquality$Month == 8 & airquality$Temp > 90, ]

There are two main reasons for my preference:

  1. I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.

  2. Because columns can be referred to as variables in the select expression, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.

So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world fell apart. While reading the subset documentation, I noticed this section:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Could someone help clarify what the authors mean?

First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.

Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, and perhaps provide an example?

Recommended answer

This question was answered well in the comments by @James, pointing to an excellent explanation by Hadley Wickham of the dangers of subset (and functions like it) [here]. Go read it!

It's a somewhat long read, so it may be helpful to record here the example Hadley uses that most directly addresses the question of "what can go wrong?":

Hadley suggests the following example: suppose we want to subset and then reorder a data frame using the following functions:

scramble <- function(x) x[sample(nrow(x)), ]

subscramble <- function(x, condition) {
  scramble(subset(x, condition))
}

subscramble(mtcars, cyl == 4)

This returns the error:

Error in eval(expr, envir, enclos) : object 'cyl' not found

because R no longer "knows" where to find the object called 'cyl'. He also points out the truly bizarre stuff that can happen if by chance there is an object called 'cyl' in the global environment:

cyl <- 4
subscramble(mtcars, cyl == 4)

cyl <- sample(10, 100, replace = TRUE)
subscramble(mtcars, cyl == 4)

(Run them and see for yourself, it's pretty crazy.)
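To make the failure a little more concrete, here is a minimal sketch of what seems to be going on, plus one possible way around it. The names peek, wrapper and subscramble_lgl are hypothetical helpers made up for illustration; they are not from Hadley's write-up. peek mimics how subset() captures its argument with substitute(): when called through a wrapper, the captured expression is just the symbol condition, and cyl == 4 is only evaluated later in the caller's environment, where cyl may be missing or may be something unrelated. The workaround assumes the caller is willing to build the logical vector up front and pass it to [.

```r
# Hypothetical stand-in for the capture step inside subset():
# it returns the unevaluated expression supplied for `subset`.
peek <- function(x, subset) substitute(subset)

# Called through a wrapper, the captured expression is only the
# wrapper's argument name, not the expression the user typed.
wrapper <- function(x, condition) peek(x, condition)
captured <- wrapper(mtcars, cyl == 4)
print(captured)

# Same helper as above: shuffle the rows of a data frame.
scramble <- function(x) x[sample(nrow(x)), , drop = FALSE]

# Hypothetical safer variant: the caller passes an already-evaluated
# logical vector, so nothing is looked up by name inside the function.
subscramble_lgl <- function(x, keep) {
  stopifnot(is.logical(keep), length(keep) == nrow(x))
  # Treat NA as "do not keep", which is what subset() does as well.
  scramble(x[keep & !is.na(keep), , drop = FALSE])
}

cyl <- 4  # a stray global like this no longer affects the result
res <- subscramble_lgl(mtcars, mtcars$cyl == 4)
print(table(res$cyl))
```

The price is typing mtcars twice again, which is exactly the trade-off the documentation warning is pointing at: more keystrokes, but the condition is evaluated where you wrote it.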
