Why is `[` better than `subset`?


Question

When I need to filter a data.frame, i.e., extract the rows that meet certain conditions, I prefer to use the subset function:

subset(airquality, Month == 8 & Temp > 90)

rather than the [ function:

airquality[airquality$Month == 8 & airquality$Temp > 90, ]
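As a quick check, the two calls can be compared directly. This is a minimal sketch that assumes the built-in airquality data set; the variable names are only illustrative. Month and Temp contain no missing values there, which matters because subset drops rows where the condition is NA, while [ returns NA-filled rows for them.

```r
# Both forms, applied to the built-in airquality data set.
with_subset  <- subset(airquality, Month == 8 & Temp > 90)
with_bracket <- airquality[airquality$Month == 8 & airquality$Temp > 90, ]

# TRUE when both data frames hold the same rows and columns.
same_rows <- isTRUE(all.equal(with_subset, with_bracket))
print(same_rows)
```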

There are two main reasons for my preference:

  1. I find that the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.

  2. Because columns can be referred to as variables in the select expression, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.

So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world fell apart. While reading the subset documentation, I noticed this section:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Could someone help clarify what the authors mean?

First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.

Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, and maybe provide an example?

Recommended answer

This question was answered well in the comments by @James, who pointed to an excellent explanation by Hadley Wickham of the dangers of subset (and functions like it) [here]. Go read it!

It's a somewhat long read, so it may be helpful to record here the example Hadley uses that most directly addresses the question of "what can go wrong?":

Hadley suggests the following example: suppose we want to subset and then reorder a data frame using the following functions:

scramble <- function(x) x[sample(nrow(x)), ]

subscramble <- function(x, condition) {
  scramble(subset(x, condition))
}

subscramble(mtcars, cyl == 4)

This returns the error:

Error in eval(expr, envir, enclos) : object 'cyl' not found

because R no longer "knows" where to find the object called 'cyl'. He also points out the truly bizarre things that can happen if, by chance, there is an object called 'cyl' in the global environment:

cyl <- 4
subscramble(mtcars, cyl == 4)

cyl <- sample(10, 100, replace = TRUE)
subscramble(mtcars, cyl == 4)

(Run them and see for yourself, it's pretty crazy.)
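To make the failure less mysterious, here is a rough sketch of the mechanism and of one way around it. It is not the actual source of subset: my_subset, wrapped and subscramble2 are hypothetical names introduced only for illustration, and my_subset is a simplified stand-in that captures its argument with substitute() and evaluates it inside the data frame. The last function sidesteps the problem by taking an ordinary logical vector and indexing with [.

```r
# Simplified stand-in for subset(); `my_subset` is a hypothetical name.
my_subset <- function(df, condition) {
  expr <- substitute(condition)           # capture the unevaluated expression
  keep <- eval(expr, df, parent.frame())  # look names up in df first, then the caller
  df[keep & !is.na(keep), , drop = FALSE]
}

# Called directly, `cyl` is found as a column of mtcars.
direct <- my_subset(mtcars, cyl == 4)

# Inside a wrapper, substitute() only sees the symbol `condition`, so
# `cyl == 4` ends up being evaluated where wrapped() was called, not in
# the data. That fails unless an object called `cyl` happens to exist there.
wrapped <- function(df, condition) my_subset(df, condition)
outcome <- tryCatch(wrapped(mtcars, cyl == 4),
                    error = function(e) conditionMessage(e))

# Standard-evaluation alternative: the caller builds the logical vector,
# and the function only indexes with `[`.
scramble <- function(x) x[sample(nrow(x)), ]
subscramble2 <- function(x, rows) scramble(x[rows, , drop = FALSE])
fixed <- subscramble2(mtcars, mtcars$cyl == 4)
```

With this version the caller types mtcars twice, but the result no longer depends on which objects happen to exist in the global environment.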
