Why is `[` better than `subset`?


Question

    When I need to filter a data.frame, i.e., extract rows that meet certain conditions, I prefer to use the subset function:

    subset(airquality, Month == 8 & Temp > 90)
    

    Rather than the [ function:

    airquality[airquality$Month == 8 & airquality$Temp > 90, ]
    

    There are two main reasons for my preference:

    1. I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.

    2. Because columns can be referred to as variables in the filtering condition, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.
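The keystroke saving in point 2 comes from non-standard evaluation: `subset` captures the condition unevaluated and then evaluates it with the data frame's columns in scope. Below is a simplified sketch of that mechanism, loosely modeled on base R's `subset.data.frame`. The name `my_subset` is hypothetical and this is not the actual base R source; `substitute()`, `eval()` and `parent.frame()` are standard base R functions.

```r
# Hypothetical, simplified stand-in for subset() on a data frame.
my_subset <- function(df, condition) {
  # Capture the caller's expression unevaluated, e.g. Month == 8 & Temp > 90
  expr <- substitute(condition)
  # Evaluate it with df's columns as variables; names that are not columns
  # are looked up in the caller's environment instead
  rows <- eval(expr, df, parent.frame())
  # Like subset(), treat NA results as FALSE
  df[rows & !is.na(rows), , drop = FALSE]
}

hot_august <- my_subset(airquality, Month == 8 & Temp > 90)
hot_august
```

The fallback to the caller's environment is what makes this convenient at the console, and it is also the step that can pick up the wrong object when the call is buried inside another function.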

    So I was living happily, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world broke apart. While reading the subset documentation, I noticed this section:

    Warning

    This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

    Could someone help clarify what the authors mean?

    First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.

    Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, maybe provide an example?

    Solution

    This question was answered well in the comments by @James, pointing to an excellent explanation by Hadley Wickham of the dangers of subset (and functions like it) [here]. Go read it!

    It's a somewhat long read, so it may be helpful to record here the example that Hadley uses that most directly addresses the question of "what can go wrong?":

    Hadley suggests the following example: suppose we want to subset and then reorder a data frame using the following functions:

    scramble <- function(x) x[sample(nrow(x)), ]
    
    subscramble <- function(x, condition) {
      scramble(subset(x, condition))
    }
    
    subscramble(mtcars, cyl == 4)
    

    This returns the error:

    Error in eval(expr, envir, enclos) : object 'cyl' not found

    because R no longer "knows" where to find the object called 'cyl'. He also points out the truly bizarre stuff that can happen if by chance there is an object called 'cyl' in the global environment:

    cyl <- 4
    subscramble(mtcars, cyl == 4)
    
    cyl <- sample(10, 100, replace = TRUE)
    subscramble(mtcars, cyl == 4)
    

    (Run them and see for yourself, it's pretty crazy.)
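The cause is visible if you mimic the first thing `subset()` does with its argument, namely `substitute()`. Inside `subscramble`, that call sees only the symbol `condition`, not the caller's `cyl == 4`. Because `condition` is not a column of `mtcars`, lookup falls back to `subscramble`'s environment. There, `condition` is a promise, and forcing it evaluates `cyl == 4` in the global environment. The sketch below assumes it is run top-level as a script; `capture` and `peek` are hypothetical helpers for illustration only. With the 100-element global `cyl`, the condition becomes a random logical vector unrelated to `mtcars`, so unrelated rows are picked and any `TRUE` past row 32 comes back as a row of `NA`s.

```r
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) {
  scramble(subset(x, condition))
}

# Hypothetical helpers: capture() calls substitute() the way subset() does
capture <- function(df, cond) substitute(cond)
peek <- function(x, condition) capture(x, condition)
peek(mtcars, cyl == 4)   # the symbol `condition`, not the expression cyl == 4

# No global `cyl`: forcing the promise fails
if (exists("cyl", inherits = FALSE)) rm(cyl)
msg <- tryCatch(subscramble(mtcars, cyl == 4),
                error = function(e) conditionMessage(e))
msg   # "object 'cyl' not found"

# A global scalar `cyl`: the condition is simply TRUE, so every row is kept
cyl <- 4
all_rows <- subscramble(mtcars, cyl == 4)
nrow(all_rows)   # 32, not the 11 four-cylinder cars
```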

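Following the documentation's advice to use `[` when programming, one way to rewrite the example is to have the caller pass an already evaluated logical vector. This is a minimal sketch under that assumption; `subscramble2` is a hypothetical name, not taken from Hadley's write-up. The `rows & !is.na(rows)` step mirrors how `subset()` drops `NA` results.

```r
scramble <- function(x) x[sample(nrow(x)), ]

# Hypothetical programming-friendly variant: rows is a plain logical vector,
# evaluated by the caller, so no name lookup happens inside the function
subscramble2 <- function(x, rows) {
  scramble(x[rows & !is.na(rows), , drop = FALSE])
}

cyl <- sample(10, 100, replace = TRUE)   # a stray global no longer interferes
four_cyl <- subscramble2(mtcars, mtcars$cyl == 4)
nrow(four_cyl)   # 11
```

The price is the repeated `mtcars$` that `subset` was saving, which is exactly the trade-off the question describes: `subset` is shorter at the console, while `[` behaves predictably inside functions.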
