Plotting inside function: subset(df,id_==...) gives wrong plot, df[df$id_==...,] is right


Question

I have a data frame (df) with multiple y-series that I want to plot individually, so I wrote a function that selects one series, assigns it to a local variable dat, and plots it. However, when ggplot/geom_step is called inside the function, it does not treat the data as a single series. I don't see how this can be a scoping issue: if dat weren't visible, surely ggplot would fail?

The code produces the correct plot when run from the top-level environment, but not inside the function. I know scoping is a recurring issue with ggplot, and I have read the other answers; this is not a duplicate, and none of them gives the solution.

set.seed(1234)
require(ggplot2)
require(scales)

N = 10
df <- data.frame(x = 1:N,
                 id_ = c(rep(20,N), rep(25,N), rep(33,N)),
                 y = c(runif(N, 1.2e6, 2.9e6), runif(N, 5.8e5, 8.9e5) ,runif(N, 2.4e5, 3.3e5)),
                 row.names=NULL)

plot_series <- function(id_, envir=environment()) {
  dat <- subset(df,id_==id_)
  p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
  # Unsuccessfully trying the approach from http://stackoverflow.com/questions/22287498/scoping-of-variables-in-aes-inside-a-function-in-ggplot
  p$plot_env <- envir
  plot(p)
  # Displays wrongly whether we do the plot here inside fn, or return the object to parent environment 
  return(p)
}

# BAD: doesn't plot geom_step!
plot_series(20)

# GOOD! but what's causing the difference?
ggplot(data=subset(df,id_==20), mapping=aes(x,y), color='red') + geom_step()

#plot_series(25)
#plot_series(33)


Recommended answer

This works fine:

plot_series <- function(id_) {
    dat <- df[df$id_ == id_,]
    p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
    return(p)
}

print(plot_series(20))

If you step through the original function with debug, you'll quickly see that the subset line did not subset the data frame at all: it returned all rows!
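The same check can be made without the debugger by comparing row counts. The following is a minimal sketch in base R; the small data frame and the helper names pick_subset and pick_index are hypothetical stand-ins, not part of the original post.

```r
# Illustrative stand-in for the question's data frame: three series of three rows.
df <- data.frame(x   = rep(1:3, times = 3),
                 id_ = rep(c(20, 25, 33), each = 3))

# Hypothetical helper: the argument shares its name with the column, as in the question.
pick_subset <- function(id_) subset(df, id_ == id_)

# Hypothetical helper: the same filter written with standard evaluation.
pick_index <- function(id_) df[df$id_ == id_, ]

n_subset <- nrow(pick_subset(20))  # 9: every row comes back
n_index  <- nrow(pick_index(20))   # 3: only the rows with id_ == 20
print(c(subset = n_subset, index = n_index))
```

A row count equal to nrow(df) is the quickest sign that the filter did nothing.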

Why? Because subset uses non-standard evaluation, and you used the same name for both the column and the function argument. As jlhoward demonstrates in another answer, simply using different names for the two would have worked (though it is probably not advisable).
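A sketch of the renaming approach follows, assuming a hypothetical argument name series_id and a small stand-in data frame. It works only as long as no column of the data frame is also called series_id, which is one reason it may not be advisable.

```r
# Illustrative stand-in for the question's data frame.
df <- data.frame(x   = rep(1:3, times = 3),
                 id_ = rep(c(20, 25, 33), each = 3))

# Hypothetical helper: the argument name no longer matches any column,
# so subset() finds id_ in the data frame and series_id in the function.
pick_renamed <- function(series_id) subset(df, id_ == series_id)

renamed_rows <- pick_renamed(20)
print(nrow(renamed_rows))  # 3
```

If df later gained a column named series_id, the condition would silently compare two columns again, so explicit indexing with df[df$id_ == series_id, ] is the more robust choice inside functions.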

The reason is that subset evaluates the condition within the data frame first. So all it sees in the logical expression is id_ == id_, which is always true within that data frame.

One way to think about it is to play dumb (like a computer) and ask yourself, when presented with the condition id_ == id_, how you would know what each symbol refers to. It's ambiguous, and subset makes a consistent choice: use what's in the data frame.
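The lookup order can be observed directly with eval(), which accepts a data frame as its evaluation environment and a fallback environment for symbols that are not columns. This is a rough sketch of the mechanism, not subset's actual source; the function resolve_demo and the argument wanted are hypothetical names.

```r
# Illustrative data: two rows of series 20 and one of series 25.
df <- data.frame(id_ = c(20, 20, 25), y = c(1, 2, 3))

# Hypothetical helper that evaluates two conditions the way a
# data-frame-first lookup would: columns win, the function's own
# environment is consulted only for names that are not columns.
resolve_demo <- function(id_, wanted) {
  here <- environment()
  list(
    same  = eval(quote(id_ == id_),    df, here),  # both symbols are the column
    mixed = eval(quote(id_ == wanted), df, here)   # wanted comes from the function
  )
}

res <- resolve_demo(id_ = 25, wanted = 25)
print(res$same)   # TRUE TRUE TRUE
print(res$mixed)  # FALSE FALSE TRUE
```

The argument id_ = 25 is never consulted in the first condition because the column of the same name is found first.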
