Issue when passing variable with dollar sign notation to aes() in combination with facet_grid() or facet_wrap()


Problem description


I am doing some analysis in ggplot2 at the moment for a project, and by chance I stumbled across some (for me) weird behavior that I cannot explain. When I write aes(x = cyl, ...), the plot looks different from the one I get when I pass the same variable using aes(x = mtcars$cyl, ...). When I remove facet_grid(am ~ .), both graphs are the same again. The code below is modeled after the code in my project that generates the same behavior:

library(dplyr)
library(ggplot2)

data = mtcars

test.data = data %>%
  select(-hp)


ggplot(test.data, aes(x = test.data$cyl, y = mpg)) +
  geom_point() + 
  facet_grid(am ~ .) +
  labs(title="graph 1 - dollar sign notation")

ggplot(test.data, aes(x = cyl, y = mpg)) +
  geom_point()+ 
  facet_grid(am ~ .) +
  labs(title="graph 2 - no dollar sign notation")

Here is the picture of graph 1:

Here is the picture of graph 2:

I found that I can work around this problem using aes_string instead of aes and passing the variable names as strings, but I would like to understand why ggplot is behaving that way. The problem also occurs in similar attempts with facet_wrap.

Thx a lot for any help in advance! I feel very uncomfortable if I do not understand that properly...

Solution

tl;dr

Never use [ or $ inside aes().
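
As a minimal sketch of what to write instead, the snippet below maps a column by its bare name and, for the case where the column name is only available as a string, through the .data pronoun. It assumes ggplot2 3.0.0 or later (where tidy evaluation in aes() is available); the variable xvar and the plot names are hypothetical and only for illustration.

```r
library(ggplot2)

# Bare column name: looked up inside the plot's data,
# so it stays aligned with the faceting variable.
p_bare <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  facet_grid(am ~ .)

# Column name held in a string (hypothetical variable `xvar`):
# the .data pronoun also looks the column up inside the plot's data.
xvar <- "cyl"
p_pronoun <- ggplot(mtcars, aes(x = .data[[xvar]], y = mpg)) +
  geom_point() +
  facet_grid(am ~ .)
```

Both plots should end up with the same per-panel data. The .data form covers the use case for which the question reached for aes_string(), which newer ggplot2 releases mark as deprecated.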


Consider this illustrative example, where the faceting variable f is purposely in a non-obvious order with respect to x:

d <- data.frame(x=1:10, f=rev(letters[gl(2,5)]))

Now contrast what happens with these two plots:

p1 <- ggplot(d) +
  facet_grid(.~f, labeller = label_both) +
  geom_text(aes(x, y=0, label=x, colour=f)) +
  ggtitle("good mapping") 

p2 <- ggplot(d) +
  facet_grid(.~f, labeller = label_both) +
  geom_text(aes(d$x, y=0, label=x, colour=f)) +
  ggtitle("$ corruption") 

We can get a better idea of what's happening by looking at the data.frame created internally by ggplot2 for each panel:

 ggplot_build(p1)[["data"]][[1]][,c("x","PANEL")]

    x PANEL
1   6     1
2   7     1
3   8     1
4   9     1
5  10     1
6   1     2
7   2     2
8   3     2
9   4     2
10  5     2

 ggplot_build(p2)[["data"]][[1]][,c("x", "PANEL")]

    x PANEL
1   1     1
2   2     1
3   3     1
4   4     1
5   5     1
6   6     2
7   7     2
8   8     2
9   9     2
10 10     2

The second plot has the wrong mapping, because when ggplot creates a data.frame for each panel, it picks x values in the "wrong" order.

This occurs because the use of $ breaks the link between the various variables to be mapped (ggplot must assume it's an independent variable, which for all it knows could come from an arbitrary, disconnected source). Since the data.frame in this example is not ordered according to the factor f, the subset data.frames used internally for each panel assume the wrong order.
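
The mismatch can be imitated in base R without ggplot2. This is only a simplified analogy of the behavior described above, not ggplot2's actual internals: the rows are regrouped by panel, and x is then evaluated once as a column of the regrouped data and once as d$x, which bypasses it. The names sorted, linked and unlinked are hypothetical.

```r
d <- data.frame(x = 1:10, f = rev(letters[gl(2, 5)]))

# Regroup the rows by facet value, roughly as per-panel subsetting would:
# rows with f == "a" (x = 6..10) come first, then f == "b" (x = 1..5).
sorted <- d[order(d$f), ]
sorted$PANEL <- as.integer(factor(sorted$f))

# Bare name: x is found inside `sorted`, so values travel with their rows.
linked <- with(sorted, x)

# Dollar notation: d$x ignores `sorted` and returns the original order.
unlinked <- with(sorted, d$x)

data.frame(PANEL = sorted$PANEL, linked = linked, unlinked = unlinked)
```

In the resulting table, linked reproduces the x column shown for p1 (6 to 10 in panel 1), while unlinked reproduces the one shown for p2 (1 to 5 in panel 1), even though those values belong to f = b.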
