R - ggplot2 - difference between ggplot(data, aes(x=variable...)) and ggplot(data, aes(x=data$variable...))

Question

I have encountered a phenomenon in ggplot2, and I would be grateful if someone could explain it to me.

I needed to plot a continuous variable on a histogram, and I needed to represent two categorical variables on the plot. The following dataframe is a good example.

library(ggplot2)


species <- rep(c('cat', 'dog'), 30)
numb <- rep(c(1,2,3,7,8,10), 10)
groups <- rep(c('A', 'A', 'B', 'B'), 15)

data <- data.frame(species=species, numb=numb, groups=groups)

Let the following code stand for the categorisation of a continuous variable:

data$factnumb <- as.factor(data$numb)

If I want to plot this dataset, the following two snippets are completely interchangeable:

Note the difference in the fill= argument.

p <- ggplot(data, aes(x=factnumb, fill=species)) +
        facet_grid(groups ~ .) +
        geom_bar(aes(y=after_stat(count/sum(count)))) +
        scale_y_continuous(labels = scales::percent)

[Plot (p)]

q <- ggplot(data, aes(x=factnumb, fill=data$species)) +
        facet_grid(groups ~ .) +
        geom_bar(aes(y=after_stat(count/sum(count)))) +
        scale_y_continuous(labels = scales::percent)

[Plot (q)]

However, when working with real-life continuous variables, not all categories will contain observations, and I still need to show the empty categories on the x-axis in order to approximate the sample distribution. To demonstrate this, I used the following code:

data_miss <- data[data$numb != 3, ]

This results in a disparity between the levels of the categorical variable and the observations in the dataset:

> unique(data_miss$factnumb)
[1] 1  2  7  8  10
Levels: 1 2 3 7 8 10
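A quick way to confirm that a level survives subsetting even though no rows use it is to compare `levels()` with the values that actually occur. This is a minimal sketch that rebuilds the question's data so it runs on its own; the name `empty_levels` is just an illustrative variable:

```r
# Rebuild the question's example data
species <- rep(c('cat', 'dog'), 30)
numb <- rep(c(1, 2, 3, 7, 8, 10), 10)
groups <- rep(c('A', 'A', 'B', 'B'), 15)
data <- data.frame(species = species, numb = numb, groups = groups)
data$factnumb <- as.factor(data$numb)

# Drop every row where numb == 3; the factor still keeps all of its levels
data_miss <- data[data$numb != 3, ]

# Levels that are declared on the factor but have no observations left
empty_levels <- setdiff(levels(data_miss$factnumb),
                        as.character(unique(data_miss$factnumb)))
empty_levels
# [1] "3"

# table() still reports the empty level, with a count of 0
table(data_miss$factnumb)
```

Because the empty level is still part of the factor, `drop = FALSE` on a discrete scale has something to keep on the axis.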

I then plotted the data_miss dataset, still including all levels of the factnumb variable:

pm <- ggplot(data_miss, aes(x=factnumb, fill=species)) +
        facet_grid(groups ~ .) +
        geom_bar(aes(y=after_stat(count/sum(count)))) +
        scale_fill_discrete(drop=FALSE) +
        scale_x_discrete(drop=FALSE) +
        scale_y_continuous(labels = scales::percent)

[Plot (pm)]

qm <- ggplot(data_miss, aes(x=factnumb, fill=data_miss$species)) +
        facet_grid(groups ~ .) +
        geom_bar(aes(y=after_stat(count/sum(count)))) +
        scale_x_discrete(drop=FALSE) +
        scale_fill_discrete(drop=FALSE) +
        scale_y_continuous(labels = scales::percent)

[Plot (qm)]

In this case, when using fill=data_miss$species, the fill of the plot changes (for the worse).

I would be really happy if someone could clear this one up for me.

Is it just "luck" that the fills in plots p and q are identical, or have I stumbled upon some subtle mistake in the inner workings of ggplot2?

Thanks in advance!

Kind regards,

Bernadette

Answer

Using data$variable inside aes() is never good, never recommended, and should never be done. Sometimes it still works, but aes(variable) always works, so you should always use aes(variable).

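ggplot uses non-standard evaluation. A function that uses standard evaluation only looks names up in the ordinary environments (in this example, the global environment); it cannot see inside the columns of a data frame. If I have a data frame named mydata with a column named col1, and I call mean(col1), I get an error: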

mydata = data.frame(col1 = 1:3)
mean(col1)
# Error in mean(col1) : object 'col1' not found

This error happens because col1 isn't in the global environment. It's just a column name of the mydata data frame.
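For comparison, base R has its own way to evaluate an expression with a data frame's columns in scope, which is roughly the kind of lookup aes() does on your behalf. A minimal sketch using `with()`:

```r
mydata <- data.frame(col1 = 1:3)

# mean(col1) fails because col1 is only a column name, not a variable.
# with() evaluates the expression with mydata's columns visible as names:
with(mydata, mean(col1))
# [1] 2

# Standard-evaluation equivalent: extract the column explicitly
mean(mydata$col1)
# [1] 2
```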

The aes function does extra work behind the scenes, and knows to look at the columns of the layer's data, in addition to checking the global environment.

ggplot(mydata, aes(x = col1)) + geom_bar()
# no error
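One way to see that aes() records names rather than values is to inspect what it returns. Assuming ggplot2 3.0.0 or later (where aes() returns a list of quosures, i.e. unevaluated expressions), `rlang::quo_get_expr()` pulls each stored expression back out; `by_name` and `by_value` are illustrative names:

```r
library(ggplot2)

mydata <- data.frame(col1 = 1:3)

by_name  <- aes(x = col1)         # stores the symbol col1
by_value <- aes(x = mydata$col1)  # stores the call mydata$col1

# Neither mapping has been evaluated yet; aes() only captured the expressions
rlang::quo_get_expr(by_name$x)
# col1
rlang::quo_get_expr(by_value$x)
# mydata$col1
```

The difference shows up when ggplot2 later evaluates these: col1 is looked up among the columns of the layer's data, whereas mydata$col1 ignores the layer's data and returns the whole vector from the mydata object.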

You don't have to use a bare column inside aes, though. For flexibility, you can use a function of a column, or even some other vector that you happen to define on the spot (as long as it has the right length):

# these work fine too
ggplot(mydata, aes(x = log(col1))) + geom_bar()
ggplot(mydata, aes(x = c(1, 8, 11))) + geom_bar()
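The length caveat matters because a vector supplied this way is matched to rows only by position, not by name, so ggplot2 refuses one whose length differs from the data (other than length 1). A minimal sketch; note that the error appears when the plot is built or printed, not when aes() is called, and the variable names are illustrative:

```r
library(ggplot2)

mydata <- data.frame(col1 = 1:3)

# Length 3 matches nrow(mydata): the plot builds without error
ok <- ggplot(mydata, aes(x = c(1, 8, 11))) + geom_bar()
built <- ggplot_build(ok)

# Length 2 does not match: building fails with an aesthetic-length error
bad <- ggplot(mydata, aes(x = c(1, 8))) + geom_bar()
result <- tryCatch(ggplot_build(bad), error = function(e) conditionMessage(e))
result
```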

So what's the difference between col1 and mydata$col1? Well, col1 is the name of a column, and mydata$col1 is the actual values. ggplot will look for a column named col1 in your data and use that, whereas mydata$col1 is just a vector: the full column. The difference matters because ggplot often manipulates data. Whenever there are facets or aggregate functions, ggplot splits your data into pieces and works on each piece. To do this correctly, it needs to be able to identify the data and the column names. When you give it mydata$col1, you're not giving it a column name, just a vector of values (whatever happens to be in that column), and things can break.
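The splitting problem can be illustrated without ggplot2 at all. The sketch below is only an analogy for what happens when data are divided into pieces; it is not ggplot2's internal code, and the data and names are made up for illustration. A value looked up by column name inside each piece stays aligned with that piece's rows, whereas a vector copied out of the full data frame knows nothing about the split:

```r
mydata <- data.frame(g = c("a", "b", "a", "b"), col1 = c(10, 20, 30, 40))

# A plain vector taken from the full data frame, like mydata$col1 in aes()
outside <- mydata$col1

# Split the rows into pieces, as faceting or grouping does conceptually
pieces <- split(mydata, mydata$g)

# Looking up the column by name inside each piece keeps rows aligned
by_name <- lapply(pieces, function(piece) piece$col1)
by_name$a
# [1] 10 30

# The outside vector still holds all four values in the original order,
# so it cannot line up with a two-row piece
length(outside)
# [1] 4
```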

So, just use unquoted column names in aes() without data$ and everything will work as expected.
