Why is stat = "identity" necessary in geom_bar in ggplot?
Question
From this question we see a simple geom_line in the answer.
library(dplyr)
library(ggplot2)    # ggplot(), geom_line()
library(lubridate)  # year()
BactData %>% filter(year(Date) == 2017) %>%
  ggplot(aes(Date, Svartediket_CB)) + geom_line()
If we change geom_line to geom_bar, we might expect to see a bar plot, but instead we get
Error: stat_count() must not be used with a y aesthetic.
But it works if we add stat = "identity", like so:
library(dplyr)
library(ggplot2)    # ggplot(), geom_bar()
library(lubridate)  # year()
BactData %>% filter(year(Date) == 2017) %>%
  ggplot(aes(Date, Svartediket_CB)) + geom_bar(stat = "identity")
Why doesn't geom_bar work without stat = "identity", i.e. what is the purpose of stat = "identity"?
Recommended answer
There are two closely related layers: geom_bar() and geom_col(). The key difference is how they aggregate the data by default.
For geom_bar(), the default behavior is to count the rows for each x value. It doesn't expect a y value, since it is going to compute the counts itself. In fact, it raises the error shown above if you give it one, since it assumes you are confused. How the aggregation is performed is controlled by the stat argument of geom_bar(), whose default is stat = "count".
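As a minimal sketch of the default counting behavior, assuming a made-up data frame obs with one row per observation and no y column (the name and values are hypothetical, not from the question), layer_data() can be used to inspect what the stat computed:

```r
library(ggplot2)

# Hypothetical toy data: one row per observation, no y column
obs <- data.frame(species = c("a", "a", "b", "a", "c", "c"))

# geom_bar() defaults to stat = "count", so bar heights are the
# number of rows at each x value (3 for "a", 1 for "b", 2 for "c")
p <- ggplot(obs, aes(species)) + geom_bar()

# layer_data() returns the layer's data after the stat has run;
# the count column holds the computed bar heights
counts <- layer_data(p)$count
counts
```

Only x is mapped here; the heights come entirely from the stat.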
If you explicitly say stat = "identity" in geom_bar(), you're telling ggplot2 to skip the aggregation and that you'll provide the y values yourself. This mirrors the natural behavior of geom_col(), described below.
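A small sketch of this, assuming a hypothetical pre-aggregated data frame totals whose n column already holds the bar heights (both names are illustrative):

```r
library(ggplot2)

# Hypothetical pre-aggregated data: the y values are already computed
totals <- data.frame(species = c("a", "b", "c"), n = c(3, 1, 2))

# stat = "identity" makes geom_bar() use n as the bar height unchanged
p <- ggplot(totals, aes(species, n)) + geom_bar(stat = "identity")

# The y values in the built layer are the ones supplied in totals$n
heights <- layer_data(p)$y
heights
```

No counting takes place: each row of totals becomes one bar.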
In the case of geom_col(), it won't try to aggregate the data by default. From the docs, "geom_col() uses stat_identity(): it leaves the data as is". So it expects you to have the y values already calculated, and it uses them directly. geom_col() also doesn't have an argument to change that behavior: it always plots the y values you provide, and you need to provide them.
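To illustrate, here is a hedged sketch using a made-up stand-in for BactData (the real data set is not shown in the question, so the rows below are invented). It filters with base R rather than dplyr and lubridate to keep the example self-contained, and compares geom_col() against geom_bar(stat = "identity"):

```r
library(ggplot2)

# Hypothetical stand-in for BactData; values are invented for illustration
bact <- data.frame(
  Date = as.Date(c("2016-12-15", "2017-01-15", "2017-02-15", "2017-03-15")),
  Svartediket_CB = c(4, 10, 0, 7)
)

# Keep only the 2017 rows (base R equivalent of filter(year(Date) == 2017))
bact_2017 <- subset(bact, format(Date, "%Y") == "2017")

p_col <- ggplot(bact_2017, aes(Date, Svartediket_CB)) + geom_col()
p_bar <- ggplot(bact_2017, aes(Date, Svartediket_CB)) +
  geom_bar(stat = "identity")

# Both layers pass the supplied y values through without aggregation
y_col <- layer_data(p_col)$y
y_bar <- layer_data(p_bar)$y
all.equal(y_col, y_bar)
```

Under these assumptions the two plots have the same bar heights, so the choice between them is a matter of readability.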
If you have y values, you could use either syntax, but I find geom_col() more direct.