Why is stat = "identity" necessary in geom_bar in ggplot?

Question

In the answer to this question we see a simple geom_line plot.

library(dplyr)
library(ggplot2)
library(lubridate)  # assumed source of year()
BactData %>% filter(year(Date) == 2017) %>%
  ggplot(aes(Date, Svartediket_CB)) + geom_line()

If we change geom_line to geom_bar, we might expect to see a bar plot, but instead we get

Error: stat_count() must not be used with a y aesthetic.

But it works if we add stat = "identity", like so:

library(dplyr)
library(ggplot2)
library(lubridate)  # assumed source of year()
BactData %>% filter(year(Date) == 2017) %>%
  ggplot(aes(Date, Svartediket_CB)) + geom_bar(stat = "identity")

Why doesn't geom_bar work without stat = "identity"? In other words, what is the purpose of stat = "identity"?

Recommended answer

There are two closely related layers: geom_bar() and geom_col(). The key difference is how they aggregate the data by default.

For geom_bar(), the default behavior is to count the rows for each x value. It doesn't expect a y value, since it computes the bar heights itself. In fact, it raises an error if you map both x and y (the stat_count() error quoted in the question), on the assumption that you're confused. How the aggregation is performed is controlled by the stat argument of geom_bar(), whose default is stat = "count".
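To make the default counting concrete, here is a minimal sketch. The data frame obs and its site column are hypothetical toy data, not from the question. It uses layer_data() to inspect what the stat computed for the layer.

```r
library(ggplot2)

# Hypothetical toy data: one row per observation, no precomputed y.
obs <- data.frame(site = c("A", "A", "A", "B", "B", "C"))

# geom_bar() defaults to stat = "count", so bar heights are row counts per x.
p <- ggplot(obs, aes(site)) + geom_bar()

# layer_data() returns the data computed for the first layer of the plot.
counts <- layer_data(p)$count
print(counts)
```

Here site "A" appears three times, "B" twice, and "C" once, so those row counts become the bar heights without any y aesthetic being supplied.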

If you explicitly say stat = "identity" in geom_bar(), you're telling ggplot2 to skip the aggregation and that you'll provide the y values yourself. This mirrors the default behavior of geom_col(), described next.
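As a rough illustration, the sketch below contrasts the two settings on pre-aggregated data. The totals data frame is a hypothetical stand-in for a table that already holds one y value per x. The exact error message may differ between ggplot2 versions, so the code only checks that building the plot fails.

```r
library(ggplot2)

# Hypothetical pre-aggregated data: one row per x, with y already computed.
totals <- data.frame(site = c("A", "B", "C"), value = c(10, 4, 7))

# Default stat = "count" combined with a y aesthetic: building should fail.
p_count <- ggplot(totals, aes(site, value)) + geom_bar()
failed <- inherits(try(ggplot_build(p_count), silent = TRUE), "try-error")

# stat = "identity" passes the supplied y values through unchanged.
p_identity <- ggplot(totals, aes(site, value)) + geom_bar(stat = "identity")
heights <- layer_data(p_identity)$y
```

With stat = "identity", the bar heights are simply the value column.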

In the case of geom_col(), it won't try to aggregate the data by default. From the docs, "geom_col() uses stat_identity(): it leaves the data as is". So it expects you to have the y values already calculated, and it uses them directly. geom_col() also has no argument to change that behavior: it always plots the y values you provide, and you need to provide them.

If you have y values, you could use either syntax, but I find geom_col() more direct.
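The equivalence of the two syntaxes can be checked with a small sketch. The monthly data frame below is hypothetical and stands in for any table with precomputed y values.

```r
library(ggplot2)

# Hypothetical data with precomputed y values.
monthly <- data.frame(month = c("Jan", "Feb", "Mar"), total = c(12, 30, 21))

p_bar <- ggplot(monthly, aes(month, total)) + geom_bar(stat = "identity")
p_col <- ggplot(monthly, aes(month, total)) + geom_col()

# Compare the bar heights each layer would draw.
same_heights <- isTRUE(all.equal(layer_data(p_bar)$y, layer_data(p_col)$y))
print(same_heights)
```

Both layers should draw the same bars, so the choice comes down to readability. geom_col() states the intent without an extra argument.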
