Error with ggplot2 mapping variable to y and using stat="bin"
Problem description
I am using ggplot2 to make a histogram:
geom_histogram(aes(x=...), y="..ncount../sum(..ncount..)")
and I get the error:
Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
What causes this in general? I am confused about the error because I'm not mapping a variable to y, just histogram-ing x, and would like the height of the histogram bar to represent a normalized fraction of the data (such that all the bar heights together sum to 100% of the data).
Edit: if I want to make a density plot with geom_density instead of geom_histogram, do I use ..ncount../sum(..ncount..) or ..scaled..? I'm unclear about what ..scaled.. does.
The confusion here is a long-standing one (as evidenced by the verbose warning message), and it all starts with stat_bin.
But users don't typically realize that their confusion revolves around stat_bin, since they usually run into problems while using either geom_bar or geom_histogram. Note the documentation for each: they both use stat = "bin" by default (in current ggplot2 versions this stat has been split into stat_bin for continuous data and stat_count for discrete data).
But let's back up. The geom_* functions control the actual rendering of data into some sort of geometric form. The stat_* functions simply transform your data. The distinction is a bit confusing in practice, because adding a layer of stat_bin will, by default, invoke geom_bar, and so it can seem indistinguishable from geom_bar when you're learning.
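A minimal sketch of that stat/geom pairing, assuming a reasonably recent ggplot2 and a made-up data frame df (the data, the seed, and the choice of 10 bins are illustrative assumptions, not part of the original question). It builds the same histogram once through the stat and once through the geom, then compares the bar heights each layer computed:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(200))  # hypothetical example data

# The same binning requested two ways: via the stat and via the geom.
p_stat <- ggplot(df, aes(x)) + stat_bin(bins = 10)
p_geom <- ggplot(df, aes(x)) + geom_histogram(bins = 10)

# layer_data() returns the data a layer draws from, after its stat has run.
d_stat <- layer_data(p_stat)
d_geom <- layer_data(p_geom)

# Compare the counts computed for each bin by the two layers.
identical(d_stat$count, d_geom$count)
```

Inspecting names(d_geom) is also a quick way to see which computed variables the stat hands back.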
In any case, consider the "bar"-like geoms: histograms and bar charts. Both are clearly going to involve some binning of data somewhere along the line. But our data could either be pre-summarised or not. For instance, we might want a bar plot from:
x
a
a
a
b
b
b
or equivalently from
x y
a 3
b 3
The first hasn't been binned yet. The second is pre-binned. The default behavior for both geom_bar and geom_histogram is to assume that you have not pre-binned your data. So they will attempt to call stat_bin (for histograms; now stat_count for bar charts) on your x values.
As the warning says, it will then try to map y for you to the resulting counts. If you also attempt to map y yourself to some other variable, you end up in Here There Be Dragons territory. Mapping y to functions of the variables returned by stat_bin (..count.., etc.) should be OK and should not throw that warning (it doesn't for me using @mnel's example above).
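As a hedged illustration of mapping y to a function of the computed counts, the sketch below draws a histogram whose bar heights are fractions of the data. Note that the expression sits inside aes() and is not quoted, unlike the call in the question. It assumes ggplot2 3.3.0 or later, where after_stat(count) is the current spelling of ..count..; the data frame df and the bin count are made up for the example.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(200))  # hypothetical example data

# after_stat() refers to variables computed by stat_bin, not to columns of df.
# On older ggplot2 versions the equivalent mapping is
# aes(y = ..count.. / sum(..count..)).
p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
  ylab("fraction of observations")

# Bar heights as actually drawn; with a single group they should add up to 1.
heights <- layer_data(p)$y
sum(heights)
```

With several groups or facets the normalisation happens per group and panel, so the heights would need checking separately in that case.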
The take-away here is that for geom_bar, if you've pre-computed the heights of the bars, always remember to use stat = "identity", or better yet use the newer geom_col, which uses stat = "identity" by default. For geom_histogram, it's very unlikely that you will have pre-computed the bins, so in most cases you just need to remember not to map y to anything beyond what's returned from stat_bin.
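To make the take-away concrete, here is a small sketch using the two toy data sets from above (the object names raw, pre, and the h_* vectors are mine). It assumes a ggplot2 version that includes geom_col. All three plots should end up with the same bar heights:

```r
library(ggplot2)

raw <- data.frame(x = c("a", "a", "a", "b", "b", "b"))  # not yet counted
pre <- data.frame(x = c("a", "b"), y = c(3, 3))         # already counted

# Raw data: let the default stat do the counting, and do not map y.
p_raw <- ggplot(raw, aes(x)) + geom_bar()

# Pre-counted data: map y and tell the layer to use the values as they are.
p_identity <- ggplot(pre, aes(x, y)) + geom_bar(stat = "identity")
p_col      <- ggplot(pre, aes(x, y)) + geom_col()

# Bar heights each layer will draw.
h_raw      <- layer_data(p_raw)$y
h_identity <- layer_data(p_identity)$y
h_col      <- layer_data(p_col)$y
```

Comparing h_raw, h_identity, and h_col is a cheap sanity check that the pre-computed table matches what the stat would have counted.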
geom_dotplot uses its own binning stat, stat_bindot, and this discussion applies there as well, I believe. This sort of thing generally hasn't been an issue with the 2d binning cases (geom_bin2d and geom_hex), since the analogous z variable there hasn't offered as much flexibility as the binned y variable does in the 1d case. If future updates start allowing fancier manipulations of the 2d binning cases, I suppose this could become something you have to watch out for there too.