Error with ggplot2 mapping variable to y and using stat="bin"

Problem description


I am using ggplot2 to make a histogram:

geom_histogram(aes(x=...), y="..ncount../sum(..ncount..)")

and I get the error:

Mapping a variable to y and also using stat="bin".
  With stat="bin", it will attempt to set the y value to the count of cases in each group.
  This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
  If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
  If you want y to represent values in the data, use stat="identity".
  See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)

What causes this in general? I am confused by the error because I'm not mapping a variable to y; I'm just making a histogram of x, and I would like the height of each bar to represent a normalized fraction of the data (so that all the bar heights together sum to 100% of the data).

Edit: if I want to make a density plot with geom_density instead of geom_histogram, do I use ..ncount../sum(..ncount..) or ..scaled..? I'm unclear about what ..scaled.. does.

Solution

The confusion here is a long-standing one (as evidenced by the verbose warning message), and it all starts with stat_bin.

But users don't usually realize that their confusion revolves around stat_bin, since they typically run into the problem while using either geom_bar or geom_histogram. Note the documentation for each: by default, both use stat = "bin" (in current ggplot2 versions this stat has been split into stat_bin for continuous data and stat_count for discrete data).

But let's back up. geom_* functions control the actual rendering of data into some sort of geometric form; stat_* functions simply transform your data. The distinction is a bit confusing in practice, because adding a stat_bin layer will, by default, invoke geom_bar, so while you're learning it can seem indistinguishable from geom_bar.
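A minimal sketch of that equivalence, assuming a recent ggplot2 release and a hypothetical toy data frame `df`: a `stat_bin()` layer (whose default geom is "bar") and a `geom_histogram()` layer (whose default stat is "bin") end up computing the same bins. `layer_data()` is used here only to inspect the data each layer produces after its stat has run.

```r
library(ggplot2)

# Hypothetical toy data: 100 draws from a standard normal
set.seed(1)
df <- data.frame(x = rnorm(100))

# A stat_* layer relying on its default geom ("bar") ...
p_stat <- ggplot(df, aes(x = x)) + stat_bin(bins = 10)
# ... versus a geom_* layer relying on its default stat ("bin")
p_geom <- ggplot(df, aes(x = x)) + geom_histogram(bins = 10)

# layer_data() returns each layer's data after the stat has transformed it
d_stat <- layer_data(p_stat)
d_geom <- layer_data(p_geom)

# Compare the bin edges and counts computed by the two layers
all.equal(d_stat[c("xmin", "xmax", "count")], d_geom[c("xmin", "xmax", "count")])
```

Either way, the binning is done by the stat; the geom only decides how the resulting rows are drawn.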

In any case, consider the "bar"-like geoms: histograms and bar charts. Both are clearly going to involve some binning of data somewhere along the line. But our data could either be pre-summarised or not. For instance, we might want a bar plot from:

x
a
a
a
b
b
b

or equivalently from

x  y
a  3
b  3

The first hasn't been binned yet; the second is pre-binned. The default behavior for both geom_bar and geom_histogram is to assume that you have not pre-binned your data, so they will call a binning stat on your x values (stat_bin for histograms; bar charts now use stat_count).
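As a rough illustration of the un-binned case, assuming a current ggplot2 version where geom_bar() defaults to stat = "count": passing the first table above to geom_bar() lets the stat do the tallying and fill in y for you. The data frame name `raw` is hypothetical.

```r
library(ggplot2)

# The un-binned table from above: one row per observation
raw <- data.frame(x = c("a", "a", "a", "b", "b", "b"))

# geom_bar() uses stat_count by default, so it counts the x values itself
p_raw <- ggplot(raw, aes(x = x)) + geom_bar()

# The stat adds a `count` column and maps y to it; no y mapping is needed
d_raw <- layer_data(p_raw)
d_raw[c("x", "count", "y")]
```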

As the warning says, it will then try to map y for you to the resulting counts. If you also attempt to map y yourself to some other variable, you end up in Here There Be Dragons territory. Mapping y to functions of the variables returned by stat_bin (..count.., etc.) should be OK and should not throw that warning (it doesn't for me using @mnel's example above).
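Applied to the question, a sketch of the usual approach: the y mapping goes inside aes() and is written in terms of `count`, a variable computed by stat_bin, so each bar shows its fraction of all observations. This assumes ggplot2 3.3.0 or later, where after_stat() is available; on older versions the corresponding mapping is `aes(y = ..count.. / sum(..count..))`. The simulated data is hypothetical.

```r
library(ggplot2)

# Hypothetical data: 500 draws from a standard normal
set.seed(1)
df <- data.frame(x = rnorm(500))

# y is a function of stat_bin's computed `count`, evaluated after the stat runs
p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
  labs(y = "fraction of observations")

# With a single group and panel, the bar heights sum to 1 (up to rounding)
heights <- layer_data(p)$y
sum(heights)
```

Because the y mapping only rescales what stat_bin already computed, it does not conflict with the stat the way mapping y to an unrelated column would.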

The takeaway here is that for geom_bar, if you've pre-computed the heights of the bars, always remember to use stat = "identity", or better yet use the newer geom_col, which uses stat = "identity" by default. For geom_histogram it's very unlikely that you will have pre-computed the bins, so in most cases you just need to remember not to map y to anything beyond what's returned from stat_bin.
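For the pre-binned table, a minimal sketch of the two options just mentioned, assuming a ggplot2 version that provides geom_col(). The data frame name `binned` is hypothetical; the heights are taken straight from its `y` column rather than counted.

```r
library(ggplot2)

# The pre-binned table from above: bar heights are already in column y
binned <- data.frame(x = c("a", "b"), y = c(3, 3))

# Tell geom_bar() not to count, and use y as-is ...
p_identity <- ggplot(binned, aes(x = x, y = y)) + geom_bar(stat = "identity")
# ... or use geom_col(), whose default stat is already "identity"
p_col <- ggplot(binned, aes(x = x, y = y)) + geom_col()

# Both layers should end up with the same bar heights
all.equal(layer_data(p_identity)$y, layer_data(p_col)$y)
```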

geom_dotplot uses its own binning stat, stat_bindot, and I believe this discussion applies there as well. This sort of thing generally hasn't been an issue in the 2d binning cases (geom_bin2d and geom_hex), since there hasn't been as much flexibility available in the z variable, the 2d analogue of the binned y variable in the 1d case. If future updates start allowing fancier manipulations in the 2d binning cases, I suppose this could become something you have to watch out for there too.
