ggplot2 geom_violin with 0 variance


Question


I started to really like violin plots, since they give me a much better feel than box plots when you have funny distributions. I like to automate a lot of stuff, and thus ran into a problem: when one variable has 0 variance, the boxplot just gives you a line at that point. geom_violin, however, terminates with an error. What behavior would I like? Well, either put in a line or nothing, but please give me the distributions for the other variables.

Ok, quick example:

library(ggplot2)
dff=data.frame(x=factor(rep(1:2,each=100)),y=c(rnorm(100),rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin()

yields

Error in `$<-.data.frame`(`*tmp*`, "n", value = 100L) : 
  replacement has 1 row, data has 0

However, what works is:

ggplot(dff,aes(x=x,y=y)) + geom_boxplot()
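The question asks for violins for the groups that have spread and a line for the constant ones. One way to get that is to split the data by group variance. Violins are drawn only for the varying groups, and the constant groups go to geom_boxplot(), which collapses to a line as described above. This is a minimal sketch that assumes the dff layout from the example. The names grp_sd, const_groups and is_const are illustrative only.

```r
library(ggplot2)

set.seed(1)
dff <- data.frame(x = factor(rep(1:2, each = 100)),
                  y = c(rnorm(100), rep(0, 100)))

# Standard deviation per group; groups with sd == 0 are constant.
grp_sd <- tapply(dff$y, dff$x, sd)
const_groups <- names(which(grp_sd == 0))
is_const <- dff$x %in% const_groups

# Violins only where there is spread; constant groups get a boxplot,
# which collapses to a single line at their value.
p <- ggplot(dff, aes(x = x, y = y)) +
  geom_violin(data = dff[!is_const, ]) +
  geom_boxplot(data = dff[is_const, ])
print(p)
```

Both layers share the same discrete x scale, so each group stays at its usual position.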

Update:

The issue is resolved as of yesterday: https://github.com/hadley/ggplot2/issues/972

Update 2: (from question author) Wow, Hadley himself responded! geom_violin now behaves consistently with geom_density and base R density.

However, I don't think the behavior is optimal yet.

(1) The 'zero' problem

Just run it with my original example:

dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rnorm(100), rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)

Yielding this:

Is the plot on the right an appropriate representation of 'all zeroes'? I don't think so. It is better to have trimming that produces a single line to show that there is no variation in the data. Workaround: add + geom_boxplot().
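The bump on the right follows from the default bandwidth rule that base R's density() uses. ggplot2's stat_ydensity also defaults to bw = "nrd0". The check below uses base R only and assumes the defaults bw = "nrd0" and cut = 3. It also assumes, without confirming it here, that the untrimmed violin extends three bandwidths past the data in the same way.

```r
# All-zero group from the example above: zero variance.
y0 <- rep(0, 100)

# Default bandwidth rule used by density() (bw = "nrd0").
# sd(y0), IQR(y0) and abs(y0[1]) are all 0, so bw.nrd0() falls back to a
# scale of 1 and returns 0.9 * 1 * 100^(-1/5), about 0.358.
bw <- bw.nrd0(y0)
print(bw)

# density() therefore succeeds and returns a Gaussian bump centred on 0.
# With the default cut = 3, the grid runs from -3 * bw to +3 * bw.
d <- density(y0)
print(range(d$x))
```

So the untrimmed "violin" for a constant group is just the kernel itself, roughly plus or minus 1.07 units wide here. It reflects the bandwidth fallback, not anything in the data.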

(2) I may actually want trim=TRUE.

Example:

dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rgamma(100,1,1), rep(0,100)  ))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)

Now one group is strictly positive, and standard kernel density estimates don't handle that boundary correctly: the untrimmed curve spills below zero. With trim=TRUE I can quickly see that the data is strictly positive.
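The boundary problem in (2) can be reproduced in base R. The comparison is only an analogy and rests on an assumption: geom_violin(trim = TRUE) evaluates the density over each group's observed range, while trim = FALSE extends that range by three bandwidths, like density() with its default cut = 3.

```r
set.seed(1)
y <- rgamma(100, 1, 1)   # shape = 1, rate = 1: strictly positive, skewed

# Untrimmed: density() extends the grid 3 bandwidths past the data
# (cut = 3 by default), so the curve leaks below zero.
d_full <- density(y)

# "Trimmed": evaluate the same estimate only over the observed range.
d_trim <- density(y, from = min(y), to = max(y))

cat("untrimmed grid starts at", min(d_full$x), "\n")
cat("trimmed grid starts at  ", min(d_trim$x), "\n")

# Approximate probability mass the untrimmed curve puts below zero
# (Riemann sum over the evenly spaced grid).
dx <- diff(d_full$x[1:2])
cat("mass below zero:", sum(d_full$y[d_full$x < 0]) * dx, "\n")
```

Trimming does not fix the boundary bias of the kernel estimate. It only hides the part that falls outside the data, which is what makes the positivity visible at a glance.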

I am not arguing that the current behavior is 'wrong', since it's in line with other functions. However, geom_violin may be used in different contexts, for exploring different data.frames with heterogeneous data types (positive+skewed or not, for instance).

Solution

Three options for dealing with this until the ggplot2 issue is resolved:

  1. As a quick hack, you can set one of the y-values to 0.0001 (instead of zero) and geom_violin will work.
  2. Check out the vioplot package if you're not set on using ggplot2. vioplot doesn't throw an error when you feed it a bunch of identical values.
  3. The Hmisc package includes a panel.bpplot (box-percentile plot) function that can create violin plots with the bwplot function from the lattice package. See the Examples section of ?panel.bpplot. It produces a single line when you feed it a vector of identical values.
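Option 1 can be automated for pipelines that plot many data frames. The helper below is hypothetical and not part of ggplot2. It finds every group whose values are all identical and nudges one observation by a small eps, following the 0.0001 trick above. This is a sketch that assumes one grouping column and one numeric column, passed by name.

```r
set.seed(1)
dff <- data.frame(x = factor(rep(1:2, each = 100)),
                  y = c(rnorm(100), rep(0, 100)))

# Hypothetical helper, not part of ggplot2: for every group with at least
# two observations whose y values are all identical, add a tiny offset
# (eps) to the first observation so the group no longer has zero variance.
nudge_constant_groups <- function(data, x, y, eps = 1e-4) {
  groups <- split(seq_len(nrow(data)), data[[x]])
  for (idx in groups) {
    vals <- data[[y]][idx]
    if (length(idx) > 1 && length(unique(vals)) == 1) {
      data[[y]][idx[1]] <- vals[1] + eps
    }
  }
  data
}

dff2 <- nudge_constant_groups(dff, "x", "y")
# Group 2 now has a small but non-zero spread; group 1 is unchanged.
print(tapply(dff2$y, dff2$x, sd))
```

Pass the nudged data to ggplot(dff2, aes(x = x, y = y)) + geom_violin() as in option 1. The violin for a nudged group comes from an artificial, nearly constant sample, so its shape is a placeholder and not an estimate.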
