How to create geom_boxplot with a large number of continuous x-variables


Problem description

I have a data frame which contains x-axis numeric bins and continuous y-axis data across multiple categories. Initially, I created a boxplot by making the x-axis bins "factors", and doing a boxplot of the melted data. Reproducible data:

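library(reshape2)  # provides melt()
library(ggplot2)

x <- seq(1, 10, by = 1)
y1 <- rnorm(10, mean = 3)
y2 <- rnorm(10, mean = 10)
y3 <- rnorm(10, mean = 1)
y4 <- rnorm(10, mean = 8)
y5 <- rnorm(10, mean = 12)
df <- data.frame(x, y1, y2, y3, y4, y5)
# Reshape to long format: one row per (x, variable, value)
df.m <- melt(df, id = "x")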

My code to create the x-axis data as a factor:

df.m$x <- as.factor(df.m$x)

My ggplot:

ggplot(df.m, aes(x=x, y=value))+
 geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.1)+
 theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

The resulting plot:

The problem is that I cannot use x-axis numeric spacing because the x-axis is categorized as a factor, which has equal spacing. I want to be able to use something like scale_x_continuous to manipulate the axis breaks and spacing to, say, an interval of 2, rather than a boxplot every 1, but when I try to plot the data with the x-axis "as.numeric", I just get one boxplot of all of the data:

Any suggestions for a way to get this continuous-looking boxplot curve (the first image) while still being able to control the numeric properties of the x-axis? Thanks!

Solution

Here is a way using the original data you posted on Google - which actually was much more helpful, IMO.

ggplot(df, aes(x=CH, y=value,group=CH))+
  geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.2)+
  scale_x_log10()

So, as @BenBolker said before he deleted his answer(??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).
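Applied to the reproducible data from the question, that advice might look like the sketch below. It assumes the reshape2 and ggplot2 packages are installed. The set.seed() call and the breaks at every 2 units are illustrative choices, not part of the original answer.

```r
library(reshape2)  # melt()
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x <- seq(1, 10, by = 1)
df <- data.frame(
  x,
  y1 = rnorm(10, mean = 3),
  y2 = rnorm(10, mean = 10),
  y3 = rnorm(10, mean = 1),
  y4 = rnorm(10, mean = 8),
  y5 = rnorm(10, mean = 12)
)
df.m <- melt(df, id.vars = "x")  # x stays numeric; no as.factor() here

# group = x gives one box per distinct x value while the axis stays continuous
p <- ggplot(df.m, aes(x = x, y = value, group = x)) +
  geom_boxplot(notch = FALSE, outlier.shape = NA, fill = "red", alpha = 0.1) +
  scale_x_continuous(breaks = seq(2, 10, by = 2))  # tick every 2 units

# Number of boxes ggplot2 computed for the layer (one row per box)
n_boxes <- nrow(layer_data(p))
```

Calling print(p) draws the plot. Because x is still numeric, scale_x_continuous() controls the breaks and spacing.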

With your real data there is another problem though. Your CH is more or less logarithmically spaced, so there are about as many points < 1 as there are between 1 - 10, etc. ggplot wants to make the boxes all the same size, so with a linear x-axis the box width is smaller than the line width, and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less.
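To see why the linear axis hides the boxes, it may help to compare the gaps between log-spaced x values on both scales. The ch vector below is made up for illustration, since the real CH data is not reproduced here. The sketch also assumes that ggplot2 derives its default box width from the smallest gap between x values, which is my understanding of its behaviour.

```r
# Hypothetical stand-in for CH: 17 values log-spaced from 0.01 to 100
ch <- 10^seq(-2, 2, by = 0.25)

# Smallest gap between neighbouring values, relative to the full axis range
rel_gap <- function(v) min(diff(sort(unique(v)))) / diff(range(v))

gap_linear <- rel_gap(ch)         # linear axis: a tiny fraction of the axis
gap_log    <- rel_gap(log10(ch))  # log axis: every gap is the same fraction
```

On the linear scale the smallest gap is well under a thousandth of the axis range, so a box sized from it is only a sliver. After the log10 transform every gap is 1/16 of the range, which leaves room for visible boxes.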
