Plot multiple box-plots using columns of dataframe in R


Problem description

I have a dataframe with a column of categorical data (two possible values) and multiple variable columns. I need to plot multiple box-plots, one for each variable column. Each plot compares the value of the variable between the two categorical groups given in column 1. So far I have it working by writing an individual plot call for each column.

#CREATE DATASET
mydata <- data.frame(matrix(rlnorm(30*10,meanlog=0,sdlog=1), nrow=30))
colnames(mydata) <- c("categ", "var1","var2", "var3","var4", "var5", "var6", "var7", "var8", "var9")
mydata$var2 <- mydata$var2*5
mydata$categ <- sample(1:2)
mydata

#LAYOUT
par(mfrow=c(3,3), mar=c(4,4,0.5,0.5), mgp = c(1.5, 0.3, 0), tck = -0.01)

#BOXPLOTS
boxplot(var1 ~ categ, data = mydata, outpch = NA, ylim = c(0, 8), Main = "Title", ylab="VarLevel", tck = 1.0, names=c("categ1","categ2"))
stripchart(var1 ~ categ, data = mydata, vertical = TRUE, method = "jitter", ylim = c(0, 8), pch = 21, cex = 1, col=c(rgb(255, 0, 0, 100, max = 255), rgb(0, 0, 255, 100, max = 255)), bg = rgb(255, 255, 255, 10, max = 255), add = TRUE)
test <- wilcox.test(var1 ~ categ, data = mydata)
pvalue <- test$p.value
pvalueformatted <- format(pvalue, digits=3, nsmall=2)
mtext(paste(colnames(mydata)[2], " p = ", pvalueformatted), side=1, line=-13, at=0.9, cex = 0.6)

boxplot(var2 ~ categ, data = mydata, outpch = NA, ylim = c(0, 40), Main = "Title2", ylab="VarLevel", tck = 1.0, names=c("categ1","categ2"))
stripchart(var2 ~ categ, data = mydata, vertical = TRUE, method = "jitter", ylim = c(0, 40), pch = 25, cex = 1, col=c(rgb(255, 0, 0, 100, max = 255), rgb(0, 0, 255, 100, max = 255)), bg = rgb(255, 255, 255, 10, max = 255), add = TRUE)
test <- wilcox.test(var2 ~ categ, data = mydata)
pvalue <- test$p.value
pvalueformatted <- format(pvalue, digits=3, nsmall=2)
mtext(paste(colnames(mydata)[3], " p = ", pvalueformatted), side=1, line=-13, at=0.9, cex = 0.6)

Two questions:
1) I would like to use a function or for loop to script the plot call for each data column. Not sure how to do this. I saw a few related posts but couldn't quite get there. Trying to use base functions for now, though could consider ggplot or others if necessary.
2) As part of the loop/function, is there a way to adjust the y-axis scale of each plot to accommodate the range of the variable? So for a given column, if the maximum value is 2, the y axis scale would go up to 4. If the max was 100, the y axis would go up to 110.

Thoughts appreciated

Solution

I would sapply over a vector of column numbers and subset mydata to the column of interest within the function. By iterating over column numbers rather than columns themselves, you have easy access to the correct colname to be added to the plot later.
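As a minimal sketch of the indexing idea, the snippet below uses a small toy data frame (`toy` is an assumption for illustration, not part of the original data) to show that `seq_along(df)[-1]` yields the positions of every column except the first, and that each position gives access to both the values and the column name.

```r
# Toy data frame with the same layout as the question:
# column 1 is the grouping variable, the rest are measurements.
set.seed(1)  # assumption: seed chosen only so the toy values are reproducible
toy <- data.frame(categ = rep(1:2, 3), var1 = rnorm(6), var2 = rnorm(6))

# Positions of the measurement columns (everything except column 1).
idx <- seq_along(toy)[-1]

# Inside the loop, the index gives both the values and the column name.
for (i in idx) {
  cat(colnames(toy)[i], ": ", length(toy[, i]), " values\n", sep = "")
}
```

Iterating over `toy[-1]` directly would hand the function only the values, so the name would have to be looked up separately.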

You also need to add a small outer margin (oma) to side 3 (top) so that the p value can be printed there for the first 3 plots.

To address your second question - that of reducing the y limits to fit the range of the data - this will be automatic if you specify outline=FALSE to suppress plotting of outliers. (In your code, you simply supplied NA as the plotting character to hide them, but the boxplots still considered them part of the data when determining the axis limits.) However, by setting outline=FALSE, the y limits that are calculated will not accommodate any outliers that would otherwise be plotted by the call to stripchart (which I've now modified to points since it's a bit simpler).
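If the automatic limits clip points that are drawn on top of the boxes, one option is to compute the limits explicitly from the full data. The helper below is only a sketch: `padded_ylim` and its `pad` argument are hypothetical names, and the choice of 10% headroom is an assumption based on the "max 100 gives 110" example in the question.

```r
# Hypothetical helper: y limits that cover all values plus some headroom.
padded_ylim <- function(y, pad = 0.1) {
  y <- y[is.finite(y)]          # drop NA/NaN/Inf before taking the range
  lo <- min(0, min(y))          # start at 0 unless the data go negative
  hi <- max(y)
  c(lo, hi + pad * (hi - lo))   # add a fraction `pad` of the span on top
}

padded_ylim(c(0.5, 1, 2))     # upper limit slightly above 2
padded_ylim(c(3, 40, 100))    # upper limit slightly above 100
```

The result can be passed as `ylim = padded_ylim(y)` in the `boxplot` call for each column, which keeps every jittered point inside the plotting region even when `outline=FALSE`.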

par(mfrow=c(3,3), mar=c(3, 3, 0.5, 0.5), mgp = c(1.5, 0.3, 0), tck = -0.01,
    oma=c(0, 0, 1, 0))

sapply(seq_along(mydata)[-1], function(i) {
  y <- mydata[, i]
  boxplot(y ~ mydata$categ, outline=FALSE, ylab="VarLevel", tck = 1.0, 
          names=c("categ1","categ2"), las=1)
  points(y ~ jitter(mydata$categ, 0.5), 
     col=ifelse(mydata$categ==1, 'firebrick', 'slateblue'))
  test <- wilcox.test(y ~ mydata$categ)
  pvalue <- test$p.value
  pvalueformatted <- format(pvalue, digits=3, nsmall=2)
  mtext(paste(colnames(mydata)[i], " p = ", pvalueformatted), side=3, 
        line=0.5, at=0.9, cex = 0.6)  
})

Note I've also modified your mtext call to plot on side 3 rather than specifying side 1 with a large negative margin.
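Since the question also asked about a function or for loop, here is a rough sketch of the same idea wrapped in a function that additionally returns the p-values instead of discarding them. The function name `boxplot_columns`, its `group_col` argument, the seed, and the slightly larger top margin are all assumptions made for illustration.

```r
# Hypothetical wrapper: one boxplot per non-grouping column,
# returning the Wilcoxon p-values invisibly as a named vector.
boxplot_columns <- function(df, group_col = "categ") {
  g <- df[[group_col]]
  vars <- setdiff(colnames(df), group_col)
  pvals <- setNames(numeric(length(vars)), vars)
  for (v in vars) {
    y <- df[[v]]
    boxplot(y ~ g, outline = FALSE, ylab = "VarLevel", las = 1)
    # Overlay the raw values, jittered horizontally around each group.
    points(y ~ jitter(as.numeric(g), 0.5))
    pvals[v] <- wilcox.test(y ~ g)$p.value
    mtext(paste0(v, " p = ", format(pvals[v], digits = 3)),
          side = 3, line = 0.5, cex = 0.6)
  }
  invisible(pvals)
}

# Example data in the same shape as the question (seed is an assumption).
set.seed(42)
mydata <- data.frame(matrix(rlnorm(30 * 10), nrow = 30))
colnames(mydata) <- c("categ", paste0("var", 1:9))
mydata$categ <- rep(1:2, each = 15)

op <- par(mfrow = c(3, 3), mar = c(3, 3, 1.5, 0.5), oma = c(0, 0, 1, 0))
pvals <- boxplot_columns(mydata)
par(op)  # restore the previous graphics settings
```

Keeping the p-values in a named vector makes it easy to inspect or sort them afterwards without re-running the tests.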
