Ordering x-axis in ggplot2 boxplot using computed statistic

Question

I have some data that I want to display as a box plot using ggplot2. It's basically counts, stratified by two other variables. Here's an example of the data (in reality there's a lot more, but the structure is the same):

TAG Count Condition
A     5         1
A     6         1
A     6         1
A     6         2
A     7         2
A     7         2
B     1         1
B     2         1
B     2         1
B    12         2
B     8         2
B    10         2
C    10         1
C    12         1
C    13         1
C     7         2
C     6         2
C    10         2
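
To follow along in R, the table above can be entered as a data frame. This is a minimal sketch that assumes the name s used in the question; Condition is kept numeric, as in the table, and is converted with factor() only where it is mapped to fill.

```r
# Example data from the table above, entered as a data frame named s
s <- data.frame(
  TAG       = rep(c("A", "B", "C"), each = 6),
  Count     = c(5, 6, 6, 6, 7, 7,
                1, 2, 2, 12, 8, 10,
                10, 12, 13, 7, 6, 10),
  Condition = rep(rep(c(1, 2), each = 3), times = 3)
)
str(s)
```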

For each tag, there is a fixed number of observations in condition 1 and in condition 2 (here it's 3, but in the real data it's much more). I want a box plot like the following (s is a data frame arranged as above):

library(ggplot2)
ggplot(s, aes(x=TAG, y=Count, fill=factor(Condition))) + geom_boxplot()

[Image: example data plot (https://i.stack.imgur.com/OilRS.png)]
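
With the call above, the tags appear as A, B, C because ggplot2 draws a discrete axis in the order of the variable's levels, and a character or factor variable gets alphabetical levels unless they are set explicitly. A small base-R illustration, using a hypothetical vector tags:

```r
# Hypothetical tag vector, in the order rows might appear in the data
tags <- c("C", "A", "B", "A")

# Default: factor() sorts the unique values, so a plot would show A, B, C
default_levels <- levels(factor(tags))
print(default_levels)

# Explicit levels control the order of the categories instead
custom_levels <- levels(factor(tags, levels = c("A", "C", "B")))
print(custom_levels)
```

So changing the x-axis order comes down to changing the level order of TAG.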

This is fine, but I want to be able to order the x-axis by the p-value from a Wilcoxon test for each tag. For example, with the above data, the values would be (for tags A, B, and C respectively):

> wilcox.test(c(5,6,6),c(6,7,7))$p.value
[1] 0.1572992
> wilcox.test(c(1,2,2),c(12,8,10))$p.value
[1] 0.0765225
> wilcox.test(c(10,12,13),c(7,6,10))$p.value
[1] 0.1211833
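
The three calls above can be computed in one pass over all tags. This is a base-R sketch (split() plus sapply(), not the code from the answer below) that assumes the example data from the question; the names pvals and tag_order are illustrative. Because the samples contain ties, wilcox.test() uses the normal approximation and warns that it cannot compute an exact p-value, so the warning is suppressed here.

```r
# Rebuild the example data (same values as the table above)
s <- data.frame(
  TAG       = rep(c("A", "B", "C"), each = 6),
  Count     = c(5, 6, 6, 6, 7, 7, 1, 2, 2, 12, 8, 10, 10, 12, 13, 7, 6, 10),
  Condition = rep(rep(c(1, 2), each = 3), times = 3)
)

# One Wilcoxon p-value per tag: split the rows by TAG, test Count ~ Condition.
# suppressWarnings() hides the "cannot compute exact p-value with ties" warning.
pvals <- sapply(split(s, s$TAG), function(d) {
  suppressWarnings(wilcox.test(Count ~ Condition, data = d)$p.value)
})
print(pvals)

# Tags ordered from largest to smallest p-value
tag_order <- names(sort(pvals, decreasing = TRUE))
print(tag_order)
```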

This would induce the ordering A, C, B on the x-axis (largest to smallest). But I don't know how to add this information to my data (specifically, attaching a p-value at just the tag level rather than adding a whole extra column), or how to use it to change the x-axis order. Any help greatly appreciated.

Solution

Here is one way to do it. The first step is to calculate the p-value for each TAG. We do this with ddply, which splits the data by TAG and calculates the p-value using the formula interface to wilcox.test. The plot statement then reorders TAG based on its p-value.

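library(ggplot2); library(plyr)
# s is the data frame from the question; each TAG's p-value is repeated on its rows
dfr2 <- ddply(s, .(TAG), transform,
  pval = wilcox.test(Count ~ Condition)$p.value)

# reorder() sorts levels by increasing pval; negate it so the largest
# p-value comes first (A, C, B for the example data)
qplot(reorder(TAG, -pval), Count, fill = factor(Condition), geom = 'boxplot',
  data = dfr2)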
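
The question asked for the p-value to be attached at the tag level rather than repeated in a column. A minimal sketch of that variant (an alternative to the ddply approach, not the answer's method) keeps the p-values in a named vector and only changes the level order of TAG. It assumes the example data from the question, uses base R plus ggplot2 without plyr, and calls ggplot() directly because qplot() is deprecated as of ggplot2 3.4.0.

```r
library(ggplot2)

# Example data (same values as the table in the question)
s <- data.frame(
  TAG       = rep(c("A", "B", "C"), each = 6),
  Count     = c(5, 6, 6, 6, 7, 7, 1, 2, 2, 12, 8, 10, 10, 12, 13, 7, 6, 10),
  Condition = rep(rep(c(1, 2), each = 3), times = 3)
)

# Per-tag p-values stored as a named vector (one value per tag), not a column.
# suppressWarnings() hides the ties warning from wilcox.test().
pvals <- sapply(split(s, s$TAG), function(d) {
  suppressWarnings(wilcox.test(Count ~ Condition, data = d)$p.value)
})

# Make TAG a factor whose levels run from largest to smallest p-value;
# ggplot2 draws the discrete x-axis in this level order
s$TAG <- factor(s$TAG, levels = names(sort(pvals, decreasing = TRUE)))

p <- ggplot(s, aes(x = TAG, y = Count, fill = factor(Condition))) +
  geom_boxplot()
print(p)
```

For smallest-to-largest ordering instead, drop decreasing = TRUE.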
