Ordering stacks by size in a ggplot2 stacked bar graph


Question


I have a large data set; a small sample is shown below:

Sequence  Abundance   Length
CAGTG    3       25
CGCTG    82      23
GGGAC    4       25
CTATC    16      23
CTTGA    14      25
CAAGG    9       24
GTAAT    5       24
ACGAA    32      22
TCGGA    10      22
TAGGC    30      21
TGCCG    25      21
TCCGG    2       21
CGCCT    22      24
TTGGC    4       22
ATTCC    4       23
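
For anyone who wants to reproduce this, the sample above can be read into R roughly as follows. This is only a sketch: the name `tab` is borrowed from the plotting call further down, and the column types (character `Sequence`, integer `Abundance` and `Length`) are an assumption about how the real data is stored.

```r
# Sample rows copied from the table above; `tab` matches the name used in the
# ggplot() call later in the question.
tab <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Sequence Abundance Length
CAGTG 3 25
CGCTG 82 23
GGGAC 4 25
CTATC 16 23
CTTGA 14 25
CAAGG 9 24
GTAAT 5 24
ACGAA 32 22
TCGGA 10 22
TAGGC 30 21
TGCCG 25 21
TCCGG 2 21
CGCCT 22 24
TTGGC 4 22
ATTCC 4 23
")

# Quick check of the structure: 15 rows, 3 columns
str(tab)
```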

I'm only showing the first few letters of each sequence here; in reality each one is "Length" characters long. I am looking at the abundance of the sequences in each size class. In addition, I want to visualise the proportion of abundance that a particular sequence represents within its size class. Currently, I can make a stacked bar graph like this:

ggplot(tab, aes(x=Length, y=Abundance, fill=Sequence)) +
  geom_bar(stat='identity') +
  theme(legend.position="none")

This is fine for a small data set like this, but I have about 1.7 million rows in my actual data set. The plot is very colourful, and I can see that particular sequences hold the majority of the abundance in one size class, but it is very messy.

I would like to order the coloured segments within each size class's stack by abundance, i.e. the segment with the highest abundance sits at the bottom of its stack and the one with the lowest abundance sits at the top. It should look a lot more presentable that way.

Any ideas on how to do this in ggplot2? I know there's an "order" parameter in the aes() but I can't work out what it should do with data in the format that I have.

Solution

The order that bars are drawn (bottom to top) in a stacked barplot in ggplot2 is based on the ordering of the factor which defines the groups. So the Sequence factor must be reordered based on the Abundance. But to get the right stacking order, the order must be reversed.

ab.tab$Sequence <- reorder(ab.tab$Sequence, ab.tab$Abundance)
ab.tab$Sequence <- factor(ab.tab$Sequence, levels=rev(levels(ab.tab$Sequence)))
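
To see what these two lines do to the factor levels, here is a small base-R sketch on three of the sample rows. The object name `small` is made up for illustration and is not part of the original answer.

```r
# Three rows from the sample data (hypothetical object name `small`)
small <- data.frame(
  Sequence  = c("CAGTG", "CGCTG", "GGGAC"),
  Abundance = c(3, 82, 4),
  stringsAsFactors = FALSE
)

# reorder() turns Sequence into a factor whose levels are sorted by
# increasing Abundance
small$Sequence <- reorder(small$Sequence, small$Abundance)
levels(small$Sequence)
# "CAGTG" "GGGAC" "CGCTG"

# Reversing the levels puts the most abundant sequence first
small$Sequence <- factor(small$Sequence, levels = rev(levels(small$Sequence)))
levels(small$Sequence)
# "CGCTG" "GGGAC" "CAGTG"
```

Which end of the stack the first level lands on depends on the ggplot2 version: as far as I know, the stacking direction was flipped in ggplot2 2.2.0 so that stacks match the legend order. If the largest segments come out on top rather than at the bottom, dropping the `rev()` step should fix it.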

Using your code now gives the plot you requested:

ggplot(ab.tab, aes(x=Length, y=Abundance, fill=Sequence)) +
  geom_bar(stat='identity') +
  theme(legend.position="none")

I might recommend, however, something slightly different. Since you are suppressing the scale which maps color to sequence, and your description seems to indicate that you don't care about the specific sequence anyway (and there will be many), why not leave that part out? Just draw the outlines of the bars without any filling color.

ggplot(ab.tab, aes(x=Length, y=Abundance, group=Sequence)) +
  geom_bar(stat='identity', colour="black", fill=NA)
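
The question also asks about the share of abundance each sequence holds within its size class. One possible way to compute that in base R, which is not part of the original answer, is sketched below. The object name `ab.sub` and the new `Proportion` column are hypothetical, and only six of the sample rows are used.

```r
# Six sample rows covering two size classes (hypothetical object name `ab.sub`)
ab.sub <- data.frame(
  Sequence  = c("CGCTG", "CTATC", "ATTCC", "ACGAA", "TCGGA", "TTGGC"),
  Abundance = c(82, 16, 4, 32, 10, 4),
  Length    = c(23, 23, 23, 22, 22, 22)
)

# ave() returns, for every row, the summed Abundance of that row's Length
# group, so the division gives each sequence's share of its size class
ab.sub$Proportion <- ab.sub$Abundance / ave(ab.sub$Abundance, ab.sub$Length, FUN = sum)

ab.sub
```

Within each `Length` group the `Proportion` values add up to 1, so the column could be mapped to `y` instead of `Abundance` if relative rather than absolute heights are wanted.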
