Ordering stacks by size in a ggplot2 stacked bar graph
So I have a load of data, a sample of which is shown below:
Sequence Abundance Length
CAGTG 3 25
CGCTG 82 23
GGGAC 4 25
CTATC 16 23
CTTGA 14 25
CAAGG 9 24
GTAAT 5 24
ACGAA 32 22
TCGGA 10 22
TAGGC 30 21
TGCCG 25 21
TCCGG 2 21
CGCCT 22 24
TTGGC 4 22
ATTCC 4 23
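For anyone who wants to reproduce this, the sample above can be read into R roughly as follows. This is only a sketch: the name `tab` is the one used in the plotting code further down, and the column types are whatever `read.table` infers.

```r
# Sample rows from the table above, read into a data frame called `tab`.
tab <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Sequence Abundance Length
CAGTG 3 25
CGCTG 82 23
GGGAC 4 25
CTATC 16 23
CTTGA 14 25
CAAGG 9 24
GTAAT 5 24
ACGAA 32 22
TCGGA 10 22
TAGGC 30 21
TGCCG 25 21
TCCGG 2 21
CGCCT 22 24
TTGGC 4 22
ATTCC 4 23
")

str(tab)  # 15 rows: Sequence (character), Abundance and Length (integer)
```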
I'm only showing the first few letters of each sequence here, but in reality they are "Length" long. I am looking at the abundances of sequences for each size class that I have here. In addition, I want to visualise the proportion of abundance that a particular sequence represents within its size class. Currently, I can make a stacked bar graph like this:
ggplot(tab, aes(x=Length, y=Abundance, fill=Sequence)) +
  geom_bar(stat='identity') +
  theme(legend.position="none")  # theme() replaces opts() in current ggplot2
This is fine for a small data set like this, but I have about 1.7 million rows in my actual data set. It looks very colourful and I can see that particular sequences hold a majority abundance in one size class but it is very messy.
I would like to be able to order the coloured stacked bars for each size by that sequence's abundance. That is, the bars with the highest abundance within their stack should be at the bottom of each stack and the bars with the lowest abundance at the top. It should look a lot more presentable that way.
Any ideas on how to do this in ggplot2? I know there's an "order" parameter in the aes() but I can't work out what it should do with data in the format that I have.
The order in which bars are drawn (bottom to top) in a stacked barplot in ggplot2 is based on the ordering of the factor which defines the groups. So the Sequence factor must be reordered based on the Abundance. But to get the right stacking order, the order must be reversed.
ab.tab$Sequence <- reorder(ab.tab$Sequence, ab.tab$Abundance)
ab.tab$Sequence <- factor(ab.tab$Sequence, levels=rev(levels(ab.tab$Sequence)))
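As a quick check of what those two lines do to the factor, here is a minimal sketch on the sample rows from the question. It uses base R only, rebuilds the data frame inline so it stands alone, and the helper name `seq_levels` is just for illustration.

```r
# Sample rows from the question, rebuilt so the snippet is self-contained.
ab.tab <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Sequence Abundance Length
CAGTG 3 25
CGCTG 82 23
GGGAC 4 25
CTATC 16 23
CTTGA 14 25
CAAGG 9 24
GTAAT 5 24
ACGAA 32 22
TCGGA 10 22
TAGGC 30 21
TGCCG 25 21
TCCGG 2 21
CGCCT 22 24
TTGGC 4 22
ATTCC 4 23
")

# reorder() sorts the levels by ascending Abundance ...
ab.tab$Sequence <- reorder(factor(ab.tab$Sequence), ab.tab$Abundance)
# ... and rev() flips them so the most abundant sequence is the first level.
ab.tab$Sequence <- factor(ab.tab$Sequence, levels = rev(levels(ab.tab$Sequence)))

seq_levels <- levels(ab.tab$Sequence)
head(seq_levels, 3)  # CGCTG, ACGAA, TAGGC (abundances 82, 32, 30)
tail(seq_levels, 1)  # TCCGG (abundance 2)
```

Because each sequence appears once, one global ordering of the levels also fixes the order inside every Length stack.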
Using your code now gives the plot you requested
ggplot(ab.tab, aes(x=Length, y=Abundance, fill=Sequence)) +
  geom_bar(stat='identity') +
  theme(legend.position="none")  # theme() replaces opts() in current ggplot2
I might recommend, however, something slightly different. Since you are suppressing the scale which maps color to sequence, and your description seems to indicate that you don't care about the specific sequence anyway (and there will be many), why not leave that part out? Just draw the outlines of the bars without any filling color.
ggplot(ab.tab, aes(x=Length, y=Abundance, group=Sequence)) +
geom_bar(stat='identity', colour="black", fill=NA)
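One caveat, offered as a hedged sketch rather than a definitive recipe: as far as I know, ggplot2 2.2.0 and later stack in the reverse order of the grouping, so the first factor level ends up at the top of the stack. If that holds for your installed version, the `rev()` step should be dropped and a plain ascending `reorder()` is enough to put the most abundant sequence at the bottom. The version below assumes such a recent ggplot2 and uses `geom_col()` and `theme()`. The object names `p` and `p_outline` are only for illustration, and it is worth checking the result on your own installation.

```r
library(ggplot2)

# Sample rows from the question, rebuilt so the snippet is self-contained.
ab.tab <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Sequence Abundance Length
CAGTG 3 25
CGCTG 82 23
GGGAC 4 25
CTATC 16 23
CTTGA 14 25
CAAGG 9 24
GTAAT 5 24
ACGAA 32 22
TCGGA 10 22
TAGGC 30 21
TGCCG 25 21
TCCGG 2 21
CGCCT 22 24
TTGGC 4 22
ATTCC 4 23
")

# Ascending order only: lowest abundance is the first level, highest the last.
# Assumption: ggplot2 >= 2.2.0, where the last level is drawn at the bottom.
ab.tab$Sequence <- reorder(factor(ab.tab$Sequence), ab.tab$Abundance)

# Filled version with the legend suppressed.
p <- ggplot(ab.tab, aes(x = Length, y = Abundance, fill = Sequence)) +
  geom_col() +
  theme(legend.position = "none")

# Outline-only version, as suggested above.
p_outline <- ggplot(ab.tab, aes(x = Length, y = Abundance, group = Sequence)) +
  geom_col(colour = "black", fill = NA)

print(p)
```

If the largest block still appears at the top of each stack, your version stacks the other way and the reversal shown earlier is the one to use.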