Order data to plot a barplot in ggplot2
Problem description
I need to build a barplot of my data, showing bacterial relative abundance in different samples (each column should sum to 1 in the complete dataset).
A subset of my data:
> mydata
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872
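As a quick sanity check (a base-R sketch added for illustration, not part of the original post), the five taxa shown cover only part of each sample, so their columns do not sum to 1 on their own. The matrix `m` and the shortened genus row names are hypothetical conveniences:

```r
# The five taxa from the question, one column per sample (wide format).
# Row names are shortened to the genus for readability (illustrative only).
m <- matrix(c(0.031960309, 0.018691589, 0.001846083, 0.001730703, 0.073497173,
              0.066683743, 0.003244536, 0.006403689, 0.000426913, 0.065915301,
              0.045638509, 0.00447774,  0.000516662, 0.001894429, 0.175406872),
            ncol = 3,
            dimnames = list(c("Actinomyces", "Atopobium", "Corynebacterium",
                              "Rothia", "Porphyromonas"),
                            c("CD6", "CD1", "CD12")))

# Totals per sample in this subset: about 0.128, 0.143 and 0.228
print(colSums(m))

# prop.table(m, margin = 2) divides each column by its own total
p <- prop.table(m, margin = 2)
print(colSums(p))  # each column now sums to 1 (up to floating-point rounding)
```

Rescaling each sample by its own total is what lets every bar reach 1 when only a subset of taxa is plotted.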
What I'd like to have is a bar for each sample (CD6, CD1, CD12), where the y values are the relative abundance of bacterial species (the Taxon column).
I think (but I'm not sure) my data are not in the right format for this plot, since, unlike the examples I found, I don't have a variable to group by:
ggplot(data) + geom_bar(aes(x=revision, y=added), stat="identity", fill="white", colour="black")
Is there a way to arrange my data so that they work as input to this code? Or how should I modify the code? Thanks!
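The examples mentioned in the question use long data, where each row is one observation and the grouping variable is an ordinary column. A minimal base-R sketch of that reshaping uses `stack()`, assuming the subset shown above; the object names `mydata`, `samples` and `long` are illustrative. The answer below does the same step with `reshape2::melt()`.

```r
# The subset from the question in wide format (one column per sample)
mydata <- data.frame(
  Taxon = c("Actinomycetaceae;g__Actinomyces",
            "Coriobacteriaceae;g__Atopobium",
            "Corynebacteriaceae;g__Corynebacterium",
            "Micrococcaceae;g__Rothia",
            "Porphyromonadaceae;g__Porphyromonas"),
  CD6  = c(0.031960309, 0.018691589, 0.001846083, 0.001730703, 0.073497173),
  CD1  = c(0.066683743, 0.003244536, 0.006403689, 0.000426913, 0.065915301),
  CD12 = c(0.045638509, 0.00447774,  0.000516662, 0.001894429, 0.175406872),
  stringsAsFactors = FALSE
)

samples <- c("CD6", "CD1", "CD12")

# stack() places the sample columns one under another:
# 'values' holds the abundance, 'ind' the name of the column it came from
long <- stack(mydata[, samples])
names(long) <- c("value", "sample")

# Columns are stacked in order, so the Taxon vector repeats once per sample
long$Taxon <- rep(mydata$Taxon, times = length(samples))

# 'sample' is now a column that can be mapped to x, and Taxon to fill
print(long)
```

With the data in this shape, the plot call can use `aes(x = sample, y = value, fill = Taxon)`.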
Do you want something like this?
# sample data
df <- read.table(header=TRUE, sep=" ", text="
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872")
# convert wide data format to long format
library(reshape2)
df.long <- melt(df, id.vars="Taxon",
                measure.vars=grep("CD\\d+", names(df), value=TRUE),
                variable.name="sample",
                value.name="value")
# calculate proportions: within each sample, divide by the sample total so each bar sums to 1
library(plyr)
df.long <- ddply(df.long, .(sample), transform, value=value/sum(value))
# order samples by the numeric part of their id (CD1, CD6, CD12)
df.long$sample <- reorder(df.long$sample, as.numeric(sub("CD", "", df.long$sample)))
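# plot using ggplot
library(ggplot2)
ggplot(df.long, aes(x=sample, y=value, fill=Taxon)) +
  geom_bar(stat="identity") +
  # add manual colors, one hue per taxon; unique() is used because since R 4.0
  # read.table() keeps Taxon as character, so levels(df$Taxon) would be NULL
  scale_fill_manual(values=scales::hue_pal(h = c(0, 360) + 15,
                                           c = 100,
                                           l = 65,
                                           h.start = 0,
                                           direction = 1)(length(unique(df.long$Taxon))))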
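If you would rather not depend on plyr and reshape2 for the preprocessing, the two data steps can be sketched in base R. This is an alternative, not what the answer uses: `ave()` stands in for `ddply(..., transform, ...)`, and an explicit `factor(levels = ...)` stands in for `reorder()`. The long table is typed in by hand here and is assumed to match the row order `melt()` produces for the sample data; the helper name `ids` is illustrative.

```r
# Long-format rows (Taxon, sample, value), one block of five taxa per sample
df.long <- data.frame(
  Taxon  = rep(c("Actinomycetaceae;g__Actinomyces", "Coriobacteriaceae;g__Atopobium",
                 "Corynebacteriaceae;g__Corynebacterium", "Micrococcaceae;g__Rothia",
                 "Porphyromonadaceae;g__Porphyromonas"), times = 3),
  sample = rep(c("CD6", "CD1", "CD12"), each = 5),
  value  = c(0.031960309, 0.018691589, 0.001846083, 0.001730703, 0.073497173,
             0.066683743, 0.003244536, 0.006403689, 0.000426913, 0.065915301,
             0.045638509, 0.00447774,  0.000516662, 0.001894429, 0.175406872),
  stringsAsFactors = FALSE
)

# Proportions: ave() applies FUN within each sample and returns a vector
# aligned with the original rows, so it can be assigned straight back
df.long$value <- ave(df.long$value, df.long$sample, FUN = function(v) v / sum(v))

# Ordering: set the factor levels by the numeric part of the sample id
ids <- unique(df.long$sample)
df.long$sample <- factor(df.long$sample,
                         levels = ids[order(as.numeric(sub("CD", "", ids)))])

print(levels(df.long$sample))                   # "CD1" "CD6" "CD12"
print(tapply(df.long$value, df.long$sample, sum))  # each sum is 1, up to rounding
```

The resulting `df.long` can be passed to the same `ggplot()` call as in the answer.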