How to plot a (sophisticated) stacked barplot in ggplot2, without complicated manual data aggregation
Problem description
I want to plot a (faceted) stacked barplot where the y-axis is in percent, with the frequency labels displayed inside the bars.
After quite some work and reading many different questions on Stack Overflow, I found a way to do this with ggplot2. However, I don't do it directly in ggplot2: I aggregate my data manually with a table call. That manual aggregation is complicated, and I also calculate the percent values by hand with temporary variables (see the source code comment "manually aggregate data").
How can I make the same plot in a nicer way, without the manual and complicated data aggregation?
library(ggplot2)
library(scales)
library(gridExtra)
library(plyr)
##
## Random Data
##
fact1 <- factor(floor(runif(1000, 1, 6)),
                labels = c("A", "B", "C", "D", "E"))
fact2 <- factor(floor(runif(1000, 1, 6)),
                labels = c("g1", "g2", "g3", "g4", "g5"))
##
## STACKED BAR PLOT that scales y-axis to 100%
##
## manually aggregate data
##
mytable <- as.data.frame(table(fact1, fact2))
colnames(mytable) <- c("caseStudyID", "Group", "Freq")
mytable$total <- sapply(mytable$caseStudyID,
                        function(caseID) sum(subset(mytable, caseStudyID == caseID)$Freq))
mytable$percent <- round((mytable$Freq / mytable$total) * 100, 2)
mytable2 <- ddply(mytable, .(caseStudyID), transform, pos = cumsum(percent) - 0.5 * percent)
## all case studies in one plot (SCALED TO 100%)
p1 <- ggplot(mytable2, aes(x = caseStudyID, y = percent, fill = Group)) +
  geom_bar(stat = "identity") +
  theme(legend.key.size = unit(0.4, "cm")) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  # the ifelse guards against printing labels with "0" within a bar
  geom_text(aes(label = sapply(Freq, function(x) ifelse(x > 0, x, NA)), y = pos), size = 3)
print(p1)
..
After you make the data:
fact1 <- factor(floor(runif(1000, 1,6)),
labels = c("A","B", "C", "D", "E"))
fact2 <- factor(floor(runif(1000, 1,6)),
labels = c("g1","g2", "g3", "g4", "g5"))
dat = data.frame(caseStudyID=fact1, Group=fact2)
You can automate making an unlabeled graph of the kind that you want with position_fill:
ggplot(dat, aes(caseStudyID, fill=Group)) + geom_bar(position="fill")
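Since the question asks for an axis in percent, here is a minimal sketch of the same plot with the fractions relabeled as percentages. It assumes the scales package (already loaded in the question) and uses its percent formatter; the fixed seed and the name p_pct are illustrative additions, not part of the original answer.

```r
library(ggplot2)
library(scales)  # provides percent() for axis labels

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
dat <- data.frame(
  caseStudyID = factor(floor(runif(1000, 1, 6)), labels = c("A", "B", "C", "D", "E")),
  Group       = factor(floor(runif(1000, 1, 6)), labels = c("g1", "g2", "g3", "g4", "g5"))
)

# position = "fill" rescales every bar to a total height of 1;
# percent() only changes how the 0..1 axis ticks are printed
p_pct <- ggplot(dat, aes(caseStudyID, fill = Group)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent) +
  ylab("percent")

print(p_pct)
```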
I don't know if there's a way to generate the text labels automatically. The positions and counts from the stacked graph are accessible with ggplot_build, if you want to use what ggplot calculates instead of doing it separately.
p = ggplot(dat, aes(caseStudyID, fill=Group)) + geom_bar(position="fill")
ggplot_build(p)$data[[1]]
That will return a data frame with (among other things) count, x, y, ymin, and ymax variables that can be used to create positioned labels.
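To see what those columns look like, the built data can be narrowed to the ones relevant for labeling. This is only an inspection sketch; the names built and label_cols and the fixed seed are illustrative assumptions.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
dat <- data.frame(
  caseStudyID = factor(floor(runif(1000, 1, 6)), labels = c("A", "B", "C", "D", "E")),
  Group       = factor(floor(runif(1000, 1, 6)), labels = c("g1", "g2", "g3", "g4", "g5"))
)
p <- ggplot(dat, aes(caseStudyID, fill = Group)) + geom_bar(position = "fill")

# One row per bar segment, so at most 5 case studies x 5 groups = 25 rows
built <- ggplot_build(p)$data[[1]]

# count is the raw frequency; ymin and ymax are the segment's bounds on the 0..1 scale
label_cols <- built[, c("x", "count", "ymin", "ymax")]
head(label_cols)
```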
If you want the labels vertically centered in each category, first make a column with values halfway between ymin and ymax.
freq = ggplot_build(p)$data[[1]]
freq$y_pos = (freq$ymin + freq$ymax) / 2
Then add the labels to the graph with annotate.
p + annotate(x=freq$x, y=freq$y_pos, label=freq$count, geom="text", size=3)
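As a possible alternative to the ggplot_build route, a text layer can reuse the same count statistic and fill position, which lets ggplot2 compute the label positions itself. This is a sketch rather than part of the original answer, and it assumes a reasonably recent ggplot2: after_stat() appeared in version 3.3.0, and the vjust argument of position_fill() in 2.2.0. The name p_lab and the fixed seed are illustrative.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
dat <- data.frame(
  caseStudyID = factor(floor(runif(1000, 1, 6)), labels = c("A", "B", "C", "D", "E")),
  Group       = factor(floor(runif(1000, 1, 6)), labels = c("g1", "g2", "g3", "g4", "g5"))
)

p_lab <- ggplot(dat, aes(caseStudyID, fill = Group)) +
  geom_bar(position = "fill") +
  geom_text(
    aes(label = after_stat(count)),         # label each segment with its raw frequency
    stat = "count",                         # same statistic that geom_bar uses
    position = position_fill(vjust = 0.5),  # place each label mid-segment on the 0..1 scale
    size = 3
  )

print(p_lab)
```

Combinations that never occur produce no row in the count statistic, so no "0" labels are drawn and the ifelse guard from the question is not needed here.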