Issue with ggplot2, geom_bar, and position="dodge": stacked has correct y values, dodged does not

Problem description

I'm having quite the time understanding geom_bar() and position="dodge". I was trying to make some bar graphs illustrating two groups. Originally the data was from two separate data frames. Per this question, I put my data in long format. My example:

test <- data.frame(names=rep(c("A","B","C"), 5), values=1:15)
test2 <- data.frame(names=c("A","B","C"), values=5:7)

df <- data.frame(
  names  = c(paste(test$names), paste(test2$names)),
  num    = c(rep(1, nrow(test)), rep(2, nrow(test2))),
  values = c(test$values, test2$values)
)

I use that example as it's similar to the spend vs. budget example. Spending has many rows per names factor level whereas the budget only has one (one budget amount per category).

For a stacked bar plot, this works great:

ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity")

In particular, note the maximum y values. They are the sums of the data from test, with the values of test2 shown in blue on top.

Based on other questions I've read, I simply need to add position="dodge" to make it a side-by-side plot vs. a stacked one:

ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) + 
geom_bar(stat="identity", position="dodge")

It looks great, but note the new maximum y values. It seems to take only the largest value from test within each names factor level as the bar height. It's no longer summing them.

Per some other questions (like this one and this one), I also tried adding the group= option without success (it produces the same dodged plot as above):

ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(num))) +
geom_bar(stat="identity", position="dodge")

I don't understand why the stacked version works great while the dodged version doesn't simply put the same bars side by side instead of on top of each other.


ETA: I found a recent question about this on the ggplot google group with the suggestion to add alpha=0.5 to see what's going on. It isn't that ggplot is taking the max value from each grouping; it's actually over-plotting bars on top of one another for each value.

It seems that when using position="dodge", ggplot expects only one y per x. I contacted Winston Chang, a ggplot developer, to confirm this and to ask whether it can be changed, as I don't see an advantage to the current behaviour.
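
One way to check for this situation, sketched here with base R only, is to count how many rows fall into each names/num combination. The block rebuilds the example df from the question; any cell with a count above 1 is a place where dodging draws several bars at the same position.

```r
# Rebuild the example data from the question
test  <- data.frame(names = rep(c("A", "B", "C"), 5), values = 1:15)
test2 <- data.frame(names = c("A", "B", "C"), values = 5:7)
df <- data.frame(
  names  = c(as.character(test$names), as.character(test2$names)),
  num    = c(rep(1, nrow(test)), rep(2, nrow(test2))),
  values = c(test$values, test2$values)
)

# Rows per names/num cell: num = 1 has five rows per name, num = 2 has one
counts <- table(names = df$names, num = df$num)
print(counts)

# TRUE means at least one cell holds several y values, so dodged bars overlap
has_overplotting <- any(counts > 1)
print(has_overplotting)
```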

It seems that stat="identity" should tell ggplot to sum the y=val passed inside aes(), instead of counting rows, which is what happens without stat="identity" when no y value is passed.

For now, the workaround seems to be (for the original df above) to aggregate so there's only one y per x:

df2 <- aggregate(df$values, by=list(df$names, df$num), FUN=sum)
p <- ggplot(df2, aes(x=Group.1, y=x, fill=factor(Group.2)))
p <- p + geom_bar(stat="identity", position="dodge")
p
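
The same aggregation can also be written with the formula interface of aggregate(), which keeps the original column names instead of Group.1, Group.2 and x. This is a sketch rather than the only way to do it. geom_col() is used as shorthand for geom_bar(stat="identity"), which assumes ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Rebuild the example data from the question
test  <- data.frame(names = rep(c("A", "B", "C"), 5), values = 1:15)
test2 <- data.frame(names = c("A", "B", "C"), values = 5:7)
df <- data.frame(
  names  = c(as.character(test$names), as.character(test2$names)),
  num    = c(rep(1, nrow(test)), rep(2, nrow(test2))),
  values = c(test$values, test2$values)
)

# Sum values within each names/num cell; the columns keep their names
df2 <- aggregate(values ~ names + num, data = df, FUN = sum)
print(df2)

# One row per names/num cell, so the dodged bars show the summed heights
p <- ggplot(df2, aes(x = names, y = values, fill = factor(num))) +
  geom_col(position = "dodge")
p
```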

Solution

I think the problem is that you want to stack within values of the num group, and dodge between values of num. It might help to look at what happens when you add an outline to the bars.

library(ggplot2)
set.seed(123)
df <- data.frame(
  id     = 1:18,
  names  = rep(LETTERS[1:3], 6),
  num    = c(rep(1, 15), rep(2, 3)),
  values = sample(1:10, 18, replace=TRUE)
)

By default, there are a lot of bars stacked - you just don't see that they're separate unless you have an outline:

# Stacked bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) + 
  geom_bar(stat="identity", colour="black")

If you dodge, you get bars that are dodged between values of num, but there may be multiple bars within each value of num:

# Dodged on 'num', but some overplotted bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) + 
  geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)

If you also add id as a grouping var, it'll dodge all of them:

# Dodging with unique 'id' as the grouping var
ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(id))) + 
  geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)

I think what you want is to both dodge and stack, but you can't do both. So the best thing is to summarize the data yourself.

library(plyr)
df2 <- ddply(df, c("names", "num"), summarise, values = sum(values))

ggplot(df2, aes(x=factor(names), y=values, fill=factor(num))) + 
  geom_bar(stat="identity", colour="black", position="dodge")
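
As an alternative to summarising the data beforehand, ggplot2 can compute the sums itself through stat_summary(). This is a sketch and not part of the original answer. The fun argument assumes ggplot2 3.3.0 or later (older versions used fun.y instead).

```r
library(ggplot2)

# Same example data as in the answer above
set.seed(123)
df <- data.frame(
  id     = 1:18,
  names  = rep(LETTERS[1:3], 6),
  num    = c(rep(1, 15), rep(2, 3)),
  values = sample(1:10, 18, replace = TRUE)
)

# stat_summary() sums 'values' within each names/num group before drawing,
# so every dodged position receives exactly one bar
p <- ggplot(df, aes(x = factor(names), y = values, fill = factor(num))) +
  stat_summary(fun = sum, geom = "bar", position = "dodge", colour = "black")
p
```

This keeps the raw data frame untouched, at the cost of hiding the aggregation step inside the plot call.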
