stat_sum and stat_identity give weird results
I have the following code, including randomly generated demo data:
n <- 10
group <- rep(1:4, n)
mass.means <- c(10, 20, 15, 30)
mass.sigma <- 4
score.means <- c(5, 5, 7, 4)
score.sigma <- 3
mass <- as.vector(model.matrix(~0+factor(group)) %*% mass.means) +
  rnorm(n*4, 0, mass.sigma)
score <- as.vector(model.matrix(~0+factor(group)) %*% score.means) +
  rnorm(n*4, 0, score.sigma)
data <- data.frame(id = 1:(n*4), group, mass, score)
head(data)
Which gives:
id group mass score
1 1 1 12.643603 5.015746
2 2 2 21.458750 5.590619
3 3 3 15.757938 8.777318
4 4 4 32.658551 6.365853
5 5 1 6.636169 5.885747
6 6 2 13.467437 6.390785
And then I want to plot the sum of "score", grouped by "group", in a bar chart:
plot <- ggplot(data = data, aes(x = group, y = score)) +
geom_bar(stat="sum")
plot
This gives me:
Weirdly, using stat_identity seems to give the result I am looking for:
plot <- ggplot(data = data, aes(x = group, y = score)) +
geom_bar(stat="identity")
plot
Is this a bug? I am using ggplot2 1.0.0 on the following R version:
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          1.2
year           2014
month          10
day            31
svn rev        66913
language       R
version.string R version 3.1.2 (2014-10-31)
nickname       Pumpkin Helmet
Or what am I doing wrong?
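Use stat_summary to compute the sum per group:
# Note: fun.y matches ggplot2 1.0.0; from ggplot2 3.3.0 on, the argument is named fun.
plot <- ggplot(data = data, aes(x = group, y = score)) +
  stat_summary(fun.y = "sum", geom = "bar", position = "identity")
plot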
aggregate(score ~ group, data=data, FUN=sum)
# group score
#1 1 51.71279
#2 2 58.94611
#3 3 67.52100
#4 4 39.24484
Edit:
stat_sum does not work, because it doesn't just return the sum. It returns the "number of observations at position" and "percent of points in that panel at that position". It was designed for a different purpose. The docs say "Useful for overplotting on scatterplots."
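To make the difference concrete, here is a minimal base R sketch of the two quantities. It only approximates what stat_sum counts and does not call ggplot2. The toy data frame and the names toy, counts and sums are made up for illustration and are not the data from the question.

```r
# Hypothetical toy data, chosen so that some (group, score) positions repeat.
toy <- data.frame(group = rep(1:2, each = 3),
                  score = c(1.5, 2.5, 2.5, 4, 4, 4))

# Number of observations at each distinct (group, score) position.
# This is roughly the "n" that stat_sum reports: a count, not a total.
counts <- aggregate(n ~ group + score, data = transform(toy, n = 1), FUN = sum)
counts

# The per-group sum of score, which is what the question asks for.
sums <- aggregate(score ~ group, data = toy, FUN = sum)
sums
```

With continuous scores like those in the question, nearly every position is unique, so the count at each position would be 1 and would say nothing about the group totals.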
stat_identity (kind of) works because geom_bar by default stacks the bars. You have many bars on top of each other, in contrast to my solution, which gives you just one bar per group. Look at this:
plot <- ggplot(data = data, aes(x = group, y = score)) +
geom_bar(stat="identity", color = "red")
plot
Also consider the warning:
Warning message:
Stacking not well defined when ymin != 0
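The stacking effect can be sketched in base R as follows. This is an illustration of the arithmetic under simplifying assumptions, not ggplot2's actual implementation. The scores are invented, and ymin, ymax and top_of_stack are hypothetical names.

```r
# Hypothetical scores for a single group; all positive for simplicity.
score <- c(5.0, 2.5, 4.0)

# Stacking places each bar segment on top of the previous one.
ymax <- cumsum(score)   # top edge of each segment
ymin <- ymax - score    # bottom edge of each segment

# The top of the last segment equals the group total, which is why the
# stacked stat_identity plot looks like a plot of sums.
top_of_stack <- ymax[length(ymax)]
isTRUE(all.equal(top_of_stack, sum(score)))
```

This only holds cleanly when every score is non-negative. With negative scores the segments no longer all start at or above zero, which is presumably the situation the warning refers to, so the stat_summary approach is the safer way to get the totals.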