geom_abline does not seem to respect groups in facet_grid [ggplot2]
Question
Just trying to understand how geom_abline works with facets in ggplot.
I have a dataset of student test scores. These are in a data table dt with 4 columns:
student: unique student ID
cohort: grouping factor for students (A, B, … H)
subject: subject of the test (English, Math, Science)
score: the test score for that student in that subject
The goal is to compare cohorts. The following snippet creates a sample dataset.
library(data.table)
set.seed(1)  ## make the simulated scores reproducible
## cohorts: list of cohorts with number of students in each
cohorts <- data.table(name=toupper(letters[1:8]),size=as.numeric(c(8,25,16,30,10,27,13,32)))
## base: assign students to cohorts
base <- data.table(student=c(1:sum(cohorts$size)),cohort=rep(cohorts$name,cohorts$size))
## scores for each subject
english <- data.table(base,subject="English", score=rnorm(nrow(base), mean=45, sd=50))
math <- data.table(base,subject="Math", score=rnorm(nrow(base), mean=55, sd=25))
science <- data.table(base,subject="Science", score=rnorm(nrow(base), mean=70, sd=25))
## combine
dt <- rbind(english,math,science)
## clip scores to [0, 100]
dt[, score := pmin(pmax(score, 0), 100)]
The following displays the mean score by cohort with 95% confidence limits, faceted by subject, and includes a (blue, dashed) reference line drawn with geom_abline.
library(ggplot2)
library(Hmisc)
ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)
ggp <- ggp + stat_summary(fun.data="mean_cl_normal")
ggp <- ggp + geom_abline(aes(slope=0,intercept=mean(score)),color="blue",linetype="dashed")
ggp <- ggp + facet_grid(subject~.)
ggp
The problem is that the reference line from geom_abline is the same in every facet: it is the grand mean score over all students and all subjects. So stat_summary seems to respect the grouping implied by facet_grid (i.e., by subject), but geom_abline does not. Can anyone explain why?
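One way to see what is happening is to look at the data ggplot2 actually hands to the abline layer. The sketch below uses a small hypothetical data frame `toy` in place of `dt` and inspects the built layer with `layer_data()`; it assumes a reasonably recent ggplot2. The `geom_point()` layer is only there so the x and y scales get trained. Every row of the abline layer should carry the same intercept, the grand mean 640/12 ≈ 53.3, in all three panels. That is consistent with the aes() expression being evaluated once on the whole layer data, before that data is split into facets.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's dt (3 subjects x 4 scores)
toy <- data.frame(
  subject = rep(c("English", "Math", "Science"), each = 4),
  cohort  = rep(c("A", "A", "B", "B"), times = 3),
  score   = c(40, 50, 60, 50,   # English mean: 50
              20, 30, 40, 50,   # Math mean:    35
              80, 90, 70, 60)   # Science mean: 75
)

p <- ggplot(toy, aes(x = cohort, y = score)) +
  geom_point() +                                          # layer 1: trains the scales
  geom_abline(aes(slope = 0, intercept = mean(score))) +  # layer 2: the layer in question
  facet_grid(subject ~ .)

# layer_data() builds the plot and returns one layer's data, including PANEL
ab <- layer_data(p, 2)

# Distinct intercept value(s) in each panel, next to the grand mean of all rows
print(tapply(ab$intercept, ab$PANEL, unique))
print(mean(toy$score))
```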
NB: I realize this problem can be solved by creating a separate table of group means and using that as the data source in geom_abline (below), but why is this necessary?
## per-subject means: one row per facet
means <- dt[,list(mean.score=mean(score)),by="subject"]
ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)
ggp <- ggp + stat_summary(fun.data="mean_cl_normal")
ggp <- ggp + geom_abline(data=means, aes(slope=0,intercept=mean.score),color="blue",linetype="dashed")
ggp <- ggp + facet_grid(subject~.)
ggp
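For comparison, here is a sketch of the same workaround using geom_hline(), which takes a `yintercept` aesthetic directly, so `slope = 0` is not needed. It assumes a ggplot2 version that accepts a function as a layer's `data` argument (the function is called with the plot data), which avoids building `means` by hand. `toy` is again a hypothetical stand-in for `dt`. The summary still has to be computed outside aes(), which is exactly the step the question asks about.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's dt
toy <- data.frame(
  subject = rep(c("English", "Math", "Science"), each = 4),
  cohort  = rep(c("A", "A", "B", "B"), times = 3),
  score   = c(40, 50, 60, 50,   # English mean: 50
              20, 30, 40, 50,   # Math mean:    35
              80, 90, 70, 60)   # Science mean: 75
)

p <- ggplot(toy, aes(x = cohort, y = score)) +
  geom_point() +
  # The data function summarises the plot data per subject (one row per facet);
  # the 'subject' column it keeps lets facet_grid() place each row in its panel
  geom_hline(
    data = function(d) aggregate(score ~ subject, data = d, FUN = mean),
    aes(yintercept = score),
    colour = "blue", linetype = "dashed"
  ) +
  facet_grid(subject ~ .)

hl <- layer_data(p, 2)
print(hl[, c("PANEL", "yintercept")])
```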
Answer
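This should do what you want. The stat_* functions are computed separately on the data for each facet (panel). Expressions inside the aes() of the geom_* functions, by contrast, are evaluated once on the layer's entire data before it is split into facets, so mean(score) returns the grand mean, which is then repeated on every row and therefore in every panel. I think aes() expressions are intended for per-row transformations of the data (e.g., of each y-value), not for group summaries.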
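ggplot(dt,aes(x=cohort, y=score)) +
  stat_summary(fun.data="mean_cl_normal") +
  ## y ~ 1 is an intercept-only model, so each panel's fitted line sits at that panel's mean;
  ## group = 1 joins the discrete cohorts into a single line
  stat_smooth(formula=y~1,aes(group=1),method="lm",se=FALSE) +
  facet_grid(subject~.) + ylim(0,100)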
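A quick way to check that the stat_smooth() layer really draws each panel's own mean is to fit it on hypothetical toy data and inspect the layer with `layer_data()`. With `formula = y ~ 1` the lm fit has only an intercept, and the least-squares intercept of an intercept-only model is the sample mean, so within each panel the fitted line should be flat at that subject's mean (50, 35 and 75 for the toy data). The stat_summary layer is left out here only to avoid needing Hmisc.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's dt
toy <- data.frame(
  subject = rep(c("English", "Math", "Science"), each = 4),
  cohort  = rep(c("A", "A", "B", "B"), times = 3),
  score   = c(40, 50, 60, 50,   # English mean: 50
              20, 30, 40, 50,   # Math mean:    35
              80, 90, 70, 60)   # Science mean: 75
)

p <- ggplot(toy, aes(x = cohort, y = score)) +
  stat_smooth(formula = y ~ 1, aes(group = 1), method = "lm", se = FALSE) +
  facet_grid(subject ~ .)

sm <- layer_data(p, 1)

# Per panel: the fitted value (its mean) and how much it varies along the line
print(tapply(sm$y, sm$PANEL, mean))
print(tapply(sm$y, sm$PANEL, function(v) diff(range(v))))
```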