Add average lines for categorical scatterplots with single group values
Problem description
I have a categorical scatter plot like this:
library(ggplot2)

data <- runif(50, 13, 17)
factors <- as.factor(sample(1:3, 50, replace = TRUE))
groups <- as.factor(sample(1:3, 50, replace = TRUE))
# keep all three variables in the data frame and map them by column name
data_table <- data.frame(data, factors, groups)
g <- ggplot(data_table, aes(x = factors, y = data, colour = groups)) +
  geom_point(size = 1.5)
I am trying to add an average line for each x-group, but I can't find the right way to do it. I have already tried the procedure described in this question, but it doesn't work. I suspect this is because each of my x-groups consists of a single x-value, which I believe calls for a different procedure.
More specifically, if I add:
+ geom_line(stat = "hline", yintercept = "mean", aes(colour = data_table[, 2]))
to the previous line of code, I get the following error: "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?"
If I try the procedure suggested in the answer to that question, by adding:
+ geom_errorbar(stat = "hline", yintercept = "mean", width=0.8, aes(ymax=..y..,ymin=..y..))
to my initial code (I left out the geom_jitter(position = position_jitter(width = 0.4)) part of that answer, because it added randomly offset copies of my data points to the plot), I get three lines for each x-group (one for the mean of each of the red, green and blue groups within that x-group), as shown in this picture:
Does anyone have any suggestions on how to fix this?
Thanks.
Recommended answer
The following code should give you the desired result:
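library(ggplot2)

# creating reproducible data
set.seed(1)
data <- runif(50, 13, 17)
factors <- as.factor(sample(1:3, 50, replace = TRUE))
groups <- as.factor(sample(1:3, 50, replace = TRUE))
data_table <- data.frame(data, factors, groups)

# creating the plot
# stat = "hline" was removed in ggplot2 2.0.0, so the mean of each x-group is
# computed with stat_summary() instead (the `fun` argument needs ggplot2 >= 3.3.0);
# group = factors pools all colours, giving one mean line per x-group
ggplot(data = data_table, aes(x = factors, y = data, color = groups)) +
  geom_point() +
  stat_summary(aes(group = factors), fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", width = 0.6, color = "black")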
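An alternative that does not depend on any particular stat is to compute the means yourself and draw them as a separate layer. The sketch below assumes a current ggplot2 and uses base R's aggregate() to build a small table (here called `means`, a name chosen for illustration) with one row per x-group; because that table has no colour column, the per-colour grouping problem from the question cannot arise.

```r
library(ggplot2)

# reproducible data, same recipe as in the answer
set.seed(1)
data <- runif(50, 13, 17)
factors <- as.factor(sample(1:3, 50, replace = TRUE))
groups <- as.factor(sample(1:3, 50, replace = TRUE))
data_table <- data.frame(data, factors, groups)

# one row per x-group: the mean of `data` across all colours
means <- aggregate(data ~ factors, data = data_table, FUN = mean)

p <- ggplot(data_table, aes(x = factors, y = data, colour = groups)) +
  geom_point() +
  # zero-height error bars drawn at each x-group mean; inherit.aes = FALSE
  # stops this layer from inheriting the colour mapping of the main plot
  geom_errorbar(data = means, aes(x = factors, ymin = data, ymax = data),
                inherit.aes = FALSE, width = 0.6, colour = "black")
print(p)
```

The `means` table also doubles as a ready-made numeric check of what the lines show.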
This gives:
Checking whether the means are correct:
> by(data_table$data, data_table$factors, mean)
data_table$factors: 1
[1] 15.12186
-------------------------------------------------------------------------------------------------
data_table$factors: 2
[1] 15.03746
-------------------------------------------------------------------------------------------------
data_table$factors: 3
[1] 15.24869
which leads to the conclusion that the means are correctly displayed in the plot.
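The same check can be done programmatically by reading back what ggplot2 computed for the mean layer. This is a sketch assuming ggplot2 >= 3.3.0, with stat_summary() standing in for the removed stat = "hline"; the variable names `pooled` and `per_colour` are illustrative. layer_data() returns the computed data of one layer. Comparing the two plots also shows where the question's "three lines per x-group" came from: without an explicit `group`, ggplot2 groups by every discrete aesthetic, i.e. by each (x, colour) pair.

```r
library(ggplot2)

set.seed(1)
data <- runif(50, 13, 17)
factors <- as.factor(sample(1:3, 50, replace = TRUE))
groups <- as.factor(sample(1:3, 50, replace = TRUE))
data_table <- data.frame(data, factors, groups)

base <- ggplot(data_table, aes(x = factors, y = data, colour = groups)) +
  geom_point()

# group = factors overrides the default grouping (x crossed with colour)
pooled <- base +
  stat_summary(aes(group = factors), fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", width = 0.6, colour = "black")

# same layer without the override: one summary per (x, colour) pair
per_colour <- base +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", width = 0.6)

# layer 2 is the stat_summary layer; sort its rows by x position
ld <- layer_data(pooled, 2)
plotted_means <- ld$y[order(unclass(ld$x))]
expected <- as.vector(tapply(data_table$data, data_table$factors, mean))

print(all.equal(plotted_means, expected))   # TRUE
print(nrow(layer_data(per_colour, 2)))      # number of observed (x, colour) pairs
```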
Following the suggestion of @rrs, you could also combine it with a boxplot:
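ggplot(data=data_table, aes(x=factors, y=data, color=groups)) +
  # color=NULL drops the colour mapping for this layer, giving one box per x-group;
  # with the default stat the middle line is the median, so mapping `middle` has no effect
  geom_boxplot(aes(color=NULL)) +
  geom_point(size=2.5)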
This gives:
Note, however, that the middle line of each box represents the median, not the mean.
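If you want both, one option is to keep the box for the median and overlay the mean as a separate marker. The sketch below assumes ggplot2 >= 3.3.0; the diamond shape and sizes are arbitrary choices. Colour is mapped only in geom_point(), so the box and the mean are computed per x-group, and `outlier.shape = NA` hides the box's own outlier points because every observation is already drawn by geom_point().

```r
library(ggplot2)

set.seed(1)
data <- runif(50, 13, 17)
factors <- as.factor(sample(1:3, 50, replace = TRUE))
groups <- as.factor(sample(1:3, 50, replace = TRUE))
data_table <- data.frame(data, factors, groups)

p <- ggplot(data_table, aes(x = factors, y = data)) +
  # median (middle line) and quartiles per x-group
  geom_boxplot(outlier.shape = NA) +
  # raw observations, coloured by group
  geom_point(aes(colour = groups), size = 2.5) +
  # mean of each x-group as a filled black diamond
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3,
               fill = "black")
print(p)
```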