Adding trend lines across groups and setting tick labels in a grouped violin plot or box plot
Problem description
I have xy grouped data that I'm plotting using R's ggplot2 geom_violin, adding regression trend lines.
Here is the data:
library(dplyr)
library(plotly)
library(ggplot2)
set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
dplyr::mutate(time = as.integer(gsub("d", "", age))) %>%  # strip the "d" prefix; as.integer("d3") would give NA
dplyr::arrange(group,time) %>%
dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
My current plot:
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal()
My questions are:
- How do I get rid of the alpha part of the legend?
- I would like the x-axis ticks to be df$group rather than df$group_age, which means one tick per group, at the center of that group, labeled with the group. Consider a situation where not all groups have all ages. For example, if a certain group has only two of the ages, I'm pretty sure ggplot will present only those two ages, and I'd like the tick to still be centered between them.
Another question:
It would also be nice to have the p-value of each fitted slope plotted on top of each group.
I tried:
library(ggpmisc)
my.formula <- value ~ group_age
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal() +
stat_poly_eq(formula = my.formula,aes(label=stat(p.value.label)),parse=T)
But I get the same plot as above with the following warning message:
Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default
Recommended answer
geom_smooth() fits a line, while stat_poly_eq() issues an error. A factor is a categorical variable with unordered levels, so a trend against a factor is undefined. geom_smooth() may be taking the levels and converting them to "arbitrary" numeric values, but these values are just indexes rather than meaningful values.
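A small base-R sketch of that point, using the age labels from the question. Converting the factor to integers returns the positions of its levels, while parsing the digits out of the labels recovers the actual ages. The variable names are illustrative only.

```r
# Age labels as in the question, with levels in chronological order
age <- factor(c("d3", "d8", "d24"), levels = c("d3", "d8", "d24"))

# Converting the factor to integers gives level positions, not ages
level_index <- as.integer(age)
print(level_index)

# Parsing the number out of each label gives the numeric age
age_value <- as.integer(gsub("d", "", as.character(age)))
print(age_value)
```

A regression against level_index treats the ages as equally spaced, which is not what the labels say.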
To obtain a plot similar to the one described in the question, but with correct linear regression lines and the corresponding p-values, I would use the code below. The main change is that the numeric variable time is mapped to x, which makes fitting a regression a valid operation. To allow for a linear fit, an x-scale with a log10 transformation is used, with breaks and labels at the ages for which data are available.
library(dplyr)
library(ggplot2)
library(ggpmisc)
set.seed(1)
df <-
data.frame(
value = c(
rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)
),
age = c(
rep("d3", 500), rep("d8", 600), rep("d24", 400),
rep("d3", 500), rep("d8", 400), rep("d24", 600),
rep("d3", 500), rep("d8", 500), rep("d24", 500)
),
group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
) %>%
mutate(time = as.integer(gsub("d", "", age))) %>%
arrange(group, time) %>%
mutate(age = factor(age, levels = c("d3", "d8", "d24")),
group = factor(group))
my_formula = y ~ x
ggplot(df, aes(x = time, y = value)) +
geom_violin(aes(fill = age, color = age), alpha = 0.3) +
geom_boxplot(width = 0.1,
aes(color = age), fill = NA) +
geom_smooth(color = "black", formula = my_formula, method = 'lm') +
stat_poly_eq(aes(label = after_stat(p.value.label)),  # after_stat() replaces the deprecated stat()
formula = my_formula, parse = TRUE,
npcx = "center", npcy = "bottom") +
scale_x_log10(name = "Age", breaks = c(3, 8, 24)) +
facet_wrap(~group) +
theme_minimal()
This creates a faceted figure with one panel per group.
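As a cross-check on the p-value labels, the same kind of per-group fit can be run directly with lm(). This is only a sketch under stated assumptions: df_small is a hypothetical, reduced simulated data set (50 points per age, two groups modeled on A and C), and the model uses log10(time) to mirror the transformation applied by the x-scale. For a model with a single predictor, the slope t-test and the overall F-test give the same p-value.

```r
set.seed(1)

# Hypothetical reduced data set: group A has a decreasing trend, group C is flat
df_small <- data.frame(
  value = c(rnorm(50, 8, 1), rnorm(50, 6, 1.5), rnorm(50, 4, 0.5),
            rnorm(50, 3, 1), rnorm(50, 3, 1), rnorm(50, 3, 1)),
  time  = rep(rep(c(3, 8, 24), each = 50), times = 2),
  group = rep(c("A", "C"), each = 150)
)

# Fit value ~ log10(time) separately in each group and extract the slope p-value
slope_p <- sapply(split(df_small, df_small$group), function(d) {
  fit <- lm(value ~ log10(time), data = d)
  summary(fit)$coefficients["log10(time)", "Pr(>|t|)"]
})

print(signif(slope_p, 3))
```

The result is a named numeric vector with one p-value per group, which can be compared against the labels drawn in each facet.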