Adding trend lines across groups and setting tick labels in a grouped violin plot or box plot


Problem description

I have xy grouped data that I'm plotting with R's ggplot2 geom_violin, with regression trend lines added:

Here are the data:

library(dplyr)
library(plotly)
library(ggplot2)

set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
                 age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
                 group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
  dplyr::mutate(time = as.integer(sub("d", "", age))) %>%  # strip the "d" prefix; as.integer("d3") would be NA
  dplyr::arrange(group,time) %>%
  dplyr::mutate(group_age=paste0(group,"_",age))

df$group_age <- factor(df$group_age,levels=unique(df$group_age))

My current plot:

ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) + 
  geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal()

My questions are:

  1. How do I get rid of the alpha part of the legend?
  2. I would like the x-axis ticks to show df$group rather than df$group_age, that is, one tick per group, centered on that group and labeled with the group name. Consider the case where not all groups have all ages: if a group has only two of the ages (and I'm fairly sure ggplot will then show only those two), I'd like the tick to still be centered between those two ages.

Another question:

It would also be nice to have the p-value of each fitted slope plotted on top of each group.

I tried:

library(ggpmisc)
my.formula <- value ~ group_age
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) + 
  geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal() +
  stat_poly_eq(formula = my.formula,aes(label=stat(p.value.label)),parse=T)

But I get the same plot as above, with the following warning message:

Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default 

Recommended answer

geom_smooth() fits a line, while stat_poly_eq() issues an error. A factor is a categorical variable with unordered levels, so a trend against a factor is undefined. geom_smooth() may be taking the levels and converting them to "arbitrary" numerical values, but these values are just indexes rather than meaningful values.
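As a small illustration of the point about level indexes, the base R sketch below builds a factor from the three age labels used in the question and converts it to integers. It only shows what R does with a factor by default; it is not a claim about the exact conversion geom_smooth() performs internally.

```r
# Default factor levels are sorted as strings, not by the day number.
age <- factor(c("d3", "d8", "d24"))

levels(age)      # "d24" "d3" "d8"

# as.integer() returns the index of each level, not the number of days.
as.integer(age)  # 2 3 1
```

A line fitted against these integers would treat d24 as the earliest time point, which is why a numeric time variable is needed for a meaningful slope.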

To obtain a plot similar to the one described in the question, but with correct linear regression lines and the corresponding p-values, I would use the code below. The main change is that the numerical variable time is mapped to x, which makes fitting a regression a valid operation. To allow for a linear fit, an x-scale with a log10 transformation is used, with breaks and labels at the ages for which data are available. Setting alpha as a constant outside aes() also keeps it out of the legend.

library(dplyr)
library(ggplot2)
library(ggpmisc)

set.seed(1)
df <-
  data.frame(
    value = c(
      rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
      rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
      rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)
    ),
    age = c(
      rep("d3", 500), rep("d8", 600), rep("d24", 400),
      rep("d3", 500), rep("d8", 400), rep("d24", 600),
      rep("d3", 500), rep("d8", 500), rep("d24", 500)
    ),
    group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
  ) %>%
  mutate(time = as.integer(gsub("d", "", age))) %>%
  arrange(group, time) %>%
  mutate(age = factor(age, levels = c("d3", "d8", "d24")),
         group = factor(group))

my_formula <- y ~ x

ggplot(df, aes(x = time, y = value)) +
  geom_violin(aes(fill = age, color = age), alpha = 0.3) + 
  geom_boxplot(width = 0.1,
               aes(color = age), fill = NA) +
  geom_smooth(color = "black", formula = my_formula, method = 'lm') + 
  stat_poly_eq(aes(label = after_stat(p.value.label)),  # after_stat() replaces the deprecated stat()
               formula = my_formula, parse = TRUE,
               npcx = "center", npcy = "bottom") +
  scale_x_log10(name = "Age", breaks = c(3, 8, 24)) +
  facet_wrap(~group) +
  theme_minimal()

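This produces a plot with one panel per group, each showing the violins, the box plots, the fitted line and its p-value.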
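To cross-check the p-values shown in the panels, the slopes can also be fitted directly with lm(). The sketch below is base R only and rebuilds the same simulated data; it assumes that, because of scale_x_log10(), the plotted fit is value against log10(time), and that with a single predictor the slope's p-value is the one of interest. The names slope_p and n are introduced here purely for illustration.

```r
# Rebuild the simulated data from the answer (base R only).
set.seed(1)
n <- c(500, 600, 400, 500, 400, 600, 500, 500, 500)
df <- data.frame(
  value = c(rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
            rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
            rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)),
  age   = rep(rep(c("d3", "d8", "d24"), times = 3), times = n),
  group = rep(c("A", "B", "C"), each = 1500)
)
df$time <- as.integer(sub("d", "", df$age))

# Fit value ~ log10(time) within each group and extract the slope's p-value
# (second row of the coefficient table; the first row is the intercept).
slope_p <- sapply(split(df, df$group), function(d) {
  fit <- lm(value ~ log10(time), data = d)
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})
print(slope_p)
```

Groups A and B were simulated with clearly changing means, so their p-values should be very small, whereas group C was simulated with a constant mean.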

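For readers who prefer the single-axis layout from the question over facets, one possible way to get a tick per group is to give each group/age combination a numeric x position and take the mean position within each group. This is only a sketch and not part of the answer above: the combos data frame is hypothetical, with group "B" deliberately missing the "d24" age, and the names pos and centers are made up for the example.

```r
# Hypothetical set of group/age combinations present in the data;
# group "B" has only two of the three ages.
combos <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  age   = c("d3", "d8", "d24", "d3", "d8", "d3", "d8", "d24")
)

# One x position per combination, in plotting order.
combos$pos <- seq_len(nrow(combos))

# Tick position for each group: the mean of that group's x positions.
centers <- tapply(combos$pos, combos$group, mean)
print(centers)
#   A   B   C
# 2.0 4.5 7.0
```

With pos mapped to x, passing as.numeric(centers) as breaks and names(centers) as labels to scale_x_continuous() should then place one labeled tick per group, centered even when a group has fewer ages. A trend line drawn against pos would still be a fit against arbitrary positions, so this addresses the tick labels only.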