How to graph "before and after" measures using ggplot with connecting lines and subsets?

Problem description

I’m totally new to ggplot, relatively fresh with R and want to make a smashing "before-and-after" scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:

  • show each individual observation separately (now same values are overlapping)
  • connect the related before and after measures (x=0 and x=1) with lines to more clearly illustrate the direction of variation
  • subset the data along class and id using shape and colors

How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?

Main alternative: geom_point()

Here is some sample data and example code using geom_point:

    x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
    y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of "feelings of peace"
    class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
    id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual

    df <- data.frame(x,y,class,id)

    ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()

Alternative: scale_size()

I have explored stat_sum() to summarize the frequencies of overlapping observations, but then I am not able to subset using colors and shapes because of the overlap.

    ggplot(df, aes(x=x, y=y)) +
      stat_sum()

Alternative: geom_dotplot()

I have also explored geom_dotplot() to clarify the overlapping observations that arise from using geom_point(), as in the example below. However, I have yet to understand how to combine the before and after measures into the same plot.

    df1 <- df[1:10,] # data before
    df2 <- df[11:20,] # data after

    p1 <- ggplot(df1, aes(x=x, y=y)) +
      geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
           binwidth=(1/0.3))

    p2 <- ggplot(df2, aes(x=x, y=y)) +
      geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
           binwidth=(1/0.3))

    grid.arrange(p1, p2, nrow=1) # from the gridExtra package

Solution

It may be better to summarize the data by x, id, and class, taking the mean or median of y, then filter out the ids that produce NAs (here ids 3 and 6) and connect the points with lines. If you don't really need to show the variability within each id (which may well be the case if the plot only illustrates tendencies), you can do it this way:

    library(ggplot2)
    library(dplyr)
    # library(ggthemes)  # only needed for theme_few()

    df <- df %>%
      group_by(x, id, class) %>%
      # median() returns NA when a group has no non-missing values
      summarize(y = median(y, na.rm = TRUE)) %>%
      ungroup() %>%
      mutate(
        id = factor(id),
        x = factor(x, labels = c("before", "after")),
        # 0 = multiple days, 1 = one day, as coded in the question
        class = factor(class, labels = c("multiple days", "one day"))
      ) %>%
      group_by(id) %>%
      mutate(nas = any(is.na(y))) %>%  # flag ids missing a before or after value
      ungroup() %>%
      filter(!nas) %>%
      select(-nas)

    ggplot(df, aes(x = x, y = y, col = id, group = id)) +
      geom_point(aes(shape = class)) +
      geom_line(show.legend = FALSE) +
      # theme_few() +
      # theme(legend.position = "none") +
      ylab("Feelings of peace, %") +
      xlab("")
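
If the individual observations do matter, one possible variation (a sketch only, not part of the original answer) is to keep every raw point, nudge overlapping points apart with a small horizontal jitter, and draw the connecting lines through the per-id medians. The names `raw` and `med` and the jitter width, seed, and alpha values are illustrative assumptions. Color and shape are mapped inside aes(), because arguments passed to ggplot() outside aes() are not treated as aesthetics.

```r
library(ggplot2)

# Sample data from the question (0 = before, 1 = after)
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95)
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1)
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6)

# `raw` is a hypothetical name: one row per observation, with factor labels
raw <- data.frame(
  x = factor(x, labels = c("before", "after")),
  y = y,
  class = factor(class, labels = c("multiple days", "one day")),
  id = factor(id)
)
raw <- raw[!is.na(raw$y), ]  # drop missing measurements

# `med` is a hypothetical name: one median per id and time point,
# used only for the connecting lines
med <- aggregate(y ~ x + id + class, data = raw, FUN = median)

p <- ggplot(raw, aes(x = x, y = y, colour = id)) +
  # horizontal jitter only, so the y values stay exact
  geom_point(aes(shape = class),
             position = position_jitter(width = 0.08, height = 0, seed = 1),
             alpha = 0.7) +
  geom_line(data = med, aes(group = id), show.legend = FALSE) +
  ylab("Feelings of peace, %") +
  xlab("")
p
```

In this version ids 3 and 6 are not filtered out: their single available measurement is still plotted as a point, but with only one time point there is nothing to connect, so no line should appear for them.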