Extract slope of multiple trend lines from geom_smooth()

Problem description

I am trying to plot multiple trend lines (one per ten-year period) in a time series using ggplot.

Here is the data:

dat <- structure(list(YY = 1961:2010, a = c(98L, 76L, 83L, 89L, 120L, 
107L, 83L, 83L, 92L, 104L, 98L, 91L, 81L, 69L, 86L, 76L, 85L, 
86L, 70L, 81L, 77L, 89L, 60L, 80L, 94L, 66L, 77L, 85L, 77L, 80L, 
79L, 79L, 65L, 70L, 80L, 87L, 84L, 67L, 106L, 129L, 95L, 79L, 
67L, 105L, 118L, 85L, 86L, 103L, 97L, 106L)), .Names = c("YY", 
"a"), row.names = c(NA, -50L), class = "data.frame")

Here is the script:

library(ggplot2)

p <- ggplot(dat, aes(x = YY))
p <- p + geom_line(aes(y=a),colour="blue",lwd=1)
p <- p + geom_point(aes(y=a),colour="blue",size=2)

p <- p + theme(panel.background=element_rect(fill="white"),
         plot.margin = unit(c(0.5,0.5,0.5,0.5),"cm"),
         panel.border=element_rect(colour="black",fill=NA,size=1),
         axis.line.x=element_line(colour="black"),
         axis.line.y=element_line(colour="black"),
         axis.text=element_text(size=15,colour="black",family="serif"),
         axis.title=element_text(size=15,colour="black",family="serif"),
         legend.position = "top")

p <- p + scale_x_discrete(limits = c(seq(1961,2010,5)),expand=c(0,0))

p <- p + geom_smooth(data=dat[1:10,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[11:20,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[21:30,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[31:40,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + geom_smooth(data=dat[41:50,],aes(x=YY,y=a),method="lm",se=FALSE,color="black",formula=y~x,linetype="dashed")

p <- p + labs(x="Year",y="Number of Days")
outImg <- paste0("test",".png")
ggsave(outImg,p,width=8,height=5)

[Figure: the resulting plot, with a dashed trend line for each decade]

Questions

  1. I want to extract the slopes and add them to the trend lines in the figure. How can I extract the slope of each line from geom_smooth()?

  2. Currently, I am plotting the trend lines one by one. Is there an efficient way of doing this with an adjustable time window? For example, suppose I want to plot a trend line for every 5 years. In the figure above the time window is 10.

  3. Suppose I only want to plot the significant trend lines (i.e., p-value < 0.05; null hypothesis: no trend, or slope equals 0). Is it possible to implement this with geom_smooth()?

I would appreciate any help.

Recommended answer

Each of these tasks is best handled before you pipe your data into ggplot2, and they are all made fairly easy by some of the other packages from the tidyverse.

Beginning with questions 1 and 2:

While ggplot2 can plot the regression line, to extract the estimated slope coefficients you need to work with the lm() object explicitly. Using group_by() and mutate(), you can add a grouping variable (my code below does this for 5-year groups, just as an example) and then calculate the slope estimate for each group and store it in a column of your existing data frame. Those slope estimates can then be plotted in ggplot with a geom_text() call. I've done this below in a quick and dirty way (placing each label at the mean of the x and y values in its group), but you can specify the exact placement in your data frame.

The grouping variable and data prep make question 2 a breeze too: now that the grouping variable is explicitly in your data frame, there is no need to plot the lines one by one, because geom_smooth() accepts the group aesthetic.

Additionally, to answer question 3, you can extract the p-value from the summary of each lm object and keep only the groups that are significant at the level you care about. If you pass this now complete data frame to geom_smooth() and geom_text(), you will get the plot you're looking for!
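As a minimal sketch of the extraction described above, here is the slope and its p-value for a single window, using only base R. The data frame name `decade` is an illustrative assumption; its values are the first ten rows of `dat` from the question.

```r
# First decade of the question's data (1961-1970); `decade` is a name chosen for this sketch
decade <- data.frame(
  YY = 1961:1970,
  a  = c(98, 76, 83, 89, 120, 107, 83, 83, 92, 104)
)

# Regress a on YY: the same model geom_smooth(method = "lm", formula = y ~ x)
# fits when x = YY and y = a
fit <- lm(a ~ YY, data = decade)

# Estimated change in a per year
slope <- unname(coef(fit)["YY"])

# p-value of the t-test for H0: slope = 0
p_value <- summary(fit)$coefficients["YY", "Pr(>|t|)"]

slope
p_value
```

Note that the response goes on the left of the formula (`a ~ YY`); reversing it would give the slope of year on `a`, which is not the line drawn in the plot.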

library(tidyverse)

 # set up our base plot
 p <- ggplot(dat, aes(x = YY, y = a)) +
  geom_line(colour = "blue", lwd = 1) +
  geom_point(colour = "blue", size = 2) +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
    panel.border = element_rect(colour = "black", fill = NA, size = 1),
    axis.line.x = element_line(colour = "black"),
    axis.line.y = element_line(colour = "black"),
    axis.text = element_text(size = 15, colour = "black", family = "serif"),
    axis.title = element_text(size = 15, colour = "black", family = "serif"),
    legend.position = "top"
  ) +
  # YY is numeric, so use a continuous scale with breaks every 5 years
  scale_x_continuous(breaks = seq(1961, 2010, 5), expand = c(0, 0))

# add a grouping variable (or many!)
 prep5 <- dat %>%
  mutate(group5 = rep(1:10, each = 5)) %>%
  group_by(group5) %>%
  mutate(
    # regress a on YY (response first), matching geom_smooth(formula = y ~ x)
    slope = round(lm(a ~ YY)$coefficients[2], 2),
    significance = summary(lm(a ~ YY))$coefficients[2, 4],   # p-value of the slope
    x = mean(YY),   # x coordinate for slope label
    y = mean(a)     # y coordinate for slope label
  ) %>%
  filter(significance < .2)   # only keep groups with a p-value < .2 (use .05 for the threshold in question 3)

p + geom_smooth(
  data = prep5, aes(x = YY, y = a, group = group5),  # grouping variable does the plots for us!
  method = "lm", se = FALSE, color = "black",
  formula = y ~ x, linetype = "dashed"
) +
  geom_text(
    data = prep5, aes(x = x, y = y, label = slope),
    nudge_y = 12, nudge_x = -1
  )

You may want to be a little more careful about the location of your text labels than I have been here. I used group means and the nudge_* arguments of geom_text() for a quick example, but keep in mind that since these values are mapped explicitly to x and y coordinates, you have complete control!
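For the adjustable time window in question 2, one possible approach is to derive the grouping variable from the window length instead of hard-coding it. The sketch below uses base R only; `window_trends` is a hypothetical helper written for this illustration (it is not part of ggplot2 or of the code above), and it assumes every window contains at least three observations.

```r
# Data from the question
dat <- data.frame(
  YY = 1961:2010,
  a = c(98, 76, 83, 89, 120, 107, 83, 83, 92, 104, 98, 91, 81, 69, 86,
        76, 85, 86, 70, 81, 77, 89, 60, 80, 94, 66, 77, 85, 77, 80,
        79, 79, 65, 70, 80, 87, 84, 67, 106, 129, 95, 79, 67, 105, 118,
        85, 86, 103, 97, 106)
)

# Hypothetical helper: fit a ~ YY separately in consecutive windows of `window` years
# and return one row per window with its slope and p-value.
# Assumes each window has at least three observations.
window_trends <- function(df, window) {
  grp <- (df$YY - min(df$YY)) %/% window      # 0, 0, ..., 1, 1, ... window index
  pieces <- split(df, grp)
  rows <- lapply(pieces, function(d) {
    fit <- lm(a ~ YY, data = d)
    data.frame(
      start   = min(d$YY),
      end     = max(d$YY),
      slope   = unname(coef(fit)["YY"]),
      p_value = summary(fit)$coefficients["YY", "Pr(>|t|)"]
    )
  })
  do.call(rbind, rows)
}

trends10 <- window_trends(dat, 10)   # one row per decade
trends5  <- window_trends(dat, 5)    # one row per 5-year window

# Keep only the windows whose slope is significant at the 5% level
subset(trends10, p_value < 0.05)
```

The same integer-division expression, `(YY - min(YY)) %/% window`, could be used inside mutate() to create the grouping column that is passed to the group aesthetic.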

Created on 2018-07-16 by the reprex package (v0.2.0).
