R中的手动交互图线性回归 [英] Manual interaction plot linear regression in R

查看:147
本文介绍了R中的手动交互图线性回归的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用对数转换后的丰度数据(更好的拟合度)和其他一些变量来预测在不同月相下看到的动物的平均丰度(因子).最好的模型(最低的AIC)被证明包括相位和调查持续时间与云量的交互作用(都是连续的):

I am trying to predict the mean abundance of animals sighted during different moon phases (factor), using log-transformed abundance data (better fit) and some other variables. The best model (lowest AIC) turned out to include the interaction of phase and survey duration and the cloud cover (both continuous):

LMoon<-lm(ln~Phase*Duration+Clouds, data=abund)

summary(LMoon)

Call:
lm(formula = ln ~ Phase * Duration + Clouds, data = abund)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.75416 -0.46311  0.09522  0.46591  1.85978 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.382031   0.876865   0.436 0.664125    
Phase2            2.130065   1.226305   1.737 0.085851 .  
Phase3            1.971060   1.818542   1.084 0.281351    
Phase4            0.608043   1.140122   0.533 0.595146    
Phase5            4.786674   1.151850   4.156 7.44e-05 ***
Phase6            0.958706   1.046831   0.916 0.362238    
Phase7            0.254711   3.425214   0.074 0.940888    
Phase8            0.865995   1.043916   0.830 0.409005    
Duration          0.069153   0.035407   1.953 0.053952 .  
Clouds           -0.004259   0.002401  -1.774 0.079494 .  
Phase2:Duration  -0.087843   0.047818  -1.837 0.069545 .  
Phase3:Duration  -0.089908   0.069652  -1.291 0.200109    
Phase4:Duration  -0.005424   0.046675  -0.116 0.907749    
Phase5:Duration  -0.172016   0.049369  -3.484 0.000768 ***
Phase6:Duration  -0.035597   0.041435  -0.859 0.392583    
Phase7:Duration   0.024084   0.176773   0.136 0.891939    
Phase8:Duration  -0.033424   0.042064  -0.795 0.428963    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7721 on 89 degrees of freedom
Multiple R-squared:  0.3368,    Adjusted R-squared:  0.2176 
F-statistic: 2.825 on 16 and 89 DF,  p-value: 0.0009894

现在,由于这种相互作用,我需要绘制一个相互作用图(绘制lsmeans时CI太宽).我尝试使用这里提到的不同功能,但是没有一个起作用.显然,我需要手动计算和绘图,就像这样:

Now, due to this interaction, I need to make an interaction plot (CI are too wide when plotting the lsmeans). I've tried to use different functions mentioned out there but none of them worked. Apparently I need to calculate and plot manually which I did like this:

intercepts <- c(coef(LMoon)["(Intercept)"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase2"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase3"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase4"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase5"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase6"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase7"],
                coef(LMoon)["(Intercept)"] + coef(LMoon)["Phase8"])

lines.df <- data.frame(intercepts = intercepts,
                       slopes = c(coef(LMoon)["Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase2:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase3:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase4:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase5:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase6:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase7:Duration"],
                                  coef(LMoon)["Duration"]+coef(LMoon)["Phase8:Duration"]),
                       Phase2 = levels(abund$Phase))

qplot(x = Duration, y = Sp2, color = Phase, data = abund) + 
  geom_abline(aes(intercept = intercepts, 
                  slope = slopes, 
                  color = Phase), data = lines.df)

我得到的图是错误的,因为y值是在原始的真实比例上,但是这些线是基于使用对数转换数据的lm.

The plot that I get is wrong as the y-values are on the original, true scale, but the lines are based on the lm which uses the log-transformed data.

互动图的丰度,持续时间,月相

要对此进行反向转换,有人告诉我,实际上我最终不会得到直线.而不是使用abline(),我应该创建一组例如100 个新的 x 值涵盖持续时间数据的范围,并使用系数来计算您的预测 y 值.然后使用lines()绘制这些图形,它看起来应该像一条平滑的曲线.

To back-transform this someone told me that I won't get straight lines in the end, actually. Rather than using abline(), I should create a set of e.g. 100 new x values that cover the range of the duration data and use the coefficients to calculate your predicted y values. Then plot these using lines() and it should look like a smooth curve.

这就是我迷路的地方.

因此,我为调查持续时间的范围(最小15到最大45分钟)创建了一组新的x值: dur2<-seq(从= 15到= 45,length.out = 100)

So I created a the set of new x-values for the range of the survey duration (min 15 max 45 min): dur2 <- seq(from = 15, to = 45, length.out=100)

然后,一旦获得这些值,就应该使用我的LM的系数来获得每个x值的预测y值.之后,将y值反转换为原始比例.然后使用x值和反向转换的y值将线添加到绘图中.

Then once I got those values I am supposed to get the predicted y value for each x-value using the coefficients of my LM. After that, back-transforming the y-values to the original scale. And then using the x-values and the back-transformed y-values to add the lines to the plot.

我现在如何准确地获得预测值?我不能使用任何pred类型/函数,我已经尝试了全部.只是不适用于我的模型,所以手动是唯一的方法,但是我不知道如何...

How do I get the predicted values now exactly? I cannot use any pred type/function, I've tried it all. It just doesn't work with my model, so manual is the only way, but I have no clue how...

希望任何人都可以帮助我,到目前为止,我已经尝试了好几周,但绝望了,快要放弃了.

Hope anyone can help me with this, I've been trying for weeks and in despair by now, close to giving up.

干杯!

PS这里的数据:

> dput(subset(abund, Phase %in% c("Phase1", "Phase2")))

structure(list(Year = integer(0), Date = structure(integer(0), .Label = c("01/08/2009", 
"01/08/2016", "02/07/2019", "02/08/2009", "02/08/2012", "02/08/2016", 
"02/09/2007", "03/08/2007", "03/08/2009", "03/08/2014", "03/08/2015", 
"04/07/2019", "04/08/2009", "04/08/2013", "05/08/2009", "05/08/2014", 
"05/08/2015", "06/07/2008", "06/07/2019", "07/08/2009", "08/07/2010", 
"09/07/2010", "09/08/2015", "10/08/2009", "11/08/2009", "12/08/2009", 
"13/08/2009", "13/08/2014", "14/08/2009", "14/08/2012", "16/07/2006", 
"18/07/2009", "18/08/2015", "19/07/2011", "20/08/2009", "21/07/2011", 
"21/09/2009", "22/07/2011", "22/07/2016", "22/07/2017", "23/07/2007", 
"23/07/2016", "23/07/2017", "24/07/2017", "25/07/2007", "25/07/2010", 
"25/07/2017", "25/08/2016", "26/07/2010", "26/07/2011", "27/07/2006", 
"27/07/2011", "27/07/2012", "28/07/2016", "29/06/2019", "29/07/2005", 
"29/07/2009", "29/07/2010", "29/07/2016", "29/07/2019", "30/07/2005", 
"30/07/2007", "30/07/2016", "30/08/2005", "31/07/2005", "31/07/2009", 
"31/07/2014", "31/07/2016"), class = "factor"), NrSurvey = integer(0), 
    Duration = integer(0), Sp2 = integer(0), Phase = structure(integer(0), .Label = c("1", 
    "2", "3", "4", "5", "6", "7", "8"), class = "factor"), Clouds = integer(0), 
    Visibility = integer(0), ln = numeric(0)), row.names = integer(0), class = "data.frame")

推荐答案

使用 predict 获取预测值.不要手动计算.

Use predict to get predicted values. Don't calculate manually.

使用 expand.grid()生成一个数据框,其中包含您想要绘制的值的 dur2 序列和其他预测变量的所有组合.像这样:

Use expand.grid() to generate a data frame of all combinations of your dur2 sequence and the other predictors at the value(s) you want plot. Something like this:

prediction_data = expand.grid(
  Duration = dur2,
  Phase= unique(abund$Phase),
  Clouds = mean(abund$Clouds) # Hold Clouds constant at some value
)

# column names in prediction_data need to match those in the model formula
prediction_data$pred = predict(LMoon, newdata = prediction_data)
prediction_data$pred_orig = exp(prediction_data$pred)

# plot
ggplot(prediction_data, aes(x = Duration, y = pred_orig, color = Phase)) +
  geom_line() +
  geom_point(data = abund)

类似的事情应该起作用.

Something like that should work.

另一个不错的选择是使用 broom :: augment 生成预测.这样也可以轻松给出每个预测点的标准误差和残差.

Another nice option is to use broom::augment to generate the predictions. This can easily give standard errors and residuals for each prediction point as well.

library(broom)
prediction_data = augment(LMoon, newdata = prediction_data)

这篇关于R中的手动交互图线性回归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆