Plot "regression line" from multiple regression in R


Question

I ran a multiple regression with several continuous predictors, a few of which came out significant, and I'd like to create a scatterplot or scatter-like plot of my DV against one of the predictors, including a "regression line". How can I do this?

My plot looks like this:

D = my.data; plot( D$probCategorySame, D$posttestScore )

If it were simple regression, I could add a regression line like this:

lmSimple <- lm( posttestScore ~ probCategorySame, data=D )
abline( lmSimple ) 

But my actual model is like this:

lmMultiple <- lm( posttestScore ~ pretestScore + probCategorySame + probDataRelated + practiceAccuracy + practiceNumTrials, data=D )

I would like to add a regression line that reflects the coefficient and intercept from the actual model instead of the simplified one. I think I'd be happy to assume mean values for all other predictors in order to do this, although I'm ready to hear advice to the contrary.

This might make no difference, but I'll mention it just in case: the situation is complicated slightly by the fact that I probably won't want to plot the original data. Instead, I'd like to plot mean values of the DV for binned values of the predictor, like so:

D[,'probCSBinned'] = cut( my.data$probCategorySame, as.numeric( seq( 0,1,0.04 ) ), include.lowest=TRUE, right=FALSE, labels=FALSE )
D = aggregate( posttestScore~probCSBinned, data=D, FUN=mean )
plot( D$probCSBinned, D$posttestScore )

I want this only because my data happen to look much cleaner when plotted this way.

Recommended answer

You need to create a vector of x-values spanning the domain of your plot and predict the corresponding y-values from your model. To do this, put the vector into a data frame whose columns match the variables in your model. You said you are OK with holding the other variables fixed at their mean values, so I have used that approach in my solution. When setting this up, you should probably consider whether the x-values you are predicting for are actually plausible given the fixed values of the other variables.
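
Written out in terms of the fitted coefficients, holding the other predictors at their means amounts to drawing the line below. The notation is introduced here for illustration and is not part of the original answer: $\hat\beta_0$ is the fitted intercept, $\hat\beta_j$ the fitted coefficient of predictor $j$, $\bar{x}_j$ its sample mean, and $k$ the predictor on the x-axis. It assumes a purely additive model with no interactions or transformed terms, as in the question.

```latex
\hat{y}(x) \;=\; \Bigl(\hat\beta_0 + \sum_{j \neq k} \hat\beta_j\,\bar{x}_j\Bigr) \;+\; \hat\beta_k\,x
```

Under that assumption the slope is just the coefficient of the plotted predictor from the full model, and fixing the other predictors at different values would only shift the line up or down.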

Without sample data I can't be sure this will work exactly for you, so I apologize if there are any bugs below, but this should at least illustrate the approach.

# Setup
xmin = 0; xmax = 1 # domain of your plot (the question bins probCategorySame over [0, 1]; adjust as needed)
D = my.data
plot( D$probCategorySame, D$posttestScore, xlim=c(xmin,xmax) )
lmMultiple <- lm( posttestScore ~ pretestScore + probCategorySame + probDataRelated + practiceAccuracy + practiceNumTrials, data=D )

# Create a dummy data frame in which every predictor is held at its mean value,
# except the variable we want to plot, which varies incrementally over the
# domain of the plot. We need this object to get the predicted values we
# want to plot.
N = 1e4
predictors = c("pretestScore", "probCategorySame", "probDataRelated", "practiceAccuracy", "practiceNumTrials")
means = colMeans(D[, predictors], na.rm=TRUE)
dummyDF = as.data.frame(t(means)) # a single row holding the means
dummyDF = dummyDF[rep(1, N), ]    # repeat that row N times (no rbind loop needed)
xv = seq(xmin, xmax, length.out=N)
dummyDF$probCategorySame = xv     # must be the predictor name used in the model, not the binned copy

# Get and plot predictions over the dummy data.
# dummyDF holds only predictors, so there is no response column to drop.
yv = predict(lmMultiple, newdata=dummyDF)
lines(xv, yv)
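
Since the question comes without sample data, here is a minimal self-contained sketch of the same approach on simulated data. The simulated values, the seed, and the helper names (`fit`, `focal`, `others`, `newD`, `mids`, `binned`) are assumptions made for illustration; only the column names and the model formula come from the question. It also checks the predicted line against the coefficients directly, and plots the binned means at bin midpoints so that the x-axis stays on the original scale of the predictor.

```r
# Simulated stand-in for my.data (hypothetical values, for illustration only)
set.seed(1)
n <- 200
D <- data.frame(
  pretestScore      = rnorm(n, 50, 10),
  probCategorySame  = runif(n),
  probDataRelated   = runif(n),
  practiceAccuracy  = runif(n),
  practiceNumTrials = rpois(n, 20)
)
D$posttestScore <- 5 + 0.8 * D$pretestScore + 10 * D$probCategorySame +
  2 * D$probDataRelated + rnorm(n, 0, 5)

fit <- lm(posttestScore ~ pretestScore + probCategorySame + probDataRelated +
            practiceAccuracy + practiceNumTrials, data = D)

# Grid over the focal predictor; all other predictors held at their means
focal  <- "probCategorySame"
others <- setdiff(all.vars(formula(fit))[-1], focal)
xv     <- seq(0, 1, length.out = 100)
newD   <- as.data.frame(as.list(colMeans(D[, others])))[rep(1, length(xv)), ]
newD[[focal]] <- xv
yv <- predict(fit, newdata = newD)

# The same line computed from the coefficients: shifted intercept + slope * x
b <- coef(fit)
intercept_adj <- b["(Intercept)"] + sum(b[others] * colMeans(D[, others]))
yv_manual <- intercept_adj + b[focal] * xv

# Binned means of the DV, plotted at bin midpoints rather than bin indices
breaks <- seq(0, 1, 0.04)
bin    <- cut(D[[focal]], breaks, include.lowest = TRUE, right = FALSE,
              labels = FALSE)
mids   <- (head(breaks, -1) + tail(breaks, -1)) / 2
binned <- aggregate(posttestScore ~ bin, data = cbind(D, bin = bin), FUN = mean)
plot(mids[binned$bin], binned$posttestScore,
     xlab = focal, ylab = "mean posttestScore")
lines(xv, yv)
```

One design note: the binned plot in the question puts the bin index (1, 2, 3, ...) on the x-axis, so a line predicted over the original 0 to 1 range would not line up with it. Plotting at bin midpoints, as sketched here, is one way to keep both on the same scale.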
