Regarding the I( ) term in linear regression modeling in R using lm
Problem description
I once saw a linear model fitting written as follows:
lm(formula = Ozone ~ Solar.R + Wind + Temp + I(Wind^2) + I(Temp^2) +
I(Wind * Temp) + I(Wind * Temp^2) + I(Temp * Wind^2) + I(Temp^2 *
Wind^2), data = airquality)
I am not sure what I( ) means here. For example, what does I(Wind * Temp^2) mean, and can I write it as Wind:Temp^2?
The I() notation in R's formula syntax means 'as is': I(a+b) adds the arithmetic sum a + b to the lm model as a single predictor, rather than the two separate terms a and b. In your case, I(Wind * Temp^2) means: include as a predictor variable the product of Wind and Temp squared. I() is needed because +, *, ^ and : have special meanings in the formula syntax; wrapping an expression in I() makes R treat those operators as ordinary arithmetic.
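To make the 'as is' behaviour concrete, here is a minimal sketch using the built-in airquality data from the question. The names f_plain, f_asis and fit are only illustrative. It compares the terms R extracts from a formula with and without I():

```r
# Without I(), '*' is a formula operator: main effects plus their interaction.
f_plain <- Ozone ~ Wind * Temp
# With I(), '*' is ordinary multiplication and yields a single predictor.
f_asis <- Ozone ~ I(Wind * Temp)

attr(terms(f_plain), "term.labels")  # "Wind" "Temp" "Wind:Temp"
attr(terms(f_asis), "term.labels")   # "I(Wind * Temp)"

# The term from the question becomes one coefficient in the fitted model.
fit <- lm(Ozone ~ I(Wind * Temp^2), data = airquality)
names(coef(fit))                     # "(Intercept)" "I(Wind * Temp^2)"
```

Inspecting term.labels this way is a quick check of how a formula will be expanded before fitting anything.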
For more info page 2 here explains it in full detail.
Hope this is clear!
UPDATE: I just want to add Hong Ooi's very good comment on this:
I(Wind * Temp^2) is not the same as Wind:Temp^2.
The ^n operator in formula syntax means 'include these variables and all interactions up to n-way'. For example, Y ~ (X + Z + W)^2 is equivalent to Y ~ X + Z + W + X:Z + X:W + Z:W.
So, in our case, Wind:Temp^2 means just Wind:Temp, because applying ^2 to the single variable Temp expands to Temp alone.
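The expansion can be checked without fitting a model, since terms() works on a bare formula. This is only a sketch; the names expanded and collapsed are illustrative, and X, Z, W are the placeholder variables from the example above:

```r
# (X + Z + W)^2 expands to the main effects and all two-way interactions.
expanded <- attr(terms(Y ~ (X + Z + W)^2), "term.labels")
expanded   # "X" "Z" "W" "X:Z" "X:W" "Z:W"

# In Wind:Temp^2 the '^2' is a formula operator, so the term reduces to Wind:Temp.
collapsed <- attr(terms(Ozone ~ Wind:Temp^2), "term.labels")
collapsed  # "Wind:Temp"
```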
Small illustration:
Y <- runif(100)
X1 <- runif(100)
X2 <- runif(100)
df <- data.frame(Y,X1,X2)
b <- lm(Y ~ X1:X2^2, data = df)
summary(b)

Call:
lm(formula = Y ~ X1:X2^2, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-0.4802 -0.2490 -0.0173  0.2345  0.5066

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.45126    0.04794   9.413 2.28e-15 ***
X1:X2        0.08991    0.13414   0.670    0.504
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2965 on 98 degrees of freedom
Multiple R-squared:  0.004563, Adjusted R-squared:  -0.005594
F-statistic: 0.4493 on 1 and 98 DF,  p-value: 0.5043
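As a complementary check on the airquality data from the question, the following sketch compares the design-matrix columns the two spellings actually produce. The names mm_colon and mm_asis are only illustrative:

```r
# Build the design matrices for the two spellings of the term.
mm_colon <- model.matrix(~ Wind:Temp^2, data = airquality)
mm_asis  <- model.matrix(~ I(Wind * Temp^2), data = airquality)

colnames(mm_colon)  # "(Intercept)" "Wind:Temp"
colnames(mm_asis)   # "(Intercept)" "I(Wind * Temp^2)"

# The ':' version holds Wind * Temp; only the I() version holds Wind * Temp^2.
same_as_product <- isTRUE(all.equal(unname(mm_colon[, 2]),
                                    airquality$Wind * airquality$Temp))
same_as_squared <- isTRUE(all.equal(unname(mm_asis[, 2]),
                                    airquality$Wind * airquality$Temp^2))
c(same_as_product, same_as_squared)  # TRUE TRUE
```

So the two formulas fit different models: the squared Temp only enters the regression when the expression is wrapped in I().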