What does the capital letter "I" in an R linear regression formula mean?
Question
I haven't been able to find an answer to this question, largely because googling anything with a standalone letter (like "I") causes issues.
What does the "I" do in a model like this?
data(rock)
lm(area~I(peri - mean(peri)), data = rock)
Considering that the following does NOT work:
lm(area ~ (peri - mean(peri)), data = rock)
and this does work:
rock$peri - mean(rock$peri)
Any keywords for researching this myself would also be very helpful.
Recommended answer
I isolates or insulates the contents of I( ... ) from the gaze of R's formula parsing code. It allows the standard R operators to work as they would if you used them outside of a formula, rather than being treated as special formula operators.
For example:
y ~ x + x^2
would, to R, mean "give me:
- x = the main effect of x, and
- x^2 = the main effect and the second-order interaction of x",
not the intended x plus x squared:
> model.frame( y ~ x + x^2, data = data.frame(x = rnorm(5), y = rnorm(5)))
y x
1 -1.4355144 -1.85374045
2 0.3620872 -0.07794607
3 -1.7590868 0.96856634
4 -0.3245440 0.18492596
5 -0.6515630 -1.37994358
This is because ^ is a special operator in a formula, as described in ?formula. You end up only including x in the model frame because the main effect of x is already included from the x term in the formula, and there is nothing to cross x with to get the second-order interactions in the x^2 term.
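One way to see this for yourself is to ask R which terms it extracts from a formula. The following is a minimal sketch; the variable names y, x, a and b are placeholders, and terms() does not need them to exist because it only inspects the formula.

```r
# Term labels show how R's formula machinery interprets each formula.
# y, x, a and b are placeholder names; terms() never evaluates them.
labels_plain   <- attr(terms(y ~ x + x^2), "term.labels")
labels_crossed <- attr(terms(y ~ (a + b)^2), "term.labels")

print(labels_plain)    # "x": the x^2 term collapses into the main effect
print(labels_crossed)  # "a" "b" "a:b": ^2 means crossing up to second order
```

With two variables inside the parentheses there is something to cross, so the second-order interaction a:b appears. With x alone there is nothing to cross, and only x remains.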
To get the usual operator, you need to use I() to isolate the call from the formula code:
> model.frame( y ~ x + I(x^2), data = data.frame(x = rnorm(5), y = rnorm(5)))
y x I(x^2)
1 -0.02881534 1.0865514 1.180593....
2 0.23252515 -0.7625449 0.581474....
3 -0.30120868 -0.8286625 0.686681....
4 -0.67761458 0.8344739 0.696346....
5 0.65522764 -0.9676520 0.936350....
(That last column is correct; it just looks odd because it is of class AsIs.)
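If the printed column looks suspicious, a quick check along these lines may help. It is only a sketch: the seed and the data frame d are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed so the illustrative data are reproducible
d <- data.frame(x = rnorm(5), y = rnorm(5))

mf <- model.frame(y ~ x + I(x^2), data = d)

# The protected term is kept as its own column, tagged with class "AsIs"
print(names(mf))
print(class(mf[["I(x^2)"]]))

# Underneath the class tag, the values are just x squared
squares_match <- all.equal(as.numeric(mf[["I(x^2)"]]), d$x^2)
print(squares_match)
```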
In your example, - when used in a formula would indicate removal of a term from the model, whereas you wanted - to have its usual binary operator meaning of subtraction:
> model.frame( y ~ x - mean(x), data = data.frame(x = rnorm(5), y = rnorm(5)))
Error in model.frame.default(y ~ x - mean(x), data = data.frame(x = rnorm(5), :
variable lengths differ (found for 'mean(x)')
This fails because, although the - removes mean(x) from the model terms, model.frame() still evaluates it as a variable. mean(x) is a length 1 vector, and model.frame() quite rightly tells you this doesn't match the length of the other variables. A way round this is I():
> model.frame( y ~ I(x - mean(x)), data = data.frame(x = rnorm(5), y = rnorm(5)))
y I(x - mean(x))
1 1.1727063 1.142200....
2 -1.4798270 -0.66914....
3 -0.4303878 -0.28716....
4 -1.0516386 0.542774....
5 1.5225863 -0.72865....
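The two readings of - can also be compared directly through the term labels. This is a sketch under the same assumptions as the examples above (an arbitrary seed and a small made-up data frame d).

```r
set.seed(1)  # arbitrary seed for reproducible illustrative data
d <- data.frame(x = rnorm(5), y = rnorm(5))

# As a formula operator, "-" drops mean(x) from the term list
labels_minus <- attr(terms(y ~ x - mean(x)), "term.labels")
# Inside I(), the same "-" is ordinary subtraction and yields a single term
labels_asis <- attr(terms(y ~ I(x - mean(x))), "term.labels")
print(labels_minus)
print(labels_asis)

# The unprotected formula fails in model.frame(); capture the error object
res <- tryCatch(model.frame(y ~ x - mean(x), data = d), error = function(e) e)
print(inherits(res, "error"))

# The protected version returns the centered values
mf <- model.frame(y ~ I(x - mean(x)), data = d)
centered_match <- all.equal(as.numeric(mf[[2]]), d$x - mean(d$x))
print(centered_match)
```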
Hence, where you want to use an operator that has special meaning in a formula, but you need its non-formula meaning, you need to wrap the elements of the operation in I( ).
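Applied to the model from the question, a sketch like the following shows what the I() term does in practice. The object names fit_centered and fit_raw are made up for this illustration.

```r
data(rock)

# Centering inside the formula, protected by I()
fit_centered <- lm(area ~ I(peri - mean(peri)), data = rock)
# The same predictor without centering, for comparison
fit_raw <- lm(area ~ peri, data = rock)

# Centering shifts the intercept but leaves the slope unchanged
slope_centered <- unname(coef(fit_centered)[2])
slope_raw <- unname(coef(fit_raw)[2])
print(all.equal(slope_centered, slope_raw))

# With a centered predictor, the intercept is the mean of the response
intercept_centered <- unname(coef(fit_centered)[1])
print(all.equal(intercept_centered, mean(rock$area)))
```

An alternative that avoids I() altogether is to compute the centered column in the data frame first and refer to it by name in the formula.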
Read ?formula for more on the special operators, and ?I for more details on the function itself and its other main use-case within data frames (which is where the AsIs bit originates from, if you are interested).
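For the data frame use-case, a small sketch might look like this. The column names id and vals are invented for the example; the point is only that I() tells data.frame() to store the object as it is.

```r
# Wrapped in I(), a list is stored as one list-column with one element per row
df <- data.frame(id = 1:2, vals = I(list(1:2, 3:5)))
print(dim(df))
print(class(df$vals))

# Without I(), data.frame() treats each list element as a separate column,
# and here that fails because the elements have different lengths
res <- tryCatch(data.frame(id = 1:2, vals = list(1:2, 3:5)),
                error = function(e) e)
print(inherits(res, "error"))
```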