In R formulas, why do I have to use the I() function on power terms, like y ~ I(x^3)?


Question

I'm trying to get my head around the tilde operator and its associated functions. My first question: why does I() need to be used to specify arithmetic operators? For example, these two plots generate different results (the first shows a straight line, the second the expected curve):

x <- c(1:100)
y <- seq(0.1,10,0.1)

plot(y~x^3)
plot(y~I(x^3))

Further, both of the following plots also generate the expected result:

plot(x^3, y)
plot(I(x^3), y)

My second question: perhaps the examples I've been using are too simple, but I don't understand where ~ should actually be used.

Recommended answer

The issue here is how formulas are interpreted. In a formula, the tilde separates the left-hand side from the right-hand side, and the infix operators "+", "*", ":" and "^" have entirely different meanings than they do with numeric vectors. In formulas the ^ operator constructs interactions, so x = x^2 = x^3 rather than the mathematical power you might expect. (A variable interacting with itself is just the same variable.) If you had typed (x+y)^2, R would have produced (for its own internal use) not the mathematical x^2 + 2xy + y^2 but the symbolic x + y + x:y, where x:y is an interaction term. That is why plot(y~x^3) draws the same straight line as plot(y~x).

?formula
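The expansion described above can be inspected directly. The following is a minimal sketch, not part of the original answer; labels_of is a hypothetical helper name introduced only for this illustration, and it relies on terms() from the stats package, which works symbolically and needs no data.

```r
# Hypothetical helper: list the terms R derives from a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ x^3)        # "x": x crossed with itself is just x
labels_of(y ~ I(x^3))     # "I(x^3)": kept as a literal arithmetic term
labels_of(y ~ (a + b)^2)  # "a" "b" "a:b": main effects plus the interaction
```

None of x, y, a or b has to exist for this to run, which shows that the formula is treated as a symbolic description rather than as arithmetic.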

The I() function marks its argument to be used "as is", so the arithmetic inside it means what you expect. Within a formula, I(x^2) therefore yields a vector of values raised to the second power.
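A small sketch of what this means in practice, assuming a toy vector x that is not in the original post: model.matrix() shows which column each formula actually produces.

```r
x <- 1:5  # toy data, chosen only for illustration

# Outside a formula, I() simply tags the value with class "AsIs".
sq <- I(x^2)
class(sq)                          # "AsIs"

# Inside a formula, it decides which column enters the design matrix.
m_plain <- model.matrix(~ x^2)     # ^ read as a formula operator: column is x
m_asis  <- model.matrix(~ I(x^2))  # ^ read as arithmetic: column is x^2

m_plain[, "x"]
m_asis[, "I(x^2)"]
```

The first matrix carries the untouched values of x, while the second carries their squares.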

When seen in regression functions, the ~ should be read as "is distributed as" or "is dependent on". A model description implies an error term and, by default, an intercept term, which is generally labelled "(Intercept)" in the output. The function context and arguments may further determine a link function such as log or logit.
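As a rough illustration of the implied intercept and of where the link function comes from, here is a sketch on simulated data. The data frame d, the seed and the coefficients are assumptions made up for this example.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible
d <- data.frame(x = 1:20)
d$y <- 2 + 0.5 * d$x + rnorm(20, sd = 0.3)

fit  <- lm(y ~ x, data = d)      # the intercept is implied by the formula
fit0 <- lm(y ~ x - 1, data = d)  # "- 1" removes the implied intercept

names(coef(fit))   # "(Intercept)" "x"
names(coef(fit0))  # "x"

# The link is set by the modelling function's arguments, not by the formula.
binomial()$link    # "logit"
poisson()$link     # "log"
```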

The "+" symbol in a formula does not really add two variables; it is usually an implicit request to calculate regression coefficient(s) for that variable in the context of the other variables on the RHS of the formula. The regression functions use model.matrix, which recognizes factors or character vectors in the formula and builds a matrix that expands the levels of the formula's discrete components.
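The expansion of a discrete variable can be seen by calling model.matrix() directly. This sketch assumes a small made-up data frame d and R's default treatment contrasts.

```r
d <- data.frame(
  g = factor(c("a", "b", "c", "a", "b", "c")),  # discrete component
  x = c(1.5, 2.0, 2.5, 3.0, 3.5, 4.0)           # numeric component
)

# "+" here asks for a column (coefficient) per term, not for arithmetic addition.
mm <- model.matrix(~ g + x, data = d)

colnames(mm)  # "(Intercept)" "gb" "gc" "x"
dim(mm)       # 6 4
```

The three-level factor g becomes two indicator columns, with level "a" absorbed into the intercept as the reference level.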

In plot()-ting functions, the formula basically reverses the usual (x, y) order of arguments that the plot function takes. A plot.formula method was written so that formulas could be used as a more "mathematical" mode of communicating with R. In graphics::plot.formula, curve, and the 'lattice' and 'ggplot2' functions, the formula governs how multiple factors or numeric vectors are displayed and "faceted".
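The reversed argument order can be sketched as follows. The data are made up for illustration, and pdf(NULL) is used only so that the example also runs in a non-interactive session; in an interactive session it can be dropped.

```r
x <- 1:100
y <- x^2  # toy data, chosen only for illustration

pdf(NULL)  # null graphics device: nothing is written to disk
plot(x, y)        # default method: x first, then y
plot(y ~ x)       # formula method: response on the left, so y comes first
plot(y ~ I(x^3))  # arithmetic on the right-hand side needs I()
invisible(dev.off())

# The formula method is registered for the plot generic in 'graphics'.
plot_formula <- getS3method("plot", "formula")
```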

I learned later that ~ is actually an infix (or prefix) primitive function that creates an R "call", whose parts can be accessed with list extraction operators. All of that is hidden from the typical user, but it is a facility that more advanced function authors can use.
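A short sketch of that structure, using placeholder variable names (y, x, z) that do not need to exist:

```r
f <- y ~ x + z   # infix use: two-sided formula

class(f)     # "formula"
typeof(f)    # "language", i.e. an unevaluated call
length(f)    # 3
f[[1]]       # `~`, the function being called
f[[2]]       # y, the left-hand side
f[[3]]       # x + z, the right-hand side
all.vars(f)  # "y" "x" "z"

g <- ~ x     # prefix use: one-sided formula
length(g)    # 2
```

Function authors typically pull pieces out this way, or with helpers such as all.vars() and terms(), before evaluating them against a data frame.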

"+"运算符的重载在下面的注释中进行了讨论,并在绘图包中完成:ggplot2和gridExtra,它在哪里分离传递对象结果的函数,因此它充当传递和分层的作用操作员.具有公式方法的聚合函数使用"+"作为排列"和分组运算符.

The overloading of the "+" operator is discussed in the comments below and is also done in the plotting packages: ggplot2 and gridExtra where is it separating functions that deliver object results, so it acting and as a pass-through and layering operator. The aggregation functions that have a formula method use "+" as an "arrangement" and grouping operator.
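For the grouping use of "+", a minimal sketch with aggregate() and the built-in ToothGrowth data set (chosen here purely as an example; it is not mentioned in the original answer):

```r
# Mean tooth length for every combination of supplement type and dose.
# On the right-hand side, "+" lists the grouping variables; nothing is summed.
agg <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

names(agg)  # "supp" "dose" "len"
nrow(agg)   # 6: two supplement types times three doses
```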
