How to reference variables by character string in a formula?

Problem description

In the minimal example below, I am trying to use the values of a character string, vars, in a regression formula. However, I am only able to pass the string of variable names ("v2+v3+v4") to the formula, not the real meaning of this string (e.g., that "v2" is dat$v2).

I know there are better ways to run the regression (e.g., lm(v1 ~ v2 + v3 + v4, data=dat)). My situation is more complex, and I am trying to figure out how to use a character string in a formula. Any thoughts?

Updated code below

# minimal example 
# create data frame
v1 <- rnorm(10)
v2 <- sample(c(0,1), 10, replace=TRUE)
v3 <- rnorm(10)
v4 <- rnorm(10)
dat <- cbind(v1, v2, v3, v4)
dat <- as.data.frame(dat)

# create objects of column names
c.2 <- colnames(dat)[2]
c.3 <- colnames(dat)[3]
c.4 <- colnames(dat)[4]

# shortcut to get to the type of object my full code produces
vars <- paste(c.2, c.3, c.4, sep="+")

### TRYING TO SOLVE FROM THIS POINT:
print(vars)
# [1] "v2+v3+v4"

# use vars in regression
regression <- paste0("v1", " ~ ", vars)
m1 <- lm(as.formula(regression), data=dat)

Update: @Arun was correct about the missing quotes around v1 in the first example. This fixed my example, but I was still having problems with my real code. In the code chunk below, I adapted my example to better reflect my actual code. I had created a simpler example at first, thinking that the problem was the string vars.

Here's an example that does not work :) It uses the same data frame dat created above.

dv <- colnames(dat)[1]
r2 <- colnames(dat)[2]
# the following loop creates objects r3, r4, r5, and r6
# r5 and r6 are interaction terms
for (v in 3:4) {
  r <- colnames(dat)[v]
  assign(paste("r",v,sep=""),r)
  r <- paste(colnames(dat)[2], colnames(dat)[v], sep="*")
  assign(paste("r",v+2,sep=""),r)
}

# combine r3, r4, r5, and r6 then collapse and remove trailing +
vars2 <- sapply(3:6, function(i) { 
                paste0("r", i, "+")
                })
vars2 <- paste(vars2, collapse = '')
vars2 <- substr(vars2, 1, nchar(vars2)-1)

# concatenate dv, r2 (as a factor), and vars into `eq`
eq <- paste0(dv, " ~ factor(",r2,") +", vars2)

Here's where the problem is:

print(eq)
# [1] "v1 ~ factor(v2) +r3+r4+r5+r6"

Unlike regression in the first example, eq does not bring in the column names (e.g., v3); the object names (e.g., r3) are retained instead. As such, the following lm() command does not work.

m2 <- lm(as.formula(eq), data=dat)
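To see why this fails, it can help to list the variables the formula actually refers to and compare them with the columns of dat. A minimal sketch using base R's all.vars() and setdiff(); the data and the eq string are recreated so the block runs on its own:

```r
# Recreate the question's data frame and formula string
dat <- data.frame(v1 = rnorm(10),
                  v2 = sample(c(0, 1), 10, replace = TRUE),
                  v3 = rnorm(10),
                  v4 = rnorm(10))
eq <- "v1 ~ factor(v2) +r3+r4+r5+r6"

# all.vars() lists every variable name the formula refers to
used <- all.vars(as.formula(eq))
print(used)
# [1] "v1" "v2" "r3" "r4" "r5" "r6"

# Names that are not columns of dat; lm() would have to find these elsewhere
not_in_dat <- setdiff(used, names(dat))
print(not_in_dat)
# [1] "r3" "r4" "r5" "r6"
```

Because r3 to r6 are not columns of dat, lm() looks them up in the formula's environment instead, where (in the question's code) each one is a single character string such as "v3" rather than a data column.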

Recommended answer

I see a couple of issues here. First (though I don't think this is causing any trouble), let's make your data frame in one step so that v1 through v4 aren't floating around in the global environment as well as in the data frame. Second, let's make v2 a factor here so that we won't have to convert it later.

dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10) )

Part One

For your first part, it looks like this is what you want:

lm(v1 ~ v2 + v3 + v4, data=dat)

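Here's a simpler way to do that: in a formula, . stands for every column of data not already used in the formula (here v2, v3 and v4), though you still have to specify the response variable.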

lm(v1 ~ ., data=dat)
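A quick way to check what . expands to is to ask terms() for the term labels, passing the data frame so that the dot can be resolved. A minimal sketch, assuming the dat built above (recreated here):

```r
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0, 1), 10, replace = TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# With a data argument, terms() replaces "." by every column not already in the formula
lbls <- attr(terms(v1 ~ ., data = dat), "term.labels")
print(lbls)
# [1] "v2" "v3" "v4"
```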

Alternatively, you can certainly build up the formula with paste and call lm on it.

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
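Base R also has stats::reformulate(), which builds a formula from a character vector of term labels plus a response name, so the manual paste is not needed. This is an alternative the original answer does not mention; a minimal sketch, with v2 made deterministic here (an assumption) so that both factor levels are always present:

```r
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # deterministic so both levels appear
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Term labels are all columns except the response (the first column)
f2 <- reformulate(names(dat)[-1], response = names(dat)[1])
deparse(f2)
# [1] "v1 ~ v2 + v3 + v4"

m <- lm(f2, data = dat)
coef(m)  # (Intercept), v21, v3, v4
```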

However, my preference in these situations is to use do.call, which evaluates its arguments before building the call, so the stored call contains the formula itself rather than the variable name f. This makes the resulting object more suitable for functions like update. Compare the call part of the two outputs.

do.call("lm", list(as.formula(f), data=as.name("dat")))
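To compare the two approaches, one can fit the model both ways and inspect the stored calls, then refit the do.call version with update(). A minimal sketch, assuming the dat built above (recreated, with v2 made deterministic so both levels appear):

```r
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse = " + "))

m_paste  <- lm(f, data = dat)
m_docall <- do.call("lm", list(as.formula(f), data = as.name("dat")))

print(m_paste$call)   # only the name f is stored in the call
print(m_docall$call)  # the formula v1 ~ v2 + v3 + v4 itself is stored

# update() re-evaluates the stored call; here the formula travels with the model
m_small <- update(m_docall, . ~ . - v4)
attr(terms(m_small), "term.labels")
```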

Part Two

For your second part, it looks like this is what you're going for:

lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)

First, because v2 is already a factor in the data frame, we don't need the factor() call. Second, this can be simplified further by using R's formula operators to build the interactions: v2*(v3 + v4) expands to v2 + v3 + v4 + v2:v3 + v2:v4, like this.

lm(v1 ~ v2*(v3 + v4), data=dat)
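To confirm that the compact formula describes the same model as the longhand version, one can compare the term labels that terms() produces for each. A minimal sketch; no data is needed here because there is no dot to expand:

```r
short <- v1 ~ v2 * (v3 + v4)
long  <- v1 ~ v2 + v3 + v4 + v2:v3 + v2:v4

short_terms <- attr(terms(short), "term.labels")
long_terms  <- attr(terms(long), "term.labels")
print(short_terms)  # main effects v2, v3, v4 plus interactions v2:v3, v2:v4
identical(short_terms, long_terms)
# [1] TRUE
```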

I'd then simply create the formula using paste; the loop with assign is probably not a good idea, even in the larger case.

f <- paste(names(dat)[1], "~", names(dat)[2], "* (", 
           paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"
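To make the paste step reusable when there are more columns, it could be wrapped in a small function. The helper name interaction_formula and its arguments are hypothetical, not part of the original answer; a minimal sketch:

```r
# Hypothetical helper: builds  response ~ moderator * (other1 + other2 + ...)
interaction_formula <- function(response, moderator, others) {
  rhs <- paste0(moderator, " * (", paste(others, collapse = " + "), ")")
  as.formula(paste(response, "~", rhs))
}

dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),  # deterministic so both levels appear
                  v3 = rnorm(10),
                  v4 = rnorm(10))

f <- interaction_formula(names(dat)[1], names(dat)[2], names(dat)[-(1:2)])
deparse(f)
# [1] "v1 ~ v2 * (v3 + v4)"
m <- lm(f, data = dat)
```

Adding columns to dat then only changes the others argument, with no assign loop involved.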

It can then be called using either lm directly or with do.call.

lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))

About your code

The problem you had with trying to use r3 etc. was that you wanted the contents of the variable r3 (e.g., "v3"), not the name "r3" itself. To get the contents, you need get, like this, and then you'd collapse the values together with paste.

vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
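If you do want to keep the question's assign loop, the fix is to fetch the contents of r3 to r6 before pasting them into eq. A minimal sketch of the full corrected pipeline; it swaps in base R's mget(), which fetches several objects by name in one call, in place of sapply(..., get):

```r
dat <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10), v4 = rnorm(10))
dv <- colnames(dat)[1]
r2 <- colnames(dat)[2]

# The question's loop: r3, r4 hold column names; r5, r6 hold interaction terms
for (v in 3:4) {
  assign(paste0("r", v), colnames(dat)[v])
  assign(paste0("r", v + 2), paste(colnames(dat)[2], colnames(dat)[v], sep = "*"))
}

# mget() returns the contents of r3..r6 (not their names) as a named list
vals <- unlist(mget(paste0("r", 3:6)))
eq <- paste0(dv, " ~ factor(", r2, ") + ", paste(vals, collapse = " + "))
print(eq)
# [1] "v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4"
```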

However, a better way would be to avoid assign and just build a vector of the terms you want, like this.

vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2], 
                                          colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")
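The vector of terms can then be turned into a formula and fitted. A minimal sketch, assuming dat with v2 as a factor (made deterministic here so both levels appear); it also shows that the repeated main effects are harmless, since v2*v3 already implies v3 and terms() keeps each term only once:

```r
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v],
            paste(colnames(dat)[2], colnames(dat)[v], sep = "*"))
}
f <- paste(colnames(dat)[1], "~", paste(vars, collapse = " + "))
print(f)
# [1] "v1 ~ v3 + v2*v3 + v4 + v2*v4"

m <- lm(as.formula(f), data = dat)
# Five distinct terms: v3, v2, v4 and the two interactions
length(attr(terms(m), "term.labels"))
# [1] 5
```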

A more R-like solution would be to use lapply:

vars <- unlist(lapply(colnames(dat)[3:4], 
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
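A quick check that the lapply version produces exactly the same vector as the loop. A minimal sketch; only the column names of dat matter here:

```r
dat <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10), v4 = rnorm(10))

# Loop version
vars_loop <- NULL
for (v in 3:4) {
  vars_loop <- c(vars_loop, colnames(dat)[v],
                 paste(colnames(dat)[2], colnames(dat)[v], sep = "*"))
}

# lapply version: one c(main, interaction) pair per column, flattened by unlist()
vars_lapply <- unlist(lapply(colnames(dat)[3:4],
                             function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))

identical(vars_loop, vars_lapply)
# [1] TRUE
paste(vars_lapply, collapse = " + ")
# [1] "v3 + v2*v3 + v4 + v2*v4"
```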