Dynamic formula creation in R?


Question

Is it at all possible to use the lm() function with a matrix? Or maybe the correct question is: "Is it possible to dynamically create formulas in R?"

I am creating a function whose output is a matrix. The number of columns in the matrix is not fixed: it depends on the user's inputs. I want to fit an OLS model using the data in the matrix.

- The first column represents the dependent variable.
- The other columns are the independent variables.

Using the lm function requires a formula, which presupposes knowing the number of explanatory variables in advance, and that is not my case!

Is there any solution other than estimating the equation manually with the OLS formula?

Reproducible example:

> # When user 1 uses the function, he obtains m1
> m1 <- replicate(5, rnorm(50))
> colnames(m1) <- c("dep", paste0("ind", 1:(ncol(m1)-1)))
> head(m1)
            dep       ind1        ind2       ind3       ind4
[1,]  0.5848705  0.3602760 -0.95493403 -1.7278030 -0.1914170
[2,]  1.7167604 -0.1035825  0.31026183 -1.5071415 -1.2748600
[3,] -0.1326187 -0.5669026  0.01819749  0.8346880 -0.6304498
[4,] -0.7381232  0.4612792 -0.36132404 -0.1183131 -0.7446985
[5,]  0.9919123 -1.3228248 -0.44728270  0.6571244 -0.4895385
[6,] -0.8010111  0.8307584 -0.16106804  0.3069870 -0.3834583
> 
> # When user 2 uses the function, he obtains m2
> m2 <- replicate(6, rnorm(50))
> colnames(m2) <- c("dep", paste0("ind", 1:(ncol(m2)-1)))
> head(m2)
            dep       ind1       ind2         ind3       ind4       ind5
[1,]  1.2936031 -0.8060085  0.5020699 -1.699123234  1.0205626  1.0787888
[2,]  1.2357370  0.5973699 -1.2134283 -0.928040354 -0.3037920 -0.1251678
[3,]  0.5292583  0.1063213 -1.3036526  0.395886937 -0.1280863  1.1423532
[4,]  0.9234484 -0.4505604  1.2796922  0.424705893 -0.5547274 -0.3794037
[5,] -0.8016376  1.1362677 -1.1935238 -0.004460092 -1.4449704 -0.3739311
[6,]  0.4385867  0.5671138  0.4493617 -2.277925642 -0.8626944 -0.6880523

User 1 will estimate the linear model with:

lm(dep ~ ind1 + ind2 + ind3 + ind4, data = m1)

Meanwhile, user 2 has an extra independent variable and will estimate the linear model in the following way:

lm(dep ~ ind1 + ind2 + ind3 + ind4 + ind5, data = m2)

Once again, is there any way I can create the formula dynamically?

Recommended answer

Yes. In fact, the formula interface gets slower as the number of columns grows, so the matrix interface is preferred for wide matrices.
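As a rough illustration of the matrix interface mentioned above, here is a minimal sketch using stats::lm.fit, which takes a design matrix and a response vector instead of a formula. The matrix m, the seed, and the explicit intercept column are assumptions made for this example, not part of the original answer.

```r
# Minimal sketch: OLS through the matrix interface, no formula involved.
set.seed(1)                           # assumption: fixed seed, only for reproducibility
m <- replicate(5, rnorm(50))          # same shape as m1 in the question
colnames(m) <- c("dep", paste0("ind", 1:(ncol(m) - 1)))

y <- m[, 1]                           # first column: dependent variable
X <- cbind(Intercept = 1,             # lm.fit does not add an intercept by itself
           m[, -1, drop = FALSE])     # all remaining columns, however many there are

fit <- lm.fit(x = X, y = y)           # returns a list, not a full "lm" object
fit$coefficients
```

Note that lm.fit returns a plain list (coefficients, residuals, and so on), so helpers such as summary() that expect an "lm" object are not available on its result.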

Is there any way I can create the formula dynamically?

Sure. You can look up the matrix columns either directly, by a vector of column indices, or indirectly, by converting column names into column indices with grep(pattern, colnames(mat)). For a matrix, use colnames(); names() returns NULL.

But in your case, you don't need to bother with grep, since you already have a straightforward column-naming scheme: dep is column 1, and in m1 the columns ind1...ind4 correspond to column indices 2..5.

lm(m1[,'dep'] ~ m1[,2:5])

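# or in general, with any vector of predictor column indices
# (leave out column 1, which holds the dependent variable)
lm(m1[,'dep'] ~ m1[,colIndicesVector])  # e.g. colIndicesVector <- c(2,3,5)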
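If an actual formula object is wanted, one possible approach is to build it from the column names with stats::reformulate. The sketch below assumes the first column is always the response; fit_ols is a hypothetical helper name introduced only for this example. Because the data argument of lm() expects a data frame (or list or environment) rather than a matrix, the matrix is converted first.

```r
# Minimal sketch: build the formula from whatever columns the matrix has.
fit_ols <- function(mat) {                 # hypothetical helper, not from the answer
  df <- as.data.frame(mat)                 # lm()'s data argument needs a data frame
  response   <- colnames(mat)[1]           # assumption: first column is the response
  predictors <- colnames(mat)[-1]          # every other column is a predictor
  f <- reformulate(predictors, response = response)
  lm(f, data = df)
}

set.seed(42)                               # assumption: seed only for reproducibility
m1 <- replicate(5, rnorm(50))
colnames(m1) <- c("dep", paste0("ind", 1:(ncol(m1) - 1)))
m2 <- replicate(6, rnorm(50))
colnames(m2) <- c("dep", paste0("ind", 1:(ncol(m2) - 1)))

fit1 <- fit_ols(m1)                        # four predictors
fit2 <- fit_ols(m2)                        # five predictors
formula(fit1)
formula(fit2)
```

When every remaining column should be a predictor, lm(dep ~ ., data = as.data.frame(m1)) is a shorter route to the same fit; building the formula explicitly is mainly useful when only a subset of columns is wanted.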
