All possible regressions in R: saving coefficients in a matrix
Question
I am running code that fits every possible model of a phylogenetic generalised linear model. The problem I have is extracting and saving the beta coefficients from each model.
I want to save the coefficients in a matrix whose columns correspond to specific variables and whose rows correspond to formulas. The difficulty is that each model contains a different set of variables, so the coefficients cannot simply be row-bound onto the matrix.
The example below shows how far I have got:
y = rnorm(10)
inpdv = matrix(c(rnorm(10), runif(10), rpois(10, 1)), ncol = 3)
colnames(inpdv) = c("A", "B", "C")
data = data.frame(y, inpdv)  # data frame, so lm() can look up y, A, B and C by name
model.mat = expand.grid(c(TRUE,FALSE), c(TRUE,FALSE), c(TRUE,FALSE))
names(model.mat) = colnames(inpdv)
formula = apply(model.mat, 1, function(x)
  paste(colnames(model.mat)[x], collapse=" + "))
formula = paste("y", formula, sep = " ~ ")
formula[8] = paste(formula[8], 1, sep = "")  # no predictors selected: intercept-only model "y ~ 1"
beta = matrix(NA, nrow = length(formula), ncol = 3)
for(i in 1:length(formula)){
  fit = lm(formula(formula[i]), data)  # fit the i-th formula, not the whole vector
  ## I want to extract the beta coefficients here into an n * k matrix
  ## However, I cannot find a way to assign each value to the right cell in the matrix
}
So I imagine each coefficient needs to be placed into its respective cell, but I cannot think of a quick and efficient way of doing so.
The real analysis will run around 30,000 times, so any tips on efficiency would also be appreciated.
So, as an example, the output for the model y ~ a + c needs to be in the form
a NA c
where the letters stand for that model's coefficients. The next model may be y ~ b + c, which would be added underneath, so the result would then look like
a NA c
NA b c
Recommended answer
How about using names() and %in% to subset the right columns? Extract the coefficient values using coef(). Like this:
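beta = matrix(NA, nrow = length(formula), ncol = 3)
colnames(beta) <- colnames(inpdv)
for(i in 1:length(formula)){
  # as.data.frame() is a no-op for a data frame and converts a matrix
  fit = lm(formula(formula[i]), as.data.frame(data))
  coefs <- coef(fit)
  # Fill only the columns whose variable appears in this model; "(Intercept)" is skipped
  beta[ i , colnames(beta) %in% names( coefs ) ] <- coefs[ names( coefs ) %in% colnames( beta ) ]
}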
# A B C
#[1,] -0.4229837 -0.0519900 0.3787666
#[2,] NA 0.7015679 0.0555350
#[3,] -0.4165834 NA 0.3692974
#[4,] NA NA 0.1346726
#[5,] -0.2035173 0.7049951 NA
#[6,] NA 0.7978726 NA
#[7,] -0.2229959 NA NA
#[8,] NA NA NA
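The two %in% masks line up here because lm() returns the coefficients in the same order as the columns of beta. A slightly more defensive variant, sketched below, indexes by name instead, so the order no longer matters. The seed, the object names dat, formulas and shared, and the choice of two example formulas are assumptions made for illustration, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
y <- rnorm(10)
inpdv <- matrix(c(rnorm(10), runif(10), rpois(10, 1)), ncol = 3)
colnames(inpdv) <- c("A", "B", "C")
dat <- data.frame(y, inpdv)

# Two example formulas with different predictor sets
formulas <- c("y ~ A + C", "y ~ B + C")

# One row per formula, one column per candidate variable, all NA to start
beta <- matrix(NA_real_, nrow = length(formulas), ncol = ncol(inpdv),
               dimnames = list(formulas, colnames(inpdv)))

for (i in seq_along(formulas)) {
  coefs <- coef(lm(as.formula(formulas[i]), data = dat))
  # Names present in both the fit and the matrix; this drops "(Intercept)"
  shared <- intersect(names(coefs), colnames(beta))
  beta[i, shared] <- coefs[shared]
}
beta
```

Variables that are absent from a model keep their initial NA, which gives the layout asked for in the question.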
I think it's perfectly acceptable to use a for loop here. For starters, something like lapply sometimes keeps increasing memory usage as you run more and more of the simulations. R will sometimes not mark objects from earlier models as garbage until the lapply call finishes, so you can sometimes get a memory allocation error. With the for loop, I find that R will reuse memory allocated to the previous iteration if necessary, so if you can run the loop once, you can run it lots of times.
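For comparison, an apply-style version of the same task might look like the sketch below. It uses vapply rather than lapply so that each model contributes a fixed-length numeric vector; whether it uses more memory than the loop in a given analysis would have to be measured. The helper name coef_row, the seed and the example formulas are hypothetical.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(y = rnorm(10), A = rnorm(10), B = runif(10), C = rpois(10, 1))
vars <- c("A", "B", "C")
formulas <- c("y ~ A + B + C", "y ~ A + C", "y ~ 1")

# Hypothetical helper: fit one formula and return a full-length named row
coef_row <- function(f) {
  out <- setNames(rep(NA_real_, length(vars)), vars)
  coefs <- coef(lm(as.formula(f), data = dat))
  shared <- intersect(names(coefs), vars)  # ignore "(Intercept)"
  out[shared] <- coefs[shared]
  out
}

# vapply returns a variables-by-formulas matrix; transpose to get one row per formula
beta <- t(vapply(formulas, coef_row, numeric(length(vars))))
beta
```

Because coef_row keeps only the coefficient vector, each fitted model object can be collected as soon as the function returns.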
The other usual reason not to use a for loop is speed, but I would assume that the time spent iterating is negligible compared with the time spent fitting each model, so I would use it.
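One rough way to check that assumption is to time an empty loop against a loop that fits a model on every pass. The sketch below uses an ordinary lm() fit as a stand-in for the phylogenetic model, and the iteration count and variable names are arbitrary assumptions; actual timings depend on the machine and on the real model.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.frame(y = rnorm(10), A = rnorm(10), C = rpois(10, 1))
n_iter <- 2000  # arbitrary number of iterations for the comparison

# Cost of the loop machinery alone
loop_only <- system.time(for (i in seq_len(n_iter)) NULL)[["elapsed"]]

# Cost of the loop plus one model fit per iteration
with_fit <- system.time(
  for (i in seq_len(n_iter)) lm(y ~ A + C, data = dat)
)[["elapsed"]]

c(loop_only = loop_only, with_fit = with_fit)
```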