Linear regression coefficient information as Data Frame or Matrix
Question
I am trying to create a script to optimize a linear regression analysis, and I would really like to operate on the model output, specifically the Pr(>|t|) values. Unfortunately, I do not know how to get the model output into a matrix or data table.
Here is an example: in the code below, I create seven columns of data and fit the seventh using the other six. The model summary makes it clear that three of the parameters are much more significant than the other three. If I had numeric access to the coefficient output, I could write a script that drops the least significant parameter and re-runs the analysis; as it is, however, I am doing this manually.
What is the best way to do this?
q = matrix(
c(2,14,-4,1,10,9,41,8,13,2,0,20,3,27,1,10,-1,0,
10,-6,23,6,13,-8,1,15,-7,55,7,14,10,0,20,-3,6,4,20,
-1,5,19,-2,48,10,19,8,8,10,-2,24,8,13,9,8,14,5,7,7,
12,1,0,16,7,27,7,10,-1,1,15,7,31,2,20,-5,10,12,3,57,
0,19,-8,8,11,-4,63,5,11,7,8,10,-7,6,9,10,-7,2,19,8,
51,2,18,3,3,14,4,30), nrow=15, ncol=7, byrow = TRUE)
#
colnames(q) <- c("A","B","C","D","E","F","Z")
#
q <- as.data.frame(q)
#
qmodel <- lm(Z~.,data=q)
#
summary(qmodel)
#
Output:
Call:
lm(formula = Z ~ ., data = q)

Residuals:
     Min       1Q   Median       3Q      Max
-1.25098 -0.52655 -0.02931  0.62350  1.26649

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.09303    1.51627  -1.380    0.205
A            0.91161    0.11719   7.779 5.34e-05 ***
B            1.99503    0.09539  20.914 2.87e-08 ***
C           -2.98252    0.04789 -62.283 4.91e-12 ***
D            0.13458    0.10377   1.297    0.231
E            0.15191    0.09397   1.617    0.145
F            0.01417    0.04716   0.300    0.772
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9439 on 8 degrees of freedom
Multiple R-squared:  0.9986,    Adjusted R-squared:  0.9975
F-statistic: 928.9 on 6 and 8 DF,  p-value: 6.317e-11
Now here is what I'd like to see:
> coeffs
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.09303    1.51627  -1.380 2.05e-01
A            0.91161    0.11719   7.779 5.34e-05
B            1.99503    0.09539  20.914 2.87e-08
C           -2.98252    0.04789 -62.283 4.91e-12
D            0.13458    0.10377   1.297 2.31e-01
E            0.15191    0.09397   1.617 1.45e-01
F            0.01417    0.04716   0.300 7.72e-01
As it is, I produced that table by typing the numbers in by hand, which is not automated at all:
coeffs = matrix(
c(-2.09303,1.51627,-1.38,0.205,0.91161,0.11719,
7.779,0.0000534,1.99503,0.09539,20.914,0.0000000287,
-2.98252,0.04789,-62.283,0.00000000000491,0.13458,
0.10377,1.297,0.231,0.15191,0.09397,1.617,0.145,
0.01417,0.04716,0.3,0.772), nrow=7, ncol=4, byrow = TRUE)
#
rownames(coeffs) <- c("(Intercept)","A","B","C","D","E","F")
colnames(coeffs) <- c("Estimate","Std. Error","t value","Pr(>|t|)")
#
coeffs <- as.data.frame(coeffs)
#
coeffs
Recommended answer
What you want is the coefficients component of the summary object.
m <- lm(Z~.,data=q)
summary(m)$coefficients
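A minimal sketch of working with that matrix, assuming the built-in mtcars data as a stand-in for the question's q so that the block runs on its own. It converts the table to a data frame, pulls out the Pr(>|t|) column, and automates the drop-and-refit procedure described in the question. drop_least_significant is a hypothetical helper written for this illustration, not an R function, and it assumes numeric predictors so that coefficient names match term names.

```r
# Assumption: mtcars stands in for the question's data frame q.
m <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# summary.lm() stores the coefficient table as a numeric matrix with columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"; coef(summary(m)) is equivalent.
coeffs <- summary(m)$coefficients
coeffs_df <- as.data.frame(coeffs)   # data-frame version, as in the question

# The p-value column, indexed by name.
pvals <- coeffs[, "Pr(>|t|)"]

# Pr(>|t|) is the two-sided t-test p-value on the residual degrees of freedom.
p_check <- 2 * pt(abs(coeffs[, "t value"]), df = df.residual(m), lower.tail = FALSE)

# Hypothetical helper: drop the predictor with the largest p-value and refit,
# until every remaining predictor has p < alpha (or only the intercept is left).
# Assumes numeric predictors, so coefficient names match term names.
drop_least_significant <- function(model, alpha = 0.05) {
  repeat {
    ct <- summary(model)$coefficients
    p <- setNames(ct[, "Pr(>|t|)"], rownames(ct))
    p <- p[names(p) != "(Intercept)"]   # never drop the intercept
    if (length(p) == 0 || max(p) < alpha) return(model)
    worst <- names(which.max(p))
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
}

m_small <- drop_least_significant(m)
summary(m_small)$coefficients
```

The 0.05 cutoff is an arbitrary choice for illustration, and the caveats about stepwise selection in the comments below apply to this loop as well.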
Some further comments:
- Use step to do stepwise variable selection rather than coding it yourself;
- Stepwise variable selection has bad statistical properties; consider something like glmnet (in the package of the same name) to do regularized model building instead.
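For the step suggestion, a minimal sketch, again assuming mtcars in place of the question's data. Note that step() adds or drops terms by AIC rather than by individual p-values, so it need not keep the same terms as a hand-written p-value loop.

```r
# Assumption: mtcars stands in for the question's data frame q.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination by AIC; trace = 0 suppresses the per-step log.
reduced <- step(full, direction = "backward", trace = 0)

# Terms retained, and the reduced model's coefficient table as a data frame.
kept <- attr(terms(reduced), "term.labels")
reduced_coeffs <- as.data.frame(summary(reduced)$coefficients)
reduced_coeffs
```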