Selecting the statistically significant variables in an R glm model

Question

I have an outcome variable, say Y, and a list of 100 dimensions that could affect Y (say X1...X100).

After fitting my glm and viewing the model summary, I can see which variables are statistically significant. I would like to select those variables, fit another model with only them, and compare performance. Is there a way to parse the model summary and select only the significant ones?

Recommended answer

You can access the p-values of a glm fit through the function "summary". The last column of the coefficient matrix holds the p-values of the model terms. It is named "Pr(>|t|)" for families with an estimated dispersion, such as the default gaussian, and "Pr(>|z|)" for families such as binomial and poisson, so selecting it by position (column 4) is safer than selecting it by name.

Here is an example:

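# x is a 10 x 3 matrix of predictors
x <- matrix(rnorm(3 * 10), ncol = 3)
y <- rnorm(10)
res <- glm(y ~ x)
# p-values are in column 4; drop row 1 to ignore the intercept
pvals <- coef(summary(res))[-1, 4]
# TRUE for each coefficient that is significant at the 5% level
pvals < 0.05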
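The question also asks how to refit with only the significant variables and compare the two models. A minimal sketch of one way to do that follows. The simulated data frame `dat`, the column names `x1` to `x4`, the 0.05 cutoff, and the use of AIC for the comparison are all assumptions made for illustration, not part of the original answer.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Simulated data (hypothetical): only x1 and x2 truly affect y
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

# Full model with every candidate predictor
full <- glm(y ~ ., data = dat)

# Coefficient table: column 4 holds the p-values; drop the intercept row
pvals <- coef(summary(full))[-1, 4]

# Names of the terms significant at the 5% level
keep <- names(pvals)[pvals < 0.05]

# Refit with only those predictors. This assumes numeric predictors, so that
# coefficient names equal column names, and that `keep` is not empty.
reduced <- glm(reformulate(keep, response = "y"), data = dat)

# Compare the two fits; a lower AIC indicates the preferred model
AIC(full, reduced)
```

With factor predictors the coefficient names include the level (for example a hypothetical "colourred"), so they would first need to be mapped back to the original column names before calling `reformulate`.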
