Subset variables by significant P value
Question
I'm trying to subset variables by significant P-values. I tried the following code, but it selects all variables instead of only those that meet the condition. Could anyone help me correct the problem?
myvars <- names(summary(backward_lm)$coefficients[,4] < 0.05)
happiness_reduced <- happiness_nomis[myvars]
Thanks!
Recommended answer
An alternative solution to Martin's great answer (in the comments section), using the broom package. Unfortunately, you haven't posted any data, so I'm using the mtcars dataset as a demo:
library(broom)
# build model
m = lm(disp ~ ., data = mtcars)
# create a data frame from the model's output
tm = tidy(m)
# visualise dataframe of the model
# (using non scientific notation of numbers)
options(scipen = 999)
tm
#           term    estimate   std.error   statistic       p.value
# 1  (Intercept)  -5.8119829 228.0609389 -0.02548434 0.97990925639
# 2          mpg   1.9398052   2.5976340  0.74675849 0.46348865035
# 3          cyl  15.3889587  12.1518291  1.26639032 0.21924091701
# 4           hp   0.6649525   0.2259928  2.94236093 0.00777972543
# 5         drat   8.8116809  19.7390767  0.44640796 0.65987184728
# 6           wt  86.7111730  16.1127236  5.38153418 0.00002448671
# 7         qsec -12.9742622   8.6227190 -1.50466021 0.14730421493
# 8           vs -12.1152075  25.2579953 -0.47965832 0.63642812949
# 9           am  -7.9135864  25.6183932 -0.30890253 0.76043942893
# 10        gear   5.1265224  18.0578153  0.28389494 0.77927112074
# 11        carb -30.1067073   7.5513212 -3.98694566 0.00067029676
# get variables with p value less than 0.05
tm$term[tm$p.value < 0.05]
# [1] "hp" "wt" "carb"
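To carry this back to the goal in the question (subsetting the data, not just listing the terms), something like the following sketch might work. It assumes all predictors are numeric, so that term names match column names; with factor predictors the term names (e.g. a hypothetical "cylsix") would not match, and this approach would need adapting. The names sig_terms and mtcars_reduced are introduced here for illustration only.

```r
library(broom)

# Same model as above, fitted on the built-in mtcars data
m <- lm(disp ~ ., data = mtcars)
tm <- tidy(m)

# Terms with a p-value below 0.05
sig_terms <- tm$term[tm$p.value < 0.05]

# The intercept is a model term, not a column of the data, so drop it if present
sig_terms <- setdiff(sig_terms, "(Intercept)")

# Keep the response plus the significant predictors
# (keeping the response is a choice made here so the reduced model can be refit)
mtcars_reduced <- mtcars[, c("disp", sig_terms)]
names(mtcars_reduced)
```

For the data in the question, the equivalent last step would presumably be happiness_nomis[sig_terms], with the response column added if needed.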
The main advantage of broom is that, by obtaining the model's output as a data frame, you can use variable names, rather than column positions and row names, to manipulate the data.
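For comparison, here is a base R sketch of the same filtering, which also shows why the code in the question returns every variable: names() applied to a logical vector returns all of its names, whether the value is TRUE or FALSE. The variable names p, all_terms and sig_terms are illustrative, and this is not necessarily the solution given in the comments.

```r
# Same model as in the answer, fitted on the built-in mtcars data
m <- lm(disp ~ ., data = mtcars)

# Named numeric vector of p-values, one per model term
# (selected by column name rather than by position 4)
p <- summary(m)$coefficients[, "Pr(>|t|)"]

# What the question's code does: the comparison keeps all names,
# so names() returns every term regardless of significance
all_terms <- names(p < 0.05)

# Indexing the names by the logical vector keeps only the significant terms
sig_terms <- names(p)[p < 0.05]
sig_terms
```

On the mtcars model this should select the same three terms as the broom version.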
I'm using options(scipen = 999) to make it easier to check that the filtering works (i.e. so that the numbers in the data frame are not printed in scientific notation).
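Since options(scipen = 999) changes printing for the rest of the session, one possible refinement is to save the previous setting and restore it once the check is done. A minimal sketch, where old is just an illustrative name:

```r
# options() returns the previous values of the options it changes
old <- options(scipen = 999)

# Small numbers are now printed in fixed notation
print(0.00002448671)

# Restore whatever scipen was set to before
options(old)
```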