All Levels of a Factor in a Model Matrix in R
Problem description
I have a data.frame consisting of numeric and factor variables, as seen below.
testFrame <- data.frame(First=sample(1:10, 20, replace=TRUE),
    Second=sample(1:20, 20, replace=TRUE), Third=sample(1:10, 20, replace=TRUE),
    Fourth=rep(c("Alice","Bob","Charlie","David"), 5),
    Fifth=rep(c("Edward","Frank","Georgia","Hank","Isaac"), 4),
    stringsAsFactors=TRUE)  # since R 4.0 strings are no longer converted to factors by default
I want to build a matrix that assigns dummy variables to the factors and leaves the numeric variables alone.
model.matrix(~ First + Second + Third + Fourth + Fifth, data=testFrame)
As expected when running lm, this leaves out one level of each factor as the reference level. However, I want to build a matrix with a dummy/indicator variable for every level of every factor. I am building this matrix for glmnet, so I am not worried about multicollinearity.
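To make the dropped reference levels visible, here is a minimal sketch that inspects the column names of the default model matrix. It assumes R's default treatment contrasts are in effect and rebuilds only the two factor columns from the question; the name mmDefault is introduced purely for illustration.

```r
# Rebuild just the two factor columns from the question.
testFrame <- data.frame(
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# Default behaviour: with treatment contrasts, the first level of each
# factor (here Alice and Edward) becomes the reference and gets no column.
mmDefault <- model.matrix(~ Fourth + Fifth, data = testFrame)
print(colnames(mmDefault))
```

With the default settings, the printed names should contain no FourthAlice or FifthEdward column.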
Is there a way to have model.matrix create a dummy for every level of each factor?
Recommended answer
You need to reset the contrasts for the factor variables:
model.matrix(~ Fourth + Fifth, data=testFrame,
    contrasts.arg=list(Fourth=contrasts(testFrame$Fourth, contrasts=FALSE),
                       Fifth=contrasts(testFrame$Fifth, contrasts=FALSE)))
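As a quick sanity check of the call above, the following sketch confirms that every level now has its own indicator column. It assumes Fourth and Fifth are stored as factors; the names mmFull and fourthCols are hypothetical and used only for this illustration.

```r
testFrame <- data.frame(
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# contrasts(x, contrasts = FALSE) returns a full identity-style matrix,
# one column per level, so no level is dropped as a reference.
mmFull <- model.matrix(~ Fourth + Fifth, data = testFrame,
  contrasts.arg = list(
    Fourth = contrasts(testFrame$Fourth, contrasts = FALSE),
    Fifth  = contrasts(testFrame$Fifth,  contrasts = FALSE)))

print(colnames(mmFull))

# Each row should have exactly one indicator set among the Fourth columns.
fourthCols <- grep("^Fourth", colnames(mmFull))
stopifnot(all(rowSums(mmFull[, fourthCols]) == 1))
```

The matrix still contains an (Intercept) column, which is why the full set of indicators is collinear; that is the multicollinearity the question says is acceptable for glmnet.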
Or, with a little less typing but without the proper column names:
model.matrix(~ Fourth + Fifth, data=testFrame,
contrasts.arg=list(Fourth=diag(nlevels(testFrame$Fourth)),
Fifth=diag(nlevels(testFrame$Fifth))))
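When a data frame has many factors, listing each one by hand gets tedious. One possible generalisation, offered as a sketch rather than as part of the original answer, builds the contrasts.arg list for every factor column automatically. The names factorCols, fullContrasts, X and XnoIntercept are hypothetical, and dropping the intercept column at the end is an assumption about how the matrix would be passed on, based on glmnet fitting its own intercept by default.

```r
set.seed(1)  # only so the random numeric column is reproducible
testFrame <- data.frame(
  First  = sample(1:10, 20, replace = TRUE),
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# Names of all factor columns in the data frame.
factorCols <- names(testFrame)[sapply(testFrame, is.factor)]

# One full (non-reduced) contrast matrix per factor, keyed by column name.
fullContrasts <- lapply(testFrame[factorCols], contrasts, contrasts = FALSE)

# "~ ." uses every column; numeric columns pass through unchanged.
X <- model.matrix(~ ., data = testFrame, contrasts.arg = fullContrasts)

# Assumption: the downstream fitter adds its own intercept, so drop ours.
XnoIntercept <- X[, colnames(X) != "(Intercept)", drop = FALSE]
print(colnames(XnoIntercept))
```

Because the contrast matrices come from contrasts(), the columns keep their level names, unlike the diag() shortcut.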