Converting coefficient names to a formula in R

Question

When a formula contains factors, the fitted model names each coefficient XY, where X is the name of the factor and Y is one of its levels. I want to be able to create a formula from the names of these coefficients.

The reason: if I fit a lasso to a sparse design matrix (as I do below), I would like to create a new formula object that contains terms only for the nonzero coefficients.

library(MatrixModels)  # sparse.model.matrix()
library(glmnet)
set.seed(1)
n <- 200
# One factor with 26 levels and one integer-valued predictor
Z <- data.frame(letter = factor(sample(letters, n, replace = TRUE), letters),
                x = sample(1:20, n, replace = TRUE))
f <- ~ letter + x:letter + I(x > 5):letter
X <- sparse.model.matrix(f, Z)
beta <- matrix(rnorm(ncol(X), 0, 5), ncol(X), 1)
y <- X %*% beta + rnorm(n)

# Lasso fit at a single lambda; keep the names of the nonzero coefficients
myfit <- glmnet(X, as.vector(y), lambda = .05)
fnew <- rownames(myfit$beta)[which(myfit$beta != 0)]
fnew
 [1] "letterb"              "letterc"              "lettere"             
 [4] "letterf"              "letterg"              "letterh"             
 [7] "letterj"              "letterm"              "lettern"             
[10] "lettero"              "letterp"              "letterr"             
[13] "letters"              "lettert"              "letteru"             
[16] "letterw"              "lettery"              "letterz"             
[19] "lettera:x"            "letterb:x"            "letterc:x"           
[22] "letterd:x"            "lettere:x"            "letterf:x"           
[25] "letterg:x"            "letterh:x"            "letteri:x"           
[28] "letterj:x"            "letterk:x"            "letterl:x"           
[31] "letterm:x"            "lettern:x"            "lettero:x"           
[34] "letterp:x"            "letterq:x"            "letterr:x"           
[37] "letters:x"            "lettert:x"            "letteru:x"           
[40] "letterv:x"            "letterw:x"            "letterx:x"           
[43] "lettery:x"            "letterz:x"            "letterb:I(x > 5)TRUE"
[46] "letterc:I(x > 5)TRUE" "letterd:I(x > 5)TRUE" "lettere:I(x > 5)TRUE"
[49] "letteri:I(x > 5)TRUE" "letterj:I(x > 5)TRUE" "letterl:I(x > 5)TRUE"
[52] "letterm:I(x > 5)TRUE" "letterp:I(x > 5)TRUE" "letterq:I(x > 5)TRUE"
[55] "letterr:I(x > 5)TRUE" "letteru:I(x > 5)TRUE" "letterv:I(x > 5)TRUE"
[58] "letterx:I(x > 5)TRUE" "lettery:I(x > 5)TRUE" "letterz:I(x > 5)TRUE"

From this, I would like a formula such as

~ I(letter=="d") + I(letter=="e") + ...(etc)

I looked at formula() and all.vars(), to no avail. Writing a function to parse the names is also a bit of a pain because of the different types of terms that can arise: for example, x:letter, where x is numeric and letter is a factor, or I(x>5):letter, which is another annoying case.

So is there some function I am not aware of that converts between a formula and its character representation and back again?

Recommended answer

When I ran the code I got something a bit different, since set.seed() had not been specified. Instead of the variable name "letter", I used "letter_", so that the underscore serves as a convenient splitting marker:

> fnew <- rownames(myfit$beta)[which(myfit$beta != 0)]

> fnew
 [1] "letter_c" "letter_d" "letter_e" "letter_f" "letter_h" "letter_k" "letter_l"
 [8] "letter_o" "letter_q" "letter_r" "letter_s" "letter_t" "letter_u" "letter_v"
[15] "letter_w"

Then I split the names and packaged the pieces into a character matrix:

> fnewmtx <- cbind( lapply(sapply(fnew, strsplit, split="_"), "[[", 2),
+ lapply(sapply(fnew, strsplit, split="_"), "[[", 1))

> fnewmtx
         [,1] [,2]
letter_c "c"  "letter"
letter_d "d"  "letter"
letter_e "e"  "letter"
letter_f "f"  "letter"
(rest snipped)
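
The split-and-paste step can also be sketched with a single strsplit() call. This is only an illustration, not the answerer's code: the three names in fnew are a shortened stand-in for the real vector, the column order (factor first, level second) is the reverse of fnewmtx, and it assumes the factor name itself contains no other underscore.

```r
# Shortened stand-in for the vector of nonzero coefficient names
fnew <- c("letter_c", "letter_d", "letter_e")

# strsplit() is vectorised, so one call splits every name;
# rbind the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(fnew, "_", fixed = TRUE))
dimnames(parts) <- list(fnew, c("factor", "level"))

# Build one indicator term per level, adding the underscore back
terms <- sprintf('I(%s_ == "%s")', parts[, "factor"], parts[, "level"])

# Paste the terms together and convert the string to a formula
form <- as.formula(paste("~", paste(terms, collapse = " + ")))
form
```

Using sprintf() keeps the quoting in one place, which may be easier to read than nested paste() calls.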

I then wrapped the output of the paste() calls in as.formula(), which is half of the answer to how to "convert between formula and its character representation and back." The other half is as.character().

form <- as.formula( paste("~", 
             paste( 
               paste(" I(", fnewmtx[,2], "_ ==", "'",fnewmtx[,1],"') ", sep="") , 
             sep="", collapse="+")
                 ) 
           )  # edit: needed to add back the underscore

The output is now an object of the appropriate class:

> class(form)
[1] "formula"
> form
~I(letter_ == "c") + I(letter_ == "d") + I(letter_ == "e") + 
    I(letter_ == "f") + I(letter_ == "h") + I(letter_ == "k") + 
    I(letter_ == "l") + I(letter_ == "o") + I(letter_ == "q") + 
    I(letter_ == "r") + I(letter_ == "s") + I(letter_ == "t") + 
    I(letter_ == "u") + I(letter_ == "v") + I(letter_ == "w")

I find it interesting that the as.formula() conversion turned the single quotes around the letters into double quotes.
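
A small sketch of the round trip may make the quoting behaviour less surprising. The variable names g and x are made up for illustration. as.formula() parses the string into a language object, and printing or deparsing that object writes string constants with double quotes, whichever quotes were typed.

```r
# Character -> formula: the string is parsed, so the quote style is not stored
f <- as.formula("~ I(g == 'a') + x")
class(f)

# Formula -> character, as one string; deparse() can return several
# pieces for a long formula, hence the paste()
txt <- paste(deparse(f), collapse = " ")
txt

# as.character() instead returns the formula's components separately:
# the "~" operator first, then the right-hand side
as.character(f)

# And back again
f2 <- as.formula(txt)
```

deparse() is usually the more convenient direction when the whole formula is wanted as a single string, while as.character() is handy for pulling out just the right-hand side.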

Now that the problem has an additional dimension or two, my suggestion is to skip recreating the formula. Note that the row names of myfit$beta are exactly the same as the column names of X, so instead use the nonzero row names as indices to select columns of the X matrix:

> str(X[ , which( colnames(X) %in% rownames(myfit$beta)[which(myfit$beta != 0)] )] )
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:429] 9 54 91 157 166 37 55 68 117 131 ...
  ..@ p       : int [1:61] 0 5 13 20 28 36 42 50 60 68 ...
  ..@ Dim     : int [1:2] 200 60
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:200] "1" "2" "3" "4" ...
  .. ..$ : chr [1:60] "letter_b" "letter_c" "letter_e" "letter_f" ...
  ..@ x       : num [1:429] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()
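
The same selection can be written more compactly by indexing the columns by name. The sketch below uses a small dense model matrix and a made-up coefficient vector (both are assumptions for illustration) so that it runs without glmnet.

```r
# Toy design matrix; columns are "(Intercept)", "gb", "gc", "x"
d <- data.frame(g = factor(c("a", "b", "c", "a")), x = 1:4)
X_toy <- model.matrix(~ g + x, d)

# Made-up coefficients standing in for a fitted lasso solution
beta_toy <- c("(Intercept)" = 0, gb = 1.5, gc = 0, x = -2)

# Names of the nonzero coefficients, then select those columns by name
keep <- names(beta_toy)[beta_toy != 0]
X_sub <- X_toy[, keep, drop = FALSE]
colnames(X_sub)
```

With a glmnet fit at a single lambda, keep would presumably be rownames(myfit$beta)[myfit$beta[, 1] != 0], since myfit$beta is a sparse matrix with one column per lambda value.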

> myfit2 <- glmnet(X[ , which( colnames(X) %in% rownames(myfit$beta)[which(myfit$beta != 0)] )] ,as.vector(y),lambda=.05)
> myfit2

Call:  glmnet(x = X[, which(colnames(X) %in% rownames(myfit$beta)[
                                           which(myfit$beta != 0)])], 
              y = as.vector(y), lambda = 0.05) 

     Df   %Dev Lambda
[1,] 60 0.9996   0.05
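
If a formula is still wanted for the interaction terms raised in the question, one possible approach is to parse each coefficient name piece by piece. The helper coef_to_term() below is hypothetical, not part of any package. It assumes the caller supplies the factor levels, that no term contains a ":" inside an I() expression, and that no other variable name looks like a factor name followed by one of its levels.

```r
# Hypothetical helper: map one coefficient name to a formula term.
# factor_levels is a named list: factor name -> character vector of levels.
coef_to_term <- function(coef_name, factor_levels) {
  parts <- strsplit(coef_name, ":", fixed = TRUE)[[1]]
  terms <- vapply(parts, function(part) {
    for (fac in names(factor_levels)) {
      level <- substring(part, nchar(fac) + 1)
      if (startsWith(part, fac) && level %in% factor_levels[[fac]]) {
        # Factor piece such as "letterb" becomes an indicator term
        return(sprintf('I(%s == "%s")', fac, level))
      }
    }
    # Otherwise drop the TRUE suffix of a logical term, e.g. "I(x > 5)TRUE";
    # plain numeric variables such as "x" pass through unchanged
    sub("TRUE$", "", part)
  }, character(1), USE.NAMES = FALSE)
  paste(terms, collapse = ":")
}

lv <- list(letter = letters)
fnew <- c("letterb", "letterb:x", "letterc:I(x > 5)TRUE")
new_terms <- unique(vapply(fnew, coef_to_term, character(1),
                           factor_levels = lv, USE.NAMES = FALSE))
form <- as.formula(paste("~", paste(new_terms, collapse = " + ")))
form
```

How model.matrix() expands interactions between logical terms differs from how it expands the original factor, so the columns produced by the new formula should be checked against the selected columns of X before the formula is relied on.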
