Run regression in data.table

Problem description

Consider the following fake data set.

library(data.table)
library(MASS)

n <- 5000
DT <- data.table(
  grp  = 1:n,
  name = as.character(as.hexmode(1:n)),
  x    = sample(1:400, n, replace = TRUE)
)

setkey(DT, grp)

# One logical dummy column per level of grp, then drop the last one (N - 1 dummies)
UIDlist <- unique(DT[, grp])
IDnamelist <- paste0("V", seq_along(UIDlist))
test <- DT[, (IDnamelist) := lapply(UIDlist, function(x) grp == x)][, V5000 := NULL]



I have a data.table with four columns: "grp", "Name", "x" and "y". I add a dummy variable for each level of "grp", and then I need to run a regression using glm.nb from the MASS package.

I first tried this:

SumResult <- glm.nb(x ~ factor(grp), data = test)

But when adding dummies for a variable with N levels, as in "grp", only N-1 dummies should be added, so I do not think this method is appropriate.

So I tried this instead:

SumResult <- glm.nb( x ~ V1 + V2 + V3 + V4 + .....+ V4999  , data = test)

It is impractical to write out all of V1, V2, ..., V4999 by hand to run the regression.

Is there code that can achieve this?

Thanks.

Recommended answer

You can build the formula object by string manipulation:

formula <- as.formula(paste0("x ~ ", paste(names(test)[-(1:3)], collapse = " + ")))
sumresult <- glm.nb(formula, data = test)
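As a small-scale illustration of the string-built formula, the sketch below uses a hypothetical four-group table (`small`, not from the question) and also builds the same formula with `reformulate()` from base R's stats package, which takes a character vector of term labels directly. The toy sizes and names are assumptions for illustration only.

```r
library(data.table)

# Hypothetical stand-in for the question's data: 4 groups instead of 5000
set.seed(1)
small <- data.table(
  grp  = 1:4,
  name = letters[1:4],
  x    = sample(1:400, 4, replace = TRUE)
)

# Add one logical dummy per group, then drop the last to keep N - 1 of them
ids <- unique(small$grp)
dummy_names <- paste0("V", seq_along(ids))
small[, (dummy_names) := lapply(ids, function(g) grp == g)][, V4 := NULL]

# Dummy columns are everything after grp, name and x
terms <- names(small)[-(1:3)]

# Same formula built two ways: by pasting strings, and with reformulate()
f_paste  <- as.formula(paste("x ~", paste(terms, collapse = " + ")))
f_reform <- reformulate(terms, response = "x")

print(f_paste)
print(f_reform)
```

Either formula object can then be passed as the first argument to glm.nb, in the same way as `formula` above.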

You can also use the more readable code from @BrandonBertelsen:

# Drop the non-predictor columns grp and name; x must stay in the data as the response
glm.nb(x ~ ., data = test[, !c("grp", "name")])
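On the N - 1 dummies concern raised in the question: assuming R's default treatment contrasts, a factor term in a formula is already expanded into an intercept plus N - 1 indicator columns, so hand-built dummies may not be needed. A minimal sketch with a hypothetical four-level factor `g` (not from the question):

```r
# Hypothetical factor with 4 levels: a, b, c, d
g <- factor(c("a", "b", "c", "d", "a", "b"))

# Design matrix that R builds for a formula containing g
# (assumes the default contr.treatment contrasts are in effect)
mm <- model.matrix(~ g)

# The first level "a" is absorbed into the intercept, leaving 3 dummies
print(colnames(mm))
# "(Intercept)" "gb" "gc" "gd"
```

The number of columns equals the number of levels: one intercept plus N - 1 dummies.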
