R data.table loop subset by factor and do lm()

Published: 2017/3/12 11:07:55. Tags: r, data.table, lm

Question

I am trying to create a function, or even just work out how to run a loop using data.table syntax, where I can subset the table by factor (in this case the id variable), then run a linear model on each subset and output the results. Sample data below.

    library(data.table)

    df <- data.frame(id = letters[1:3],
                     cyl = sample(c("a", "b", "c"), 30, replace = TRUE),
                     factor = sample(c(TRUE, FALSE), 30, replace = TRUE),
                     hp = sample(c(20:50), 30, replace = TRUE))
    dt <- as.data.table(df)

    fit <- lm(hp ~ cyl + factor, data = df)
    # how do I get the [i] to work here to subset and iterate by each factor,
    # and also do it in data.table syntax?

Expected outcome is something like fit[1] model, fit[2] model, etc.

Solution

I know you want to do this with data tables, and if you want some specific aspect of the fit, like the coefficients, then @MartinBel's approach is a good one.

On the other hand, if you want to store the fits themselves, lapply(...) might be a better option:

    library(data.table)

    set.seed(1)
    df <- data.frame(id = letters[1:3],
                     cyl = sample(c("a", "b", "c"), 30, replace = TRUE),
                     factor = sample(c(TRUE, FALSE), 30, replace = TRUE),
                     hp = sample(c(20:50), 30, replace = TRUE))
    dt <- data.table(df, key = "id")

    # one lm() per id; dt[J(z), ] is a keyed (binary-search) subset for id == z
    fits <- lapply(unique(df$id),
                   function(z) lm(hp ~ cyl + factor, data = dt[J(z), ], y = TRUE))

    # Output below is from the original answer; sample() changed in R 3.6.0,
    # so a current R version prints different numbers for the same seed.

    # coefficients
    sapply(fits, coef)
    #                   [,1]      [,2]          [,3]
    # (Intercept)  44.117647 35.000000  3.933333e+01
    # cylb         -6.117647 -6.321429 -1.266667e+01
    # cylc        -13.176471  3.821429 -7.833333e+00
    # factorTRUE    1.176471  5.535714  2.325797e-15

    # predicted values
    sapply(fits, predict)
    #       [,1]     [,2]     [,3]
    # 1 45.29412 28.67857 26.66667
    # 2 32.11765 35.00000 31.50000
    # 3 30.94118 34.21429 26.66667
    # ...

    # residuals
    sapply(fits, residuals)
    #         [,1]      [,2]      [,3]
    # 1  2.7058824 0.3214286  7.333333
    # 2 -2.1176471 5.0000000 -4.500000
    # 3  3.0588235 8.7857143 -4.666667
    # ...

    # residual standard error (se) and R-squared (rsq)
    sapply(fits, function(x) c(se = summary(x)$sigma, rsq = summary(x)$r.squared))
    #         [,1]      [,2]      [,3]
    # se  7.923655 8.6358196 6.4592741
    # rsq 0.463076 0.3069017 0.4957024

    # Q-Q plots (plot.lm with which = 2), one panel per id
    par(mfrow = c(1, length(fits)))
    lapply(fits, plot, 2)

Note the use of key="id" in the call to data.table(...), and the use of dt[J(z)] to subset the data table. This keyed lookup really isn't necessary unless dt is enormous. For a table this size, dt[id == z] works just as well.
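The answer credits @MartinBel with an approach for extracting one aspect of each fit, such as the coefficients, but his code is not shown here. The sketch below is a hypothetical illustration of a data.table-native version and is not necessarily his code. It groups with by = id, so lm() runs once per id inside j. Wrapping each model in list() stores the whole fits in a list column. Returning plain vectors instead gives a coefficient table. The names fits_dt and coef_dt are made up for this example.

```r
library(data.table)

# Same sample data as in the answer
set.seed(1)
df <- data.frame(id = letters[1:3],
                 cyl = sample(c("a", "b", "c"), 30, replace = TRUE),
                 factor = sample(c(TRUE, FALSE), 30, replace = TRUE),
                 hp = sample(20:50, 30, replace = TRUE))
dt <- as.data.table(df)

# One model per id: .SD holds the current group's rows (all columns except id),
# and list() lets each lm object sit in a single cell of a list column
fits_dt <- dt[, .(fit = list(lm(hp ~ cyl + factor, data = .SD))), by = id]

# Groups come out in order of first appearance, so fits_dt$fit[[1]] is the model for "a"
summary(fits_dt$fit[[1]])

# Only the coefficients, one row per (id, term); the long format still works
# if some group is missing a cyl level and its model has fewer terms
coef_dt <- dt[, {
  m <- lm(hp ~ cyl + factor, data = .SD)
  list(term = names(coef(m)), estimate = unname(coef(m)))
}, by = id]
coef_dt
```

Compared with the lapply(...) version, this keeps each model next to its id. A model can then be looked up by name, for example with fits_dt[id == "b", fit][[1]].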
