data.table vs plyr regression output


Question

The data.table package is very helpful in terms of speed, but I am having trouble actually using the output from a linear regression. Is there an easy way to get the data.table output to be as pretty/useful as the output from the plyr package? Below is an example. Thank you!

library(data.table)
library(plyr)

REG <- data.table(ID = rep(c("Frank", "Tony", "Ed"), each = 5),
                  y = rnorm(15), x = rnorm(15), z = rnorm(15))
REG

# plyr: one row per ID, one labeled column per coefficient
ddply(REG, .(ID), function(d) coef(lm(y ~ x + z, data = d)))

# data.table: all coefficients stacked in a single column
REG[, coef(lm(y ~ x + z)), by = ID]

The data.table coefficient estimates are output in a single column, whereas the plyr/ddply coefficient estimates are output in multiple, nicely labeled columns.

I know I can run the regression three times with data.table, once per coefficient, but that seems really inefficient. I could be wrong, though.

REG[, list(Intercept = coef(lm(y ~ x + z))[1],
           x         = coef(lm(y ~ x + z))[2],
           z         = coef(lm(y ~ x + z))[3]), by = ID]

Recommended answer

Try this:

> REG[, as.list(coef(lm(y ~ x + z))), by=ID];
        ID (Intercept)           x         z
[1,] Frank  -0.2928611  0.07215896  1.835106
[2,]  Tony   0.9120795 -1.11153056  2.041260
[3,]    Ed   1.0498359  5.77131778 -1.253741

I have the nagging feeling that this question was asked less than a week ago, but I don't think I arrived at this approach when I tried it, and I don't remember that any answer was this compact.

Oh, there it is .. on r-help. Matthew can comment on the rightfulness of this if he wants. I guess the message is that functions returning lists will not have dimensions dropped. The interesting thing was that using list(coef(lm(...))) did not succeed in the manner we hoped.
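To make the difference between the three forms concrete, here is a minimal sketch, assuming a reasonably recent data.table. The seed and the object names long, wide, and stacked are illustrative and not part of the original post. It compares only the shape of each result, since the coefficient values depend on the random data.

```r
library(data.table)

set.seed(1)  # illustrative seed, not in the original post
REG <- data.table(ID = rep(c("Frank", "Tony", "Ed"), each = 5),
                  y = rnorm(15), x = rnorm(15), z = rnorm(15))

# A bare numeric vector in j is stacked: three rows per ID, one value column
long <- REG[, coef(lm(y ~ x + z)), by = ID]

# as.list() turns the named vector into a named list of length-1 elements,
# so each coefficient becomes its own labeled column: one row per ID
wide <- REG[, as.list(coef(lm(y ~ x + z))), by = ID]

# list() wraps the whole vector as a single list element, which is treated
# as one column of length 3 per ID, so the result is still long
stacked <- REG[, list(coef(lm(y ~ x + z))), by = ID]

dim(long)
dim(wide)
names(wide)
dim(stacked)
```

Each element of the list returned in j becomes a column, which would explain why as.list() gives the wide layout while list() around the whole vector does not.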
