How to extract tabular summary data from an lm command in R
Problem description
I have data structured the following way:
group_id, months_from_start, perc_total_downloads, experience_ratio
1 1 1.2 4
1 2 1.7 6
…
235 1 6.7 3
235 2 18 8
…
There are about 300 groups, each of which has 70 or so consecutive data elements.
I've issued the following script to estimate a second order polynomial for each of the groups.
s.1<-lm(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1,][,-2], order.by=as.Date(s[s$group_id == 1,][,2]))$months_from_start, 2, raw=TRUE))
s.235<-lm(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 235,][,-2], order.by=as.Date(s[s$group_id == 235,][,2]))$months_from_start, 2, raw=TRUE))
s.599<-lm(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 599,][,-2], order.by=as.Date(s[s$group_id == 599,][,2]))$months_from_start, 2, raw=TRUE))
s.1111<-lm(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1111,][,-2], order.by=as.Date(s[s$group_id == 1111,][,2]))$months_from_start, 2, raw=TRUE))
s.1537<-lm(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 1537,][,-2], order.by=as.Date(s[s$group_id == 1537,][,2]))$months_from_start, 2, raw=TRUE))
For each one of these new variables I can issue a summary statement to reveal interesting information:
> summary(s.44375)

Call:
lm(formula = xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$perc_total_downloads ~ poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE))

Residuals:
       Min         1Q     Median         3Q        Max
-0.0064004 -0.0017315 -0.0002022  0.0012087  0.0078436

Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.993e-03 1.137e-03 1.753 0.084 .
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.0 7.769e-04 6.707e-05 11.583 <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)2.0 -9.258e-06 8.404e-07 -11.017 <2e-16 ***
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.1 NA NA NA NA
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)1.1 NA NA NA NA
poly(xts(s[s$group_id == 44375, ][, -2], order.by = as.Date(s[s$group_id == 44375, ][, 2]))$months_from_start, 2, raw = TRUE)0.2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.002866 on 69 degrees of freedom
Multiple R-squared: 0.6619, Adjusted R-squared: 0.6521
F-statistic: 67.53 on 2 and 69 DF, p-value: < 2.2e-16
For my purposes I need to transcribe this information into a table like the one below, and cutting and pasting from the summary format is incredibly tedious and time-consuming:
group_id   intercept est   intercept stnd err   intercept t value   …
44375      1.993e-03       1.137e-03            1.753               ...
…
It would also be convenient for me to have conventional notation rather than scientific notation, but I imagine I could live without that.
Is there any way for me to do this without cutting and pasting by hand?
Thanks --sw
Solution
The summary function just returns an R list. For example,
R> x = runif(10); y = runif(10)
R> m = lm(y ~ x)
The part you are interested in is the fourth element:
R> summary(m)[[4]]
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.44041     0.1768  2.4911  0.03746
x           -0.05899     0.3143 -0.1877  0.85579
This is just a matrix.
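Because it is a matrix, the coefficient table can be indexed by name and turned into a data frame. The sketch below is one way to do that, not the only one; the object names `ctab` and `tab` and the seed are made up for illustration. `summary(m)$coefficients` and `coef(summary(m))` refer to the same element as `summary(m)[[4]]`.

```r
set.seed(1)   # arbitrary seed, only so the example is reproducible
x <- runif(10)
y <- runif(10)
m <- lm(y ~ x)

# Coefficient table picked out by name rather than by position
ctab <- summary(m)$coefficients   # same matrix as coef(summary(m))

# Ordinary matrix indexing by row and column name
intercept_est <- ctab["(Intercept)", "Estimate"]
slope_se      <- ctab["x", "Std. Error"]

# Data frame with the term names as a regular column;
# check.names = FALSE keeps headers such as "Std. Error" unchanged
tab <- data.frame(term = rownames(ctab), ctab,
                  row.names = NULL, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Display in fixed rather than scientific notation
print(format(tab, scientific = FALSE, digits = 4))
```

Once the table is a data frame it can be written out with `write.csv(tab, ...)` instead of being copied by hand.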
The above answers your question, but your code makes me want to weep! In particular, read up on for loops and the plyr package. For example, I suspect the final two lines pretty much do everything you want:

## Load the package and create some data
library(plyr)
dd = data.frame(group_id = sample(1:3, 10, TRUE), x = runif(10), y = runif(10))

## Split up dd by group_id and do some regression
dd1 = ddply(dd, .(group_id), summarise, summary(lm(y ~ x))[[4]])

## Label the column names
colnames(dd1)[2:5] = c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
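For the one-row-per-group table asked for in the question, the same split-and-fit idea can be sketched in base R. This is a minimal sketch under stated assumptions, not the plyr code above: the data are simulated, the helper `fit_one` and the short column labels (`intercept_est`, `linear_se`, and so on) are invented for illustration, and every group is assumed to have enough distinct months that all three quadratic coefficients are estimable.

```r
set.seed(42)   # arbitrary seed, only for reproducibility

# Simulated data in the question's layout: 3 groups, 20 months each
s <- data.frame(
  group_id          = rep(c(1, 235, 599), each = 20),
  months_from_start = rep(1:20, times = 3)
)
s$perc_total_downloads <- 0.002 + 0.0008 * s$months_from_start -
  0.00001 * s$months_from_start^2 + rnorm(nrow(s), sd = 0.003)

# Hypothetical helper: fit one quadratic and flatten its coefficient table
fit_one <- function(d) {
  m <- lm(perc_total_downloads ~ poly(months_from_start, 2, raw = TRUE),
          data = d)
  ctab <- coef(summary(m))
  # Assumes all 3 coefficients were estimated (no aliased terms dropped)
  rownames(ctab) <- c("intercept", "linear", "quadratic")
  colnames(ctab) <- c("est", "se", "t", "p")
  vals <- as.vector(t(ctab))   # walks the table row by row
  names(vals) <- paste(rep(rownames(ctab), each = 4), colnames(ctab), sep = "_")
  data.frame(group_id = d$group_id[1], as.list(vals))
}

# One row per group, stacked into a single table
res <- do.call(rbind, lapply(split(s, s$group_id), fit_one))
rownames(res) <- NULL

# Fixed rather than scientific notation for display
print(format(res, scientific = FALSE, digits = 3))
```

Passing `data = d` to `lm` keeps the coefficient names short, and `write.csv(res, ...)` would export the table without any cutting and pasting.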