循环线性回归并保存所有系数 [英] Loop linear regression and saving ALL coefficients

查看:203
本文介绍了循环线性回归并保存所有系数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据下面的链接,我创建了一个代码,根据变量在数据子集上运行回归。

Based on the link below, I created a code to run regression on subsets of my data based on a variable.

执行线性回归和保存系数

在本例中创建一个DUMMY(0或1)创建子集(实际上我有3000个子集)

In this example I created a DUMMY (0 or 1) to create the subsets (in reality I have 3000 subsets)

res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY),function(x){
  fit <- lm(y~x1 + x2, data=x)
  res <- data.frame(DUMMY=unique(x$DUMMY), coeff=coef(fit))
  res
}))

这将导致以下数据集

                DUMMY   coeff

0.(Intercept)   0    22.8419956
0.x1            0   -11.5623064
0.x2            0     2.1006948
1.(Intercept)   1     4.2020874
1.x1            1    -0.4924303
1.x2            1     1.0917668

然而,我想要的是每回归一行,列中的变量。我还需要p值和标准错误。

What I would like however is one row per regression, and the variables in the columns. I also need the p values and standard errors included.

DUMMY   interceptx1   coeffx1   p-valuex1   SEx1   coeffx2  p-valuex2   SEx2
0          22.84       -11.56      0.04     0.15    2.10     0.80       0.90
1          4.20        -0.49       0.10     0.60    1.09     0.60       1.20  

任何想法如何做?

推荐答案

虽然你想要的输出是(IMHO)不是很整洁的数据,方法使用data.table和定制的提取功能。它可以选择返回广泛或较长的结果。

While your desired output is (IMHO) not really tidy data, here is an approach using data.table and a custom-built extraction-function. It has an option to return a wide or long form of the results.

提取器函数接收一个lm对象,并返回所有变量的估计值,p值和标准错误。

The extractor-function takes in a lm-object, and returns estimates, p-values and standard errors for all variables.

extractor <- function(model, return_wide = F){
  #get datatable with coefficient, se and p-value
  model_summary <- as.data.table(summary(model)$coefficients[,-3])
  model_summary[,variable:=names(coef(model))]
  #do some reshaping
  step2 <- melt(model_summary, id.var="variable",variable.name="measure")
  if(!return_wide){
    return(step2)
  }
  step3 <- dcast(step2, 1~variable+measure,value.var="value")
  return(step3)
}

演示:

res_wide <- dat[,extractor(lm(y~x1 + x2), return_wide = T), by = dummy]
> res_wide
# dummy . (Intercept)_Estimate (Intercept)_Std. Error (Intercept)_Pr(>|t|)  x1_Estimate x1_Std. Error x1_Pr(>|t|) x2_Estimate x2_Std. Error x2_Pr(>|t|)
# 1:     0 .           0.04314707             0.04495702            0.3376461 -0.054364406    0.04441204   0.2214895  0.01333804    0.04620999   0.7729757
# 2:     1 .          -0.04137086             0.04471550            0.3553164  0.009864255    0.04533808   0.8278539  0.05272257    0.04507189   0.2426726


res_long <-  dat[,extractor(lm(y~x1 + x2)), by = dummy]
# dummy    variable    measure        value
# 1:     0 (Intercept)   Estimate  0.043147072
# 2:     0          x1   Estimate -0.054364406
# 3:     0          x2   Estimate  0.013338043
# 4:     0 (Intercept) Std. Error  0.044957023
# 5:     0          x1 Std. Error  0.044412037
# 6:     0          x2 Std. Error  0.046209987
# 7:     0 (Intercept)   Pr(>|t|)  0.337646052
# 8:     0          x1   Pr(>|t|)  0.221489530

使用的数据:

library(data.table)
set.seed(123)
nobs = 1000
dat <- data.table(
  dummy = sample(0:1,nobs,T),
  x1 = rnorm(nobs),
  x2 = rnorm(nobs),
  y = rnorm(nobs))

这篇关于循环线性回归并保存所有系数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆