Running several linear regressions from a single dataframe in R


Question

I have a dataset of export trade data for a single country with 21 columns. The first column holds the years (1962-2014), while the other 20 are trading partners. I am trying to run a linear regression of each partner column on the years column. I have tried the method recommended in "Running multiple, simple linear regressions from dataframe in R", which entails using

combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)

However, this only yields the intercept for each pair, which is less important to me than the slope of the regressions.

Additionally, I have tried to use my dataset as a time series; however, when I try to run

lm(dimnames~., brazilts, na.action=na.exclude)

(where brazilts is my dataset as a time series from "1962" to "2014"), it returns:

Error in model.frame.default(formula = dimnames ~ ., data = brazilts,  : 
  object is not a matrix.

I therefore tried the same method with a matrix, but then it returned the error:

Error in model.frame.default(formula = . ~ YEAR, data = brazilmatrix,  : 
  'data' must be a data.frame, not a matrix or an array

(where brazilmatrix is my dataset as a data.matrix, which includes a column for the years).

Really, I am not even proficient in R at this point. The ultimate goal is to create a loop that I can use to run regressions on a significantly larger dataset of gross exports by country pair per year for 28 countries. Perhaps I am attacking this in entirely the wrong way, so any help or criticism is welcome. Bear in mind that the years (1962-2014) are in effect my explanatory variable and the value of gross exports is my dependent variable, which may be throwing off my syntax in the above examples. Thanks in advance!

Recommended answer

Just to add an alternative, I would propose going down this route:

library(reshape2)
library(dplyr)
library(broom)

df <- melt(data.frame(x = 1962:2014, 
                      y1 = rnorm(53), 
                      y2 = rnorm(53), 
                      y3 = rnorm(53)), 
          id.vars = "x")

df %>% group_by(variable) %>% do(tidy(lm(value ~ x, data=.)))

Here, I just melt the data into long format, so that each original column becomes a group of rows labelled by variable. That makes it possible to use dplyr's grouped operations. This gives the following dataframe as output:

Source: local data frame [6 x 6]
Groups: variable [3]

  variable        term     estimate    std.error  statistic   p.value
    (fctr)       (chr)        (dbl)        (dbl)      (dbl)     (dbl)
1       y1 (Intercept) -3.646666114 18.988154862 -0.1920495 0.8484661
2       y1           x  0.001891627  0.009551103  0.1980533 0.8437907
3       y2 (Intercept) -8.939784046 16.206935047 -0.5516024 0.5836297
4       y2           x  0.004545156  0.008152140  0.5575415 0.5795966
5       y3 (Intercept) 21.699503502 16.785586452  1.2927462 0.2019249
6       y3           x -0.010879271  0.008443204 -1.2885240 0.2033785

This is a pretty convenient form for continuing to work with the coefficients: each response gets one row for the intercept and one row for the slope (term x). All that is required is to melt the dataframe so that every response column becomes a block of rows, and then to use dplyr's group_by to carry out the regression in each subset. broom::tidy puts the regression output into a tidy dataframe. See ?broom for more information.
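Since the question is mainly about slopes, the tidy output can be narrowed down to the rows where term is "x". The following is a minimal sketch under assumptions, not part of the original answer: the names wide, long and slopes and the seed are hypothetical, and the data is the same kind of random toy data as in the example above.

```r
library(reshape2)
library(dplyr)
library(broom)

set.seed(1)  # hypothetical seed, only so the toy data is reproducible

# Toy data in wide form: one year column and three response columns.
wide <- data.frame(x = 1962:2014,
                   y1 = rnorm(53),
                   y2 = rnorm(53),
                   y3 = rnorm(53))

# Long form: one row per (year, response column) pair.
long <- melt(wide, id.vars = "x")

# Fit one regression per response and keep only the slope rows.
slopes <- long %>%
  group_by(variable) %>%
  do(tidy(lm(value ~ x, data = .))) %>%
  ungroup() %>%
  filter(term == "x") %>%
  select(variable, estimate, std.error, p.value)

print(slopes)
```

The result has one row per response column, so it can be sorted or joined to other per-partner data like any other dataframe.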

In case you need to keep the models themselves, for example to apply methods that are implemented for lm objects, you can also do the following:

df %>% group_by(variable) %>% do(mod = lm(value ~ x, data=.))

Source: local data frame [3 x 2]
Groups: <by row>

# A tibble: 3 x 2
  variable      mod
*   <fctr>   <list>
1       y1 <S3: lm>
2       y2 <S3: lm>
3       y3 <S3: lm>

Here, for each variable, the lm object is stored in the dataframe. So, if you want to get the model output for the first one, you can access it as you would with any normal dataframe, e.g.

tmp <- df %>% group_by(variable) %>% do(mod = lm(value ~ x, data=.))
tmp[tmp$variable == "y1",]$mod
[[1]]

Call:
lm(formula = value ~ x, data = .)

Coefficients:
(Intercept)            x  
  -1.807255     0.001019  

This is convenient if you want to apply some method to all lm objects: tmp$mod gives you a list of them, which is easy to pass to e.g. lapply.
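As a rough illustration of that last point, the sketch below applies coef and summary to every model in a list. To stay free of package dependencies it builds the list with base R's lapply rather than with dplyr's do(), which is a substitution and not the answer's own code; a list such as tmp$mod could be passed to sapply in the same way. The names wide, models, slopes and r_squared and the seed are hypothetical.

```r
set.seed(1)  # hypothetical seed, only so the toy data is reproducible

# Toy data in wide form: one year column and three response columns.
wide <- data.frame(x = 1962:2014,
                   y1 = rnorm(53),
                   y2 = rnorm(53),
                   y3 = rnorm(53))

# One lm per response column, each regressed on the year column x.
# lapply over a data frame iterates over its columns and keeps their names.
models <- lapply(wide[-1], function(y) {
  lm(y ~ x, data = data.frame(x = wide$x, y = y))
})

# Apply the same extractor to every model in the list.
slopes <- sapply(models, function(m) coef(m)[["x"]])
r_squared <- sapply(models, function(m) summary(m)$r.squared)

print(slopes)
print(r_squared)
```

Both results are named numeric vectors with one entry per response column, which carries over directly to a wider dataset with one column per trading partner.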
