R: repeating linear regression in a large dataset


Question

I'm an R newbie working with an annual time series dataset (named "timeseries"). The set has one column for the year and 600 more columns with the yearly values for different locations ("L1", "L2", etc.), similar to the following:

Year    L1     L2     L3    L4
1963   0.63   0.23   1.33  1.41
1964   1.15   0.68   0.21  0.4
1965   1.08   1.06   1.14  0.83
1966   1.69   1.85   1.3   0.76
1967   0.77   0.62   0.44  0.96

I'd like to do a linear regression for each site and can use the following for a single site:

timeL1<-lm(L1~Year, data=timeseries)
summary(timeL1)

But I think there must be a way to automatically repeat this for all the locations. Ideally, I'd like to end up with two vectors of results: one with the coefficients for all the locations and one with the p-values for all the locations. From some searching, I thought the plyr package might work, but I can't figure it out. I'm still learning the basics of R, so any suggestions would be appreciated.

Recommended answer

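You can do this with one line of code, where df is the data frame (called timeseries in the question). Note that row 1 of the coefficient table is the intercept, so the call as written returns each location's intercept and its p-value; index with coef[2, c(1, 4)] instead to get the slope on Year and its p-value: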

apply(df[-1], 2, function(x) summary(lm(x ~ df$Year))$coef[1,c(1,4)])
                   L1           L2          L3          L4
Estimate -160.0660000 -382.2870000 136.4690000 106.9820000
Pr(>|t|)    0.6069965    0.3886881   0.7340981   0.7030296
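Since the question asks for two separate vectors, here is a minimal sketch of one way to get them, assuming the slope on Year is the coefficient of interest. The helper name fit_one and the toy data frame (copied from the question's sample rows) are illustrative only and not part of the original answer.

```r
# Toy data copied from the question; the real data frame has 600 location columns.
timeseries <- data.frame(
  Year = 1963:1967,
  L1 = c(0.63, 1.15, 1.08, 1.69, 0.77),
  L2 = c(0.23, 0.68, 1.06, 1.85, 0.62),
  L3 = c(1.33, 0.21, 1.14, 1.30, 0.44),
  L4 = c(1.41, 0.40, 0.83, 0.76, 0.96)
)

# Hypothetical helper: regress one location's values on year and
# return the slope (row 2 of the coefficient table) with its p-value.
fit_one <- function(y, year) {
  coefs <- summary(lm(y ~ year))$coefficients
  c(slope = coefs[2, "Estimate"], p_value = coefs[2, "Pr(>|t|)"])
}

# Apply the helper to every column except Year.
# The result is a 2 x n_locations matrix with named rows and columns.
results <- vapply(timeseries[-1], fit_one, numeric(2), year = timeseries$Year)

slopes   <- results["slope", ]    # named vector, one slope per location
p_values <- results["p_value", ]  # named vector, one p-value per location
```

Using vapply rather than apply keeps the data frame as a list of columns and checks that every fit returns exactly two numbers, which can help catch a column that fails to fit. With 600 locations the p-values are subject to multiple testing, so something like p.adjust(p_values) may be worth considering.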
