Linear Regression calculation several times in one dataframe

Question

I am using R to evaluate climate data, and I have a data set that looks like the following miniaturized version. Please forgive my crude posting etiquette; I hope this post is understandable.

[0][STA.NAME] [YEAR] [SUM.CDD]  
1 NAME1 1967    760  
2 NAME1 1985    800  
3 NAME1 1996    740  
4 NAME1 2003    810  
5 NAME1 2011    790  
6 NAME2 1967    700  
7 NAME2 1985    690  
8 NAME2 1996    850  
9 NAME2 2003    790  
10 NAME3    1967    760  
11 NAME3    1985    800  
12 NAME3    1990    740  
13 NAME3    1996    810  
14 NAME3    2003    790  
15 NAME3    2011    800  

From this, I am trying to return a new data frame like the following:

[STA.NAME] [Eq'n of trend]  
NAME1  (y = mx + b)  
NAME2  (y = mx + b)  

etc.

Eventually I will need to calculate the variance of the trends, as well as the total variance of the data, and I would like to append those to the resulting data set to get something like:

[STA.NAME] [TREND] [VAR.TREND] [VAR.DATA]   
with values in rows, 1 for each STA.NAME...

Any help is greatly appreciated. I am currently stumped with lm(), so if there is a better way, I would be interested in that as well.

Thank you very much,

Jesse

Recommended answer

Here is a simple solution using ddply() from plyr to return the coefficients for each group.

First, replicate the data:

x <- read.table(text="
STA.NAME YEAR SUM.CDD  
1 NAME1 1967    760  
2 NAME1 1985    800  
3 NAME1 1996    740  
4 NAME1 2003    810  
5 NAME1 2011    790  
6 NAME2 1967    700  
7 NAME2 1985    690  
8 NAME2 1996    850  
9 NAME2 2003    790  
10 NAME3    1967    760  
11 NAME3    1985    800  
12 NAME3    1990    740  
13 NAME3    1996    810  
14 NAME3    2003    790  
15 NAME3    2011    800  ", header=TRUE)

Now do the modelling:

library(plyr)
ddply(x, .(STA.NAME), function(z)coef(lm(SUM.CDD ~ YEAR, data=z)))

  STA.NAME (Intercept)      YEAR
1    NAME1   -444.8361 0.6147541
2    NAME2  -6339.2047 3.5702200
3    NAME3   -995.2381 0.8928571
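The same split-and-fit pattern can also produce the table the question asks for. The following is a minimal sketch in base R rather than plyr. It rests on assumptions that are not in the original post: `trend_stats` is a hypothetical helper name, the `EQN` column and its formatting are illustrative, "variance of the trend" is interpreted as the squared standard error of the fitted slope, and "variance of the data" as the sample variance of SUM.CDD within each station. Adjust these if a different definition is intended.

```r
# Rebuild the example data from the question.
x <- read.table(text = "
STA.NAME YEAR SUM.CDD
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800", header = TRUE)

# Hypothetical helper: one row of summary statistics per station.
trend_stats <- function(z) {
  fit <- lm(SUM.CDD ~ YEAR, data = z)
  m <- unname(coef(fit)["YEAR"])          # slope of the trend line
  b <- unname(coef(fit)["(Intercept)"])   # intercept of the trend line
  data.frame(
    STA.NAME  = as.character(z$STA.NAME[1]),
    EQN       = sprintf("y = %.4f*x %s %.2f", m,
                        if (b < 0) "-" else "+", abs(b)),
    TREND     = m,
    # Assumption: variance of the trend = squared standard error of the slope.
    VAR.TREND = vcov(fit)["YEAR", "YEAR"],
    # Assumption: variance of the data = sample variance of SUM.CDD.
    VAR.DATA  = var(z$SUM.CDD),
    stringsAsFactors = FALSE
  )
}

# Split by station, fit each piece, and stack the one-row results.
result <- do.call(rbind, lapply(split(x, x$STA.NAME), trend_stats))
rownames(result) <- NULL
print(result)
```

With only four to six observations per station, the per-station variance estimates will be noisy, so they are best read as rough indications.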


Now, depending on what you want to do, it may be simpler (and perhaps more meaningful) to create a single model of your data:

fit <- lm(SUM.CDD ~ YEAR + STA.NAME, data=x)

Get the summary:

summary(fit)

Call:
lm(formula = SUM.CDD ~ YEAR + STA.NAME, data = x)

Residuals:
   Min     1Q Median     3Q    Max 
-63.57 -22.21  10.72  18.62  80.72 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -2065.6401  1463.5353  -1.411   0.1858  
YEAR              1.4282     0.7345   1.945   0.0778 .
STA.NAMENAME2   -15.8586    27.5835  -0.575   0.5769  
STA.NAMENAME3     3.9046    24.7089   0.158   0.8773  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 40.8 on 11 degrees of freedom
Multiple R-squared: 0.3056, Adjusted R-squared: 0.1162 
F-statistic: 1.614 on 3 and 11 DF,  p-value: 0.2424 

Extract only the coefficients:

coef(fit)
  (Intercept)          YEAR STA.NAMENAME2 STA.NAMENAME3 
 -2065.640078      1.428247    -15.858650      3.904632 
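To make the meaning of these coefficients concrete: the additive model fits one common slope for YEAR and shifts the intercept per station, with NAME1 as the reference level. The sketch below turns the coefficient vector into one line per station. The names `b`, `offsets`, and `station_lines` are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question.
x <- read.table(text = "
STA.NAME YEAR SUM.CDD
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800", header = TRUE)

# Make the grouping variable an explicit factor so the level order is known.
x$STA.NAME <- factor(x$STA.NAME)
stations <- levels(x$STA.NAME)

fit <- lm(SUM.CDD ~ YEAR + STA.NAME, data = x)
b <- coef(fit)

# The reference station (first level) has offset 0; the others add their
# STA.NAME coefficient to the common intercept.
offsets <- c(0, b[paste0("STA.NAME", stations[-1])])

station_lines <- data.frame(
  STA.NAME  = stations,
  Intercept = unname(b["(Intercept)"] + offsets),
  YEAR      = unname(b["YEAR"])   # same slope for every station
)
print(station_lines)
```

Because every station shares the YEAR slope here, this model answers a different question from the per-station fits: it estimates a single overall trend while allowing for station-level differences in level.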


Finally, you may want to fit a model with interaction terms. This model gives effectively the same per-station lines as the original plyr solution: the (Intercept) and YEAR coefficients describe the reference station NAME1, and the remaining coefficients are the differences of the other stations from it. Depending on your data and your objectives, this might be the way to do it:

fit <- lm(SUM.CDD ~ YEAR * STA.NAME, data=x)
summary(fit)

Call:
lm(formula = SUM.CDD ~ YEAR * STA.NAME, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-57.682 -13.166  -1.012  23.006  63.046 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         -444.8361  2280.7464  -0.195    0.850
YEAR                   0.6148     1.1447   0.537    0.604
STA.NAMENAME2      -5894.3687  3661.9795  -1.610    0.142
STA.NAMENAME3       -550.4020  3221.8390  -0.171    0.868
YEAR:STA.NAMENAME2     2.9555     1.8406   1.606    0.143
YEAR:STA.NAMENAME3     0.2781     1.6172   0.172    0.867

Residual standard error: 39.17 on 9 degrees of freedom
Multiple R-squared: 0.4763, Adjusted R-squared: 0.1854 
F-statistic: 1.637 on 5 and 9 DF,  p-value: 0.2451 
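The equivalence with the per-group fits can be checked directly. The sketch below recovers each station's intercept and slope from the interaction model by adding the station-specific terms to the reference-level coefficients, then compares them with separate lm() fits. The names `recovered` and `per_group` are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question.
x <- read.table(text = "
STA.NAME YEAR SUM.CDD
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800", header = TRUE)

x$STA.NAME <- factor(x$STA.NAME)
stations <- levels(x$STA.NAME)

fit <- lm(SUM.CDD ~ YEAR * STA.NAME, data = x)
b <- coef(fit)

# Reference station: plain coefficients. Other stations: add the
# STA.NAME (intercept shift) and YEAR:STA.NAME (slope shift) terms.
recovered <- data.frame(
  STA.NAME  = stations,
  Intercept = unname(b["(Intercept)"] +
                       c(0, b[paste0("STA.NAME", stations[-1])])),
  YEAR      = unname(b["YEAR"] +
                       c(0, b[paste0("YEAR:STA.NAME", stations[-1])]))
)

# Separate fit per station, as a 2 x 3 matrix of coefficients.
per_group <- sapply(split(x, x$STA.NAME),
                    function(z) coef(lm(SUM.CDD ~ YEAR, data = z)))

print(recovered)
print(t(per_group))
```

The point estimates agree, but the standard errors do not: the single interaction model pools the residual variance across all stations (9 residual degrees of freedom here), whereas separate fits estimate it from each station's few observations.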
