Linear Regression calculation several times in one dataframe
Question
I am using R to evaluate climate data, and my data set looks like the following miniaturized version. Please forgive my crude posting etiquette; I hope this post is understandable.
[0] [STA.NAME] [YEAR] [SUM.CDD]
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800
From this, I am trying to return a new data frame like the following:
[STA.NAME] [Eq'n of trend]
NAME1 (y = mx + b)
NAME2 (y = mx + b)
etc...
Eventually I will need to calculate the variance of the trends, as well as the total variance of the data, and I would like to append those to the resulting data set to get something like:
[STA.NAME] [TREND] [VAR.TREND] [VAR.DATA]
with values in rows, one for each STA.NAME.
Any help is greatly appreciated. I am currently stumped with lm(), so if there is a better way, I would be interested in that as well.
Thank you very much,
Jesse
Recommended answer
Here is a simple solution using ddply() from plyr to return the coefficients for each group:
First, replicate the data:
x <- read.table(text="
STA.NAME YEAR SUM.CDD
1 NAME1 1967 760
2 NAME1 1985 800
3 NAME1 1996 740
4 NAME1 2003 810
5 NAME1 2011 790
6 NAME2 1967 700
7 NAME2 1985 690
8 NAME2 1996 850
9 NAME2 2003 790
10 NAME3 1967 760
11 NAME3 1985 800
12 NAME3 1990 740
13 NAME3 1996 810
14 NAME3 2003 790
15 NAME3 2011 800 ", header=TRUE)
Now do the modelling:
library(plyr)
ddply(x, .(STA.NAME), function(z)coef(lm(SUM.CDD ~ YEAR, data=z)))
STA.NAME (Intercept) YEAR
1 NAME1 -444.8361 0.6147541
2 NAME2 -6339.2047 3.5702200
3 NAME3 -995.2381 0.8928571
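The same per-station fits can be assembled into the table the question asks for. The following is a minimal sketch in base R, not part of the original answer. It assumes that VAR.TREND means the variance of the fitted trend values and that VAR.DATA means the sample variance of SUM.CDD; both are interpretations of the question, not something it states. The helper name station_summary is hypothetical.

```r
# Rebuild the example data so this block runs on its own.
x <- read.table(text = "
STA.NAME YEAR SUM.CDD
NAME1 1967 760
NAME1 1985 800
NAME1 1996 740
NAME1 2003 810
NAME1 2011 790
NAME2 1967 700
NAME2 1985 690
NAME2 1996 850
NAME2 2003 790
NAME3 1967 760
NAME3 1985 800
NAME3 1990 740
NAME3 1996 810
NAME3 2003 790
NAME3 2011 800
", header = TRUE)

# Hypothetical helper: fit one station and return a one-row summary.
station_summary <- function(z) {
  fit <- lm(SUM.CDD ~ YEAR, data = z)
  b <- coef(fit)
  data.frame(
    STA.NAME  = as.character(z$STA.NAME[1]),
    INTERCEPT = b[["(Intercept)"]],
    TREND     = b[["YEAR"]],          # slope: change in SUM.CDD per year
    VAR.TREND = var(fitted(fit)),     # assumption: variance of the fitted trend line
    VAR.DATA  = var(z$SUM.CDD),       # sample variance of the observations
    stringsAsFactors = FALSE
  )
}

# Split by station, summarise each piece, and stack the rows.
result <- do.call(rbind, lapply(split(x, x$STA.NAME), station_summary))
rownames(result) <- NULL
result
```

Under these assumptions, VAR.TREND / VAR.DATA equals the R-squared of each station's fit, so the two variance columns also show how much of the variability the linear trend accounts for.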
Now, depending on what you want to do, it may be simpler (and perhaps more meaningful) to create a single model of your data:
fit <- lm(SUM.CDD ~ YEAR + STA.NAME, data=x)
Get the summary:
summary(fit)
Call:
lm(formula = SUM.CDD ~ YEAR + STA.NAME, data = x)
Residuals:
Min 1Q Median 3Q Max
-63.57 -22.21 10.72 18.62 80.72
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2065.6401 1463.5353 -1.411 0.1858
YEAR 1.4282 0.7345 1.945 0.0778 .
STA.NAMENAME2 -15.8586 27.5835 -0.575 0.5769
STA.NAMENAME3 3.9046 24.7089 0.158 0.8773
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 40.8 on 11 degrees of freedom
Multiple R-squared: 0.3056, Adjusted R-squared: 0.1162
F-statistic: 1.614 on 3 and 11 DF, p-value: 0.2424
Extract only the coefficients:
coef(fit)
(Intercept) YEAR STA.NAMENAME2 STA.NAMENAME3
-2065.640078 1.428247 -15.858650 3.904632
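In this additive model, all stations share one YEAR slope, and the STA.NAME coefficients shift the intercept relative to the baseline station NAME1. As a rough sketch (the names fit_add and per_station are illustrative, not from the original answer), the coefficients can be turned into one line per station like this:

```r
# Rebuild the example data so this block runs on its own.
x <- read.table(text = "
STA.NAME YEAR SUM.CDD
NAME1 1967 760
NAME1 1985 800
NAME1 1996 740
NAME1 2003 810
NAME1 2011 790
NAME2 1967 700
NAME2 1985 690
NAME2 1996 850
NAME2 2003 790
NAME3 1967 760
NAME3 1985 800
NAME3 1990 740
NAME3 1996 810
NAME3 2003 790
NAME3 2011 800
", header = TRUE)

fit_add <- lm(SUM.CDD ~ YEAR + STA.NAME, data = x)
b <- coef(fit_add)

# NAME1 is the baseline, so its intercept offset is zero.
offsets <- c(NAME1 = 0,
             NAME2 = b[["STA.NAMENAME2"]],
             NAME3 = b[["STA.NAMENAME3"]])

# One row per station: own intercept, shared slope.
per_station <- data.frame(
  STA.NAME  = names(offsets),
  INTERCEPT = b[["(Intercept)"]] + unname(offsets),
  SLOPE     = b[["YEAR"]],
  stringsAsFactors = FALSE
)
per_station
```

The stations differ only in their intercepts here, so this model suits the case where a common trend is assumed across stations.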
Finally, you may have wanted to fit a model with interaction terms. This model gives you effectively the same coefficients as the original plyr solution, expressed as a baseline (NAME1) plus per-station offsets. Depending on your data and your objectives, this might be the way to do it:
fit <- lm(SUM.CDD ~ YEAR * STA.NAME, data=x)
summary(fit)
Call:
lm(formula = SUM.CDD ~ YEAR * STA.NAME, data = x)
Residuals:
Min 1Q Median 3Q Max
-57.682 -13.166 -1.012 23.006 63.046
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -444.8361 2280.7464 -0.195 0.850
YEAR 0.6148 1.1447 0.537 0.604
STA.NAMENAME2 -5894.3687 3661.9795 -1.610 0.142
STA.NAMENAME3 -550.4020 3221.8390 -0.171 0.868
YEAR:STA.NAMENAME2 2.9555 1.8406 1.606 0.143
YEAR:STA.NAMENAME3 0.2781 1.6172 0.172 0.867
Residual standard error: 39.17 on 9 degrees of freedom
Multiple R-squared: 0.4763, Adjusted R-squared: 0.1854
F-statistic: 1.637 on 5 and 9 DF, p-value: 0.2451
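To see how the interaction model relates to the per-group fits, add the baseline coefficients to the station offsets. This is a minimal sketch with illustrative names (fit_int, fit_add, per_station), not code from the original answer:

```r
# Rebuild the example data so this block runs on its own.
x <- read.table(text = "
STA.NAME YEAR SUM.CDD
NAME1 1967 760
NAME1 1985 800
NAME1 1996 740
NAME1 2003 810
NAME1 2011 790
NAME2 1967 700
NAME2 1985 690
NAME2 1996 850
NAME2 2003 790
NAME3 1967 760
NAME3 1985 800
NAME3 1990 740
NAME3 1996 810
NAME3 2003 790
NAME3 2011 800
", header = TRUE)

fit_int <- lm(SUM.CDD ~ YEAR * STA.NAME, data = x)
b <- coef(fit_int)

# NAME1 is the baseline, so its offsets are zero.
int_off   <- c(0, b[["STA.NAMENAME2"]], b[["STA.NAMENAME3"]])
slope_off <- c(0, b[["YEAR:STA.NAMENAME2"]], b[["YEAR:STA.NAMENAME3"]])

# Per-station intercept and slope = baseline + offset.
per_station <- data.frame(
  STA.NAME  = c("NAME1", "NAME2", "NAME3"),
  INTERCEPT = b[["(Intercept)"]] + int_off,
  SLOPE     = b[["YEAR"]] + slope_off,
  stringsAsFactors = FALSE
)
per_station

# F-test: do station-specific slopes improve on a shared slope?
fit_add <- lm(SUM.CDD ~ YEAR + STA.NAME, data = x)
anova(fit_add, fit_int)
```

For NAME2, for example, the slope is 0.6148 + 2.9555, which matches the 3.5702 reported by ddply() up to rounding. The standard errors differ from those of separate fits because the single model pools the residual variance across stations.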