Regression on a subset in R
Problem description
I want to run the same regression for different countries (i.e. subsets of my data). I figured out how to do it in R, but after doing the same thing much more easily in Stata, I wonder whether there is a better way in R.
In Stata you would do something like this:
foreach country in USA UK France {
reg y x1 x2 if country == "`country'"
}
Simple and human-readable, right? In R, I came up with the split and ddply methods, but both are more complicated. Using split:
data.subset <- split(data, data$country)[c("USA", "UK", "France")]
res <- lapply(data.subset, function(subset) lm(y ~ x1 + x2, data=subset))
More compact code would use ddply (from the plyr package), but in that case the model is run for all countries. Can I choose just a few?
ddply(data, "country", function(df) coefficients(lm(Y~X1+X2, data=df)))
But again, I'm interested in knowing whether there is an intuitive, readable for loop like the one in Stata.
Recommended answer
There are several options:
One way using ddply, subsetting the data before passing it in:
ddply( data[ data$country %in% c('USA','UK','France'), ], "country", function(df) coefficients(lm(Y~X1+X2, data=df)))
Using lapply (or sapply) a different way:
lapply( c("USA","UK","France"), function(curcont) lm(y ~ x1+x2, data=data, subset= country==curcont))
You could use the lmList function from the nlme package.
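A minimal sketch of the lmList route, assuming a simulated data frame (the name dat, the extra country "Japan", and the generated values are made up for illustration and are not from the question). The grouping variable goes after the | in the formula, and the rows are restricted to the chosen countries beforehand:

```r
library(nlme)  # a recommended package that ships with standard R installations

# Simulated stand-in for the asker's data (hypothetical values)
set.seed(42)
dat <- data.frame(
  country = factor(rep(c("USA", "UK", "France", "Japan"), each = 30)),
  x1 = rnorm(120),
  x2 = rnorm(120)
)
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(120)

# Keep only the countries of interest and drop the unused factor level
keep <- droplevels(dat[dat$country %in% c("USA", "UK", "France"), ])

# One lm fit per country; the grouping variable follows the "|"
fits <- lmList(y ~ x1 + x2 | country, data = keep)

coef(fits)     # one row of coefficients per country
fits[["UK"]]   # the individual lm fit for a single country
```

The result behaves like a named list of lm objects, so the usual extractor functions can be applied to each element.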
You could use lm directly (though this will use a pooled estimate of the variance instead of separate ones):
lm( y ~ 0 + factor(country) * (x1 + x2), data=data, subset= country %in% c('USA','UK','France') )
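To see what the single interaction model gives you, here is a small sketch on simulated data (dat and its values are assumptions for illustration). With this formula the x1 coefficient is the slope for the first factor level, and the interaction terms are differences from it, so a country's slope is the sum of the two. The point estimates match the separate fits, while the residual standard error is pooled across countries:

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(7)
dat <- data.frame(
  country = rep(c("USA", "UK", "France"), each = 40),
  x1 = rnorm(120),
  x2 = rnorm(120)
)
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(120)

# One model with country-specific intercepts and slopes
pooled <- lm(y ~ 0 + factor(country) * (x1 + x2), data = dat)

# Separate fit for a single country, for comparison
uk_only <- lm(y ~ x1 + x2, data = dat, subset = country == "UK")

# "France" is the first level alphabetically, so "x1" is the France slope
# and the UK slope is that baseline plus the UK interaction term
cf <- coef(pooled)
uk_slope_pooled <- cf[["x1"]] + cf[["factor(country)UK:x1"]]
all.equal(uk_slope_pooled, coef(uk_only)[["x1"]])

# The residual standard errors differ: pooled versus UK-only
summary(pooled)$sigma
summary(uk_only)$sigma
```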
There is also the by function and for loops, and probably other options as well.
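Since the question asks for something close to the Stata loop, here is a rough sketch of the for loop and by variants. The data frame dat, the extra country "Japan", and the simulated values are assumptions for illustration only:

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(1)
dat <- data.frame(
  country = rep(c("USA", "UK", "France", "Japan"), each = 25),
  x1 = rnorm(100),
  x2 = rnorm(100)
)
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(100)

# for loop: store each fit in a list under the country's name
fits <- list()
for (cn in c("USA", "UK", "France")) {
  fits[[cn]] <- lm(y ~ x1 + x2, data = dat[dat$country == cn, ])
}
sapply(fits, coef)  # coefficient matrix, one column per country

# by: restrict to the chosen countries first, then fit within each group
chosen <- dat[dat$country %in% c("USA", "UK", "France"), ]
by_fits <- by(chosen, chosen$country, function(df) lm(y ~ x1 + x2, data = df))
coef(by_fits[["UK"]])
```

The loop reads much like the Stata version; the main difference is that R needs somewhere to store each fit, here a named list.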