Split data.frame by country, and create linear regression model on each subset
Problem description
I have a data.frame of data from the World Bank which looks something like this:
  country date BirthRate     US.
4   Aruba 2011    10.584 25354.8
5   Aruba 2010    10.804 24289.1
6   Aruba 2009    11.060 24639.9
7   Aruba 2008    11.346 27549.3
8   Aruba 2007    11.653 25921.3
9   Aruba 2006    11.977 24015.4
All in all, there are 70-something country subsets in this data frame on which I would like to run a linear regression.
If I use the following, I get a nice lm for a single country:
andora = subset(high.sub, country == "Andorra")
andora.lm = lm(BirthRate~US., data = andora)
anova(andora.lm)
summary(andora.lm)
But when I try to use the same type of code in a for loop, I get an error, which I'll print below the code:
high.sub = subset(highInc, date > 1999 & date < 2012)
high.sub <- na.omit(high.sub)
highnames <- unique(high.sub$country)
for (i in highnames) {
linmod <- lm(BirthRate~US., data = high.sub, subset = (country == "[i]"))
}
Error message:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
If I can get this loop to run, I would ideally like to append the coefficients and, even better, the r-squared values for each model to an empty data.frame. Any help would be greatly appreciated.
Thanks,
Josh
Recommended answer
This is a slight modification of @BondedDust's comment.
models <- sapply(unique(as.character(df$country)),
                 function(cntry) lm(BirthRate ~ US., df, subset = (country == cntry)),
                 simplify = FALSE, USE.NAMES = TRUE)
# to summarize all the models
lapply(models,summary)
# to run anova on all the models
lapply(models,anova)
This produces a named list of models, so you could extract the model for Aruba as:
models[["Aruba"]]
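The original loop most likely fails because `country == "[i]"` compares against the literal string "[i]", so no rows match and lm() reports 0 cases. It also overwrites `linmod` on every pass. The following is a minimal sketch of a corrected loop that also gathers the coefficients and r-squared values into one data.frame, as the question asks. It runs on toy data: the Aruba rows are taken from the question, while the second country "Examplia", its values, and the names `results`, `intercept` and `slope` are made up for illustration.

```r
# Toy data: Aruba rows come from the question; "Examplia" is a made-up country
df <- data.frame(
  country   = rep(c("Aruba", "Examplia"), each = 6),
  date      = rep(2011:2006, times = 2),
  BirthRate = c(10.584, 10.804, 11.060, 11.346, 11.653, 11.977,
                12.1, 12.4, 12.3, 12.9, 13.2, 13.0),
  US.       = c(25354.8, 24289.1, 24639.9, 27549.3, 25921.3, 24015.4,
                41000, 39500, 40200, 38000, 37100, 37800)
)

# Corrected loop: compare against the loop variable i itself,
# and store each fit in a named list instead of overwriting one object
models <- list()
for (i in unique(as.character(df$country))) {
  models[[i]] <- lm(BirthRate ~ US., data = df[df$country == i, ])
}

# Build one row per country with intercept, slope and r-squared,
# then stack the rows into a single data.frame
results <- do.call(rbind, lapply(names(models), function(nm) {
  fit <- models[[nm]]
  data.frame(country   = nm,
             intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]),
             r.squared = summary(fit)$r.squared)
}))

print(results)
```

The same `results` step should work unchanged on the `models` list produced by the sapply() call above, since both are lists of lm fits named by country.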