R for loop for regression lm(y ~ x)
Problem description
Example:
df <- data.frame(A=1:5, B=2:6, C=3:7,D=4:8,E=5:9,F=6:10)
I want to build a regression loop around lm(y ~ x), using columns 1 and 2 as y and the remaining columns as x.
My attempt:
lmf <- function (y,x) {
  f <- lm(y ~ x, data=df)
  cbind(summary(f)$r.squared,summary(f)$coefficients)
}
for(y in 1:3)
{
  R<- apply(df[,3:6], 2, lmf(y,x)); R
}
This fails with: Error in model.frame.default(formula = y ~ x, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'x')
This example is deliberately small; my real data have 50 columns for y and 300 columns for x.
What I want is the equivalent of running lm(df[,1] ~ df[,3]); lm(df[,1] ~ df[,4]); [...]; lm(df[,2] ~ df[,3]); ... but done automatically. I also want to extract $coefficients and $r.squared from each result.
Recommended answer
Here is an alternative using the dplyr, tidyr and broom packages. The idea is to specify which variables to treat as Y and which as X, then create two separate datasets from those Y and X sets. Each dataset is reshaped to long format so that every Y can be paired with every X. Finally, for each (Y, X) combination, run a linear regression and return the model output as a data frame.
# Install a package if it is not already installed
check_package <- function(package_name) {
  if (!(package_name %in% rownames(installed.packages()))) {
    install.packages(package_name, dependencies = TRUE)
  }
}
check_package("broom")
check_package("dplyr")
check_package("tidyr")
library(dplyr)
library(broom)
library(tidyr)
# example dataset (picking 4 columns)
dt <- data.frame(mtcars) %>% select(mpg, disp, cyl, wt)
# specify which columns we want as y (dependent) and x (independent)
ynames <- c("disp","mpg")
xnames <- c("cyl","wt")
# create and reshape datasets
dt1 <- dt[,ynames]
dt1 <- gather(dt1,y,yvalue)
dt2 <- dt[,xnames]
dt2 <- gather(dt2, x, xvalue)
dt1 %>%
  group_by(y) %>%                          # group by dependent variable
  do(data.frame(., dt2)) %>%               # combine each y with all x
  group_by(y, x) %>%                       # get combinations of y and x to regress
  do(tidy(lm(yvalue ~ xvalue, data = .)))  # return lm output as dataframe
# y x term estimate std.error statistic p.value
# 1 disp cyl (Intercept) -156.608976 35.1805064 -4.451584 1.090157e-04
# 2 disp cyl xvalue 62.598925 5.4693168 11.445474 1.802838e-12
# 3 disp wt (Intercept) -131.148416 35.7165961 -3.671918 9.325668e-04
# 4 disp wt xvalue 112.478138 10.6353299 10.575896 1.222320e-11
# 5 mpg cyl (Intercept) 37.884576 2.0738436 18.267808 8.369155e-18
# 6 mpg cyl xvalue -2.875790 0.3224089 -8.919699 6.112687e-10
# 7 mpg wt (Intercept) 37.285126 1.8776273 19.857575 8.241799e-19
# 8 mpg wt xvalue -5.344472 0.5591010 -9.559044 1.293959e-10
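The output above holds the coefficients but not the R-squared the question also asked for. As a rough sketch only, the same pairing of each Y with each X can be done in base R by building every formula from column names with reformulate() and reading r.squared from summary(). The names pairs, fit_one and results are illustrative and do not come from the original answer; the data and variable sets are the same mtcars columns used above.

```r
# Same example data and variable sets as in the answer above
dt <- mtcars[, c("mpg", "disp", "cyl", "wt")]
ynames <- c("disp", "mpg")
xnames <- c("cyl", "wt")

# All (y, x) pairs to regress (hypothetical helper table)
pairs <- expand.grid(y = ynames, x = xnames, stringsAsFactors = FALSE)

# Fit one simple regression and keep its coefficients and R-squared
fit_one <- function(y, x) {
  fit <- lm(reformulate(x, response = y), data = dt)  # e.g. disp ~ cyl
  data.frame(
    y = y,
    x = x,
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]),
    r.squared = summary(fit)$r.squared
  )
}

# One row per (y, x) combination
results <- do.call(
  rbind,
  lapply(seq_len(nrow(pairs)), function(i) fit_one(pairs$y[i], pairs$x[i]))
)
results
```

With 50 Y columns and 300 X columns this would fit 15,000 small models. Only the ynames and xnames vectors would need to change, assuming all columns are numeric.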