Run linear models by group over list of variables in R


Question

I have a data frame and I need to run six 2-variable linear models for each group (site). The second variable in the linear model changes from model to model. I have that part down using lapply(), but I can't figure out how to run it by group. Then I need to convert the results to a data frame. I've found answers on SO that cover parts of my question, but I can't figure out how to put it all together.

Here's some data:

d <- structure(list(SiteName = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("bp10", "bp12"), class = "factor"), 
    DMWT = c(13.9697916666667, 13.9125, 14.2152083333333, 14.7810416666667, 
    15.1541666666667, 15.7535416666667, 17.3254166666667, 18.4872916666667, 
    20.0564583333333, 21.0595833333333, 21.3925), DMAT = c(16.6714631359947, 
    18.474493439025, 20.9517661662977, 23.7017661662978, 25.5957055602372, 
    20.9688840743375, 23.7188840743375, 25.6128234682769, 27.5143386197921, 
    27.6279749834285, 26.1355507410042), ADD = c(0, 0, 0, 1.90367965367967, 
    5.70129870129876, 0, 1.90367965367967, 5.70129870129876, 
    11.400432900433, 17.2132034632037, 21.53354978355), Air200 = c(7.3229782875097, 
    7.40616010569152, 7.50025101478243, 7.63384949963092, 7.78642525720668, 
    7.51736892282216, 7.65096740767065, 7.80354316524641, 7.97854316524641, 
    8.14729316524641, 8.29592952888278), Air100 = c(15.2711601056916, 
    15.362599499631, 15.512902529934, 15.727296469328, 15.9717661662977, 
    15.5300204379738, 15.7444143773677, 15.9888840743374, 16.2306264985798, 
    16.4472174076707, 16.6433537713071), Air75 = c(16.8986348531664, 
    17.0426752572068, 17.1927762673078, 17.3687358632674, 17.5567156612472, 
    17.2098941753475, 17.3858537713071, 17.5738335692869, 17.7820153874687, 
    18.0100961955496, 18.2532275086809), Air50 = c(19.5072207117523, 
    19.6340388935705, 19.7382813178129, 19.8887358632674, 20.1060085905402, 
    19.7553992258526, 19.9058537713072, 20.1231264985799, 20.4400961955496, 
    20.7669143773678, 20.9841871046405), Air10 = c(21.9214631359947, 
    21.5850994996311, 21.2563116208432, 21.1714631359947, 21.4502510147826, 
    21.2734295288829, 21.1885810440344, 21.4673689228223, 21.9696416500951, 
    22.3779749834284, 22.5476719531254)), .Names = c("SiteName", 
"DMWT", "DMAT", "ADD", "Air200", "Air100", "Air75", "Air50", 
"Air10"), row.names = c(547L, 548L, 549L, 550L, 551L, 1593L, 
1594L, 1595L, 1596L, 1597L, 1598L), class = "data.frame")

Here's the code for using each variable in a model. How do I use the sites?:

siteslist <- unique(d$SiteName) 
varlist <- names(d)[4:9]
models <- lapply(varlist, function(x) {  # apply the modeling function to our list of air variables
  lm(substitute(DMWT ~ DMAT + i, list(i = as.name(x))), data = d)  # linear model with air variable substituted
})

Then get model results & convert to a data frame:

library(relaimpo)
sumfun <- function(x) c(coef(x),
                        summary(x)$adj.r.squared,
                        sqrt(mean(resid(x)^2,na.rm=TRUE)),
                        calc.relimp(x,type="betasq")$betasq[1],
                        calc.relimp(x,type="betasq")$betasq[2],
                        calc.relimp(x,type="pratt")$pratt[1],
                        calc.relimp(x,type="pratt")$pratt[2])
mod.df <- as.data.frame(t(sapply(models,sumfun)))

Also tried combining variables and sites to do something like this:

siteslist <- unique(d$SiteName)                              
varlist <- names(d)[4:9]
sets <- expand.grid(SiteName = siteslist, Var = varlist)
models <- lapply(1:nrow(sets), function(x) {  # apply the modeling function to our list of air variables
  lm(substitute(DMWT ~ DMAT + i, list(i = as.name(sets$Var[x]))), data = d[d$SiteName ==  sets$SiteName[x],])  # linear model with air variable substituted
})

...but I get an error "Error in eval(expr, envir, enclos) : object '1' not found"

Answer

This is how I would do it. Note this is untested as I haven't installed relaimpo. I'm really just re-packaging your code.

The general method is:

1. develop a function that works on one group
2. use split to divide your data into groups
3. use lapply to apply the function to each group
4. (if needed) combine the results together

The only changes I made are:

(a) Pull out a subset of data for one site and name it one_site.
(b) Use one_site in your modeling code.
(c) I prefer pasting a formula together as a string to using substitute, so I made that change.
(d) White space and formatting for readability (mostly using RStudio's "reformat code").

## set up
varlist <- names(d)[4:9]
library(relaimpo)
sumfun <- function(x) {
    c(
        coef(x),
        summary(x)$adj.r.squared,
        sqrt(mean(resid(x) ^ 2, na.rm = TRUE)),
        calc.relimp(x, type = "betasq")$betasq[1],
        calc.relimp(x, type = "betasq")$betasq[2],
        calc.relimp(x, type = "pratt")$pratt[1],
        calc.relimp(x, type = "pratt")$pratt[2]
    )
}

## Testing: this works for one_site
one_site <- subset(d, SiteName == "bp10")

models <- lapply(varlist, function(x) {  # apply the modeling function to our list of air variables
    form <- as.formula(sprintf("DMWT ~ DMAT + %s", x))
    lm(form, data = one_site)  # linear model with air variable substituted
})

## desired result
mod.df <- as.data.frame(t(sapply(models, sumfun)))
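
A side note on change (c): building the formula from a character string also seems to avoid the error reported in the question. expand.grid() converts character vectors to factors by default (stringsAsFactors = TRUE), and as.name() coerces its argument internally without using the factor's labels, so a factor element becomes the symbol `1` rather than the column name. The sketch below is a hedged illustration, not the asker's exact setup: `toy` is made-up data standing in for d, and reformulate() is swapped in as one base-R way to build the formula from strings.

```r
# Hypothetical toy data standing in for `d` (all values are made up)
set.seed(1)
toy <- data.frame(
    SiteName = factor(rep(c("bp10", "bp12"), each = 6)),
    DMAT = rnorm(12, mean = 22, sd = 3),
    ADD = runif(12, min = 0, max = 20),
    Air200 = rnorm(12, mean = 7.7, sd = 0.3)
)
toy$DMWT <- 2 + 0.5 * toy$DMAT + 0.1 * toy$ADD + rnorm(12, sd = 0.2)

# Keep both columns as character so the variable name survives intact
sets <- expand.grid(
    SiteName = levels(toy$SiteName),
    Var = c("ADD", "Air200"),
    stringsAsFactors = FALSE
)

# One model per (site, variable) row of `sets`
models <- lapply(seq_len(nrow(sets)), function(i) {
    form <- reformulate(c("DMAT", sets$Var[i]), response = "DMWT")
    lm(form, data = toy[toy$SiteName == sets$SiteName[i], ])
})
names(models) <- paste(sets$SiteName, sets$Var, sep = ".")
```

If the factor columns are kept instead, wrapping the value in as.character() before passing it to as.name() or sprintf() should have the same effect.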

Turn it into a function

Once you have code that works for a single site, we turn it into a function. The only inputs seem to be the data for one site and the variables in varlist. Instead of assigning the result at the bottom, we return it:

fit_one_site <- function(one_site, varlist) {
    models <- lapply(varlist, function(x) {
        # apply the modeling function to our list of air variables
        form <- as.formula(sprintf("DMWT ~ DMAT + %s", x))
        lm(form, data = one_site)  # linear model with air variable substituted
    })
    return(as.data.frame(t(sapply(models, sumfun))))
}

Now we can use split to split your data up by SiteName, and lapply to apply the fit_one_site function to each piece.

results = lapply(split(d, d$SiteName), FUN = fit_one_site, varlist = names(d)[4:9])
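
Since the code above is untested without relaimpo, here is a minimal sketch of the same split/lapply pattern that runs in base R alone. Everything in it is an assumption made for illustration: `toy` is made-up data, and `sumfun_basic` and `fit_one_site_basic` are hypothetical stand-ins that keep only the coefficients, adjusted R-squared, and RMSE and drop the calc.relimp() terms.

```r
# Toy data (made-up values) with the same column roles as `d`
set.seed(42)
toy <- data.frame(
    SiteName = factor(rep(c("siteA", "siteB"), each = 8)),
    DMAT = rnorm(16, mean = 22, sd = 3),
    Air200 = rnorm(16, mean = 7.7, sd = 0.3),
    Air100 = rnorm(16, mean = 15.8, sd = 0.4)
)
toy$DMWT <- 1 + 0.6 * toy$DMAT + rnorm(16, sd = 0.5)

# Hypothetical relaimpo-free summary: coefficients, adjusted R^2, RMSE
sumfun_basic <- function(fit) {
    out <- c(coef(fit), summary(fit)$adj.r.squared, sqrt(mean(resid(fit)^2)))
    names(out) <- c("intercept", "DMAT", "air_coef", "adj_r2", "rmse")
    out
}

# Same shape as fit_one_site, but using the relaimpo-free summary
fit_one_site_basic <- function(one_site, varlist) {
    models <- lapply(varlist, function(x) {
        lm(as.formula(sprintf("DMWT ~ DMAT + %s", x)), data = one_site)
    })
    res <- as.data.frame(t(sapply(models, sumfun_basic)))
    res$variable <- varlist  # record which air variable each row used
    res
}

# Split by site, fit every model within each site
toy_results <- lapply(split(toy, toy$SiteName), fit_one_site_basic,
                      varlist = c("Air200", "Air100"))
str(toy_results, max.level = 1)
```

Naming the summary vector inside the function gives every row the same column names, which is convenient because the third coefficient comes from a different variable in each model.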

The results should be a list of data frames, one for each site. If you want to combine them into one data frame, see the relevant part of my answer at the list of data frames R-FAQ.
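
One common base-R way to do that combination is to tag each data frame with its list name and stack the pieces with do.call(rbind, ...). The sketch below is only an illustration under stated assumptions: `results_demo` is a made-up stand-in for the real results list, and its columns and numbers are invented.

```r
# Stand-in for `results`: a named list of per-site data frames (made-up numbers)
results_demo <- list(
    bp10 = data.frame(variable = c("ADD", "Air200"), adj_r2 = c(0.91, 0.88)),
    bp12 = data.frame(variable = c("ADD", "Air200"), adj_r2 = c(0.95, 0.93))
)

# Tag each piece with its site name, then stack the pieces row-wise
tagged <- Map(function(df, site) cbind(SiteName = site, df),
              results_demo, names(results_demo))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL  # drop the "bp10.1"-style row names rbind creates
combined
```

If dplyr is available, dplyr::bind_rows(results_demo, .id = "SiteName") should produce an equivalent table in one call.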
