How to loop multiple exposures and outcomes as well as different models with glm in R?

Question

The code below currently runs an unadjusted glm for each exposure on each outcome (3 exposures per outcome) and exports the results into a list. For each exposure, I need 3 models: model 1: unadjusted (which we currently have); model 2: adjusted for cov1; model 3: adjusted for cov1, cov2 and cov3.

How would I implement the different models in this code?

amino_df <- data.frame(y = rbinom(100, 1, 0.5), y2 = rbinom(100, 1, 0.3), y3 = rbinom(100, 1, 0.2), y4 = rbinom(100, 1, 0.22),
                       exp1 = rnorm(100), exp2 = rnorm(100), exp3 = rnorm(100),
                       cov1 = rnorm(100), cov2 = rnorm(100), cov3 = rnorm(100))

exp <- c("exp1", "exp2", "exp3")
y <- c("y", "y2","y3","y4")
cov <- c("cov1", "cov2", "cov3")

obs_results <- replicate(length(y), data.frame())  

for(j in seq_along(y)){
  for (i in seq_along(exp)){
    mod <- as.formula(paste(y[j], "~", exp[i]))
    glmmodel <- glm(formula = mod, family = binomial, data = amino_df)
    
    obs_results[[j]][i,1] <- names(coef(glmmodel))[2]
    obs_results[[j]][i,2] <- exp(glmmodel$coefficients[2])
    obs_results[[j]][i,3] <- summary(glmmodel)$coefficients[2,2]
    obs_results[[j]][i,4] <- summary(glmmodel)$coefficients[2,4]
    obs_results[[j]][i,5] <- exp(confint.default(glmmodel)[2,1])
    obs_results[[j]][i,6] <- exp(confint.default(glmmodel)[2,2])
  }
  colnames(obs_results[[j]]) <- c("exposure","OR", "SE", "P_value", "95_CI_LOW","95_CI_HIGH")
}
names(obs_results) <- y

obs_df <- do.call("rbind", lapply(obs_results, as.data.frame)) 

EDIT - I now have a solution:

Further question: could the code below be adapted to use different models for the different exposures? For example, adjust exp1 for all 3 covariates (cov1, cov2, cov3), but adjust exp2 for cov1 and cov2 only, and exp3 for cov2 and cov1 only?

library(dplyr)  # provides %>% and bind_rows()

amino_df <- data.frame(y = rbinom(100, 1, 0.5), y2 = rbinom(100, 1, 0.3), 
                       y3 = rbinom(100, 1, 0.2), y4 = rbinom(100, 1, 0.22),
                       exp1 = rnorm(100), exp2 = rnorm(100), exp3 = rnorm(100),
                       cov1 = rnorm(100), cov2 = rnorm(100), cov3 = rnorm(100))

exp <- c("exp1", "exp2", "exp3")
y <- c("y", "y2","y3","y4")
model <- c("", "+ cov1", "+ cov1 + cov2 + cov3")

obs_df <- lapply(y, function(j){
    lapply(exp, function(i){
        lapply(model, function(h){
            mod = as.formula(paste(j, "~", i, h))
            glmmodel = glm(formula = mod, family = binomial, data = amino_df)
            
            obs_results = data.frame(
                outcome = j,
                exposure = names(coef(glmmodel))[2], 
                covariate = h,
                OR = exp(glmmodel$coefficients[2]), 
                SE = summary(glmmodel)$coefficients[2,2], 
                P_value = summary(glmmodel)$coefficients[2,4], 
                `95_CI_LOW` = exp(confint.default(glmmodel)[2,1]), 
                `95_CI_HIGH` = exp(confint.default(glmmodel)[2,2])
            )
            return(obs_results)
        }) %>% bind_rows
    }) %>% bind_rows
}) %>% bind_rows %>% `colnames<-`(gsub("X95","95",colnames(.))) %>% `rownames<-`(NULL)

head(obs_df)

Recommended answer

Just as you specified exp and y at the beginning, you can specify the different model types.

Here is an approach using lapply() instead of for-loops:

library(dplyr)  # provides %>% and bind_rows()

amino_df <- data.frame(y = rbinom(100, 1, 0.5), y2 = rbinom(100, 1, 0.3), 
                       y3 = rbinom(100, 1, 0.2), y4 = rbinom(100, 1, 0.22),
                       exp1 = rnorm(100), exp2 = rnorm(100), exp3 = rnorm(100),
                       cov1 = rnorm(100), cov2 = rnorm(100), cov3 = rnorm(100))

exp <- c("exp1", "exp2", "exp3")
y <- c("y", "y2","y3","y4")
model <- c("", "+ cov1", "+ cov1 + cov2 + cov3")

obs_df <- lapply(y, function(j){
    lapply(exp, function(i){
        lapply(model, function(h){
            mod = as.formula(paste(j, "~", i, h))
            glmmodel = glm(formula = mod, family = binomial, data = amino_df)
            
            obs_results = data.frame(
                outcome = j,
                exposure = names(coef(glmmodel))[2], 
                covariate = h,
                OR = exp(glmmodel$coefficients[2]), 
                SE = summary(glmmodel)$coefficients[2,2], 
                P_value = summary(glmmodel)$coefficients[2,4], 
                `95_CI_LOW` = exp(confint.default(glmmodel)[2,1]), 
                `95_CI_HIGH` = exp(confint.default(glmmodel)[2,2])
            )
            return(obs_results)
        }) %>% bind_rows
    }) %>% bind_rows
}) %>% bind_rows %>% `colnames<-`(gsub("X95","95",colnames(.))) %>% `rownames<-`(NULL)

head(obs_df)
#  outcome exposure            covariate        OR        SE   P_value 95_CI_LOW 95_CI_HIGH
#1       y     exp1                      0.9425290 0.2125285 0.7806305 0.6214270   1.429550
#2       y     exp1               + cov1 0.9356460 0.2138513 0.7557639 0.6152917   1.422794
#3       y     exp1 + cov1 + cov2 + cov3 0.9638427 0.2174432 0.8655098 0.6293876   1.476027
#4       y     exp2                      1.3297429 0.1865916 0.1266809 0.9224452   1.916879
#5       y     exp2               + cov1 1.3300740 0.1866225 0.1264124 0.9226190   1.917473
#6       y     exp2 + cov1 + cov2 + cov3 1.3558196 0.1903111 0.1097054 0.9337031   1.968770

I included gsub("X95","95",colnames(.)) at the end because data.frame() by default passes column names through make.names() (check.names = TRUE), so names that begin with a digit (here "95_CI_LOW" and "95_CI_HIGH") get an "X" prepended; the gsub() call removes it.
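As an alternative to renaming afterwards, the "X" prefix can be avoided when the data frame is created. This is a minimal sketch in base R, not part of the original answer: data.frame() accepts check.names = FALSE, which keeps names exactly as written. The numbers are placeholder values, not model output.

```r
# Default behaviour: names go through make.names(), so a leading digit gets an "X" prefix
default_df <- data.frame(`95_CI_LOW` = 0.62, `95_CI_HIGH` = 1.43)
colnames(default_df)

# check.names = FALSE keeps the names exactly as written
kept_df <- data.frame(`95_CI_LOW` = 0.62, `95_CI_HIGH` = 1.43, check.names = FALSE)
colnames(kept_df)
```

Columns with such non-syntactic names must be quoted with backticks when accessed, e.g. kept_df$`95_CI_LOW`.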

Supplement

If different exposures need to be adjusted for different covariates, the following can be done instead. The easiest solution is to run all possible exposure + covariate combinations through the code above, and then filter obs_df (with filter()) to keep only the analyses that you want. However, this fits models that are then discarded, so it takes unnecessarily long to run on large datasets.
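The filtering approach might look roughly like the sketch below. It is a reduced illustration under a few assumptions: a single outcome, only the OR column, a seed added for reproducibility, and a models vector that includes "+ cov1 + cov2" (the vector in the answer above does not contain that set). The names exposures, models, all_fits and wanted are made up for this example.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
amino_df <- data.frame(y = rbinom(100, 1, 0.5),
                       exp1 = rnorm(100), exp2 = rnorm(100), exp3 = rnorm(100),
                       cov1 = rnorm(100), cov2 = rnorm(100), cov3 = rnorm(100))

exposures <- c("exp1", "exp2", "exp3")
models <- c("+ cov1", "+ cov1 + cov2", "+ cov1 + cov2 + cov3")

# Fit every exposure x covariate-set combination for one outcome
all_fits <- lapply(exposures, function(i) {
  lapply(models, function(h) {
    fit <- glm(as.formula(paste("y ~", i, h)), family = binomial, data = amino_df)
    data.frame(outcome = "y", exposure = i, covariate = h,
               OR = unname(exp(coef(fit)[2])))
  }) %>% bind_rows()
}) %>% bind_rows()

# Keep only the exposure/covariate pairings of interest
wanted <- all_fits %>%
  filter((exposure == "exp1" & covariate == "+ cov1 + cov2 + cov3") |
         (exposure == "exp2" & covariate == "+ cov1 + cov2") |
         (exposure == "exp3" & covariate == "+ cov1"))
```

Here nine models are fitted and three are kept, which illustrates the wasted work mentioned above.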

A more direct approach is to list exactly which exposure + covariate combinations to fit in model, remove the lapply(exp) level, and edit the core function accordingly:

model <- c("exp1 + cov1 + cov2 + cov3", "exp2 + cov1 + cov2", "exp3 + cov1")

obs_df <- lapply(y, function(j){
    lapply(model, function(h){
        mod = as.formula(paste(j, "~", h))
        glmmodel = glm(formula = mod, family = binomial, data = amino_df)
            
        obs_results = data.frame(
            outcome = j,
            exposure = names(coef(glmmodel))[2], 
            covariate = gsub(names(coef(glmmodel))[2],"",h), # gsub to remove exposure from covariate(s)
            OR = exp(glmmodel$coefficients[2]), 
            SE = summary(glmmodel)$coefficients[2,2], 
            P_value = summary(glmmodel)$coefficients[2,4], 
            `95_CI_LOW` = exp(confint.default(glmmodel)[2,1]), 
            `95_CI_HIGH` = exp(confint.default(glmmodel)[2,2])
        )
        return(obs_results)
    }) %>% bind_rows
}) %>% bind_rows %>% `colnames<-`(gsub("X95","95",colnames(.))) %>% `rownames<-`(NULL)
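A variation on the same idea, sketched here with base R only, keeps the exposure-to-covariate mapping in a named list and builds each formula with reformulate() instead of pasting strings. The names adjust_for, outcomes, grid and fit_one are hypothetical, the seed is added only for reproducibility, and two outcomes are used to keep the example short.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
amino_df <- data.frame(y = rbinom(100, 1, 0.5), y2 = rbinom(100, 1, 0.3),
                       exp1 = rnorm(100), exp2 = rnorm(100), exp3 = rnorm(100),
                       cov1 = rnorm(100), cov2 = rnorm(100), cov3 = rnorm(100))

outcomes <- c("y", "y2")

# Hypothetical mapping: each exposure names its own adjustment set
adjust_for <- list(exp1 = c("cov1", "cov2", "cov3"),
                   exp2 = c("cov1", "cov2"),
                   exp3 = "cov1")

# One row per outcome x exposure pair
grid <- expand.grid(outcome = outcomes, exposure = names(adjust_for),
                    stringsAsFactors = FALSE)

fit_one <- function(outcome, exposure) {
  covs <- adjust_for[[exposure]]
  # reformulate() builds outcome ~ exposure + covariates from character vectors
  fit <- glm(reformulate(c(exposure, covs), response = outcome),
             family = binomial, data = amino_df)
  est <- summary(fit)$coefficients[exposure, ]   # look up the exposure row by name
  ci <- exp(confint.default(fit)[exposure, ])    # Wald interval on the OR scale
  data.frame(outcome = outcome, exposure = exposure,
             covariate = paste(covs, collapse = " + "),
             OR = exp(est[["Estimate"]]), SE = est[["Std. Error"]],
             P_value = est[["Pr(>|z|)"]],
             `95_CI_LOW` = ci[[1]], `95_CI_HIGH` = ci[[2]],
             check.names = FALSE)
}

rows <- lapply(seq_len(nrow(grid)), function(k) fit_one(grid$outcome[k], grid$exposure[k]))
obs_df <- do.call(rbind, rows)
```

Selecting the coefficient by the exposure's name rather than by position [2] means the result does not depend on the order of terms in the formula.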
