Novice needs to loop lm in R


Question

I'm a PhD student in genetics and I am trying to do an association analysis of some genetic data using linear regression. In the table below I'm regressing each 'trait' against each 'SNP'. There is also an interaction term, included as 'var'.

I've only used R for 2 weeks and I don't have any programming background, so please explain any help you provide, as I want to understand it.

Here is a sample of my data:

Sample ID   var  trait1  trait2  trait3  SNP1  SNP2  SNP3
77856517    2    188     3       2       1     0     0
375689755   8    17      -1      -1      1     -1    -1
392513415   8    28      14      4       1     1     1
393612038   8    85      14      6       1     1     0
401623551   8    152     11      -1      1     0     0
348466144   7    -74     11      6       1     0     0
77852806    4    81      16      6       1     1     0
440614343   8    -93     8       0       0     1     0
77853193    5    3       6       5       1     1     1

And this is the code I've been using for a single regression:

result1 <-lm(trait1~SNP1+var+SNP1*var, na.action=na.exclude)

I want to run a loop where every trait is tested against each SNP.

I've been trying to modify code I've found online, but I always run into some error that I don't understand how to solve.

Thanks for any help.

Recommended answer

Personally, I don't find the problem so easy, especially for an R novice.

Here is a solution based on building the regression formula dynamically. The idea is to use the paste function to assemble the formula terms as a string, y ~ x + var + x*var, and then coerce the resulting string to a formula using as.formula. Here y and x are the dynamic terms of the formula: y in c(trait1, trait2, ...) and x in c(SNP1, SNP2, ...). I use lapply to do the looping, and the code assumes the data are in a data frame named dat.

lapply(1:3,function(i){
 y <- paste0('trait',i)
 x <- paste0('SNP',i)
 factor1 <- x
 factor2 <- 'var'
 factor3 <- paste(x,'var',sep='*')
 listfactor <- c(factor1,factor2,factor3)
 form <- as.formula(paste(y, "~",paste(listfactor,collapse="+")))
 lm(formula = form, data = dat)
})
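The loop above pairs trait1 with SNP1, trait2 with SNP2, and so on. Since the question asks for every trait against every SNP, one possible extension is to enumerate all pairs with expand.grid first. This is only a sketch: `dat` is rebuilt here from the sample rows in the question, the column names without spaces (trait1, SNP1, ...) are an assumption, and `combos` and `fits` are names made up for illustration.

```r
# Sample data from the question; column names without spaces are an assumption.
dat <- data.frame(
  var    = c(2, 8, 8, 8, 8, 7, 4, 8, 5),
  trait1 = c(188, 17, 28, 85, 152, -74, 81, -93, 3),
  trait2 = c(3, -1, 14, 14, 11, 11, 16, 8, 6),
  trait3 = c(2, -1, 4, 6, -1, 6, 6, 0, 5),
  SNP1   = c(1, 1, 1, 1, 1, 1, 1, 0, 1),
  SNP2   = c(0, -1, 1, 1, 0, 0, 1, 1, 1),
  SNP3   = c(0, -1, 1, 0, 0, 0, 0, 0, 1)
)

# Every trait/SNP pair, one row per combination (3 x 3 = 9 rows).
combos <- expand.grid(
  trait = paste0("trait", 1:3),
  snp   = paste0("SNP", 1:3),
  stringsAsFactors = FALSE
)

# Fit one model per row of `combos`, building each formula from strings.
fits <- lapply(seq_len(nrow(combos)), function(k) {
  form <- as.formula(paste(combos$trait[k], "~", combos$snp[k], "* var"))
  lm(form, data = dat, na.action = na.exclude)
})

# Label each fit so it can be looked up by name.
names(fits) <- paste(combos$trait, combos$snp, sep = "~")
```

A single fit can then be inspected with, for example, summary(fits[["trait2~SNP3"]]). With only nine sample rows some coefficients may come back as NA, so the sample is useful for checking that the loop runs rather than for interpreting results.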

I hope someone comes up with an easier solution, or a more R-ish one :)

Edit

Thanks to @DWin's comment, we can simplify the formula to just y ~ x*var, since it means y is modeled by x, var and the interaction of x and var.

So the code above simplifies to:

lapply(1:3, function(i) {
  y <- paste0('trait', i)
  x <- paste0('SNP', i)
  # right-hand side of the formula: x*var expands to x + var + x:var
  rhs <- paste(x, 'var', sep = '*')
  form <- as.formula(paste(y, "~", rhs))
  lm(formula = form, data = dat)
})
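The equivalence behind this simplification can be checked without fitting anything, by asking R which terms each formula expands to. A minimal sketch, using trait1 and SNP1 as stand-ins for y and x; the variable names `short` and `long` are made up for illustration:

```r
# `*` in a formula expands to both main effects plus their interaction.
short <- trait1 ~ SNP1 * var
long  <- trait1 ~ SNP1 + var + SNP1:var

# terms() lists the expanded model terms for each formula.
short_terms <- attr(terms(short), "term.labels")
long_terms  <- attr(terms(long),  "term.labels")

print(short_terms)
print(identical(short_terms, long_terms))
```

Both formulas expand to the terms SNP1, var and SNP1:var, so the two versions of the loop fit the same models.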
