for loops for regression over multiple variables & outputting a subset

Views: 138 | Posted: 2018/1/28 13:12:48 | Tags: r, for-loop, matrix, regression, output

Problem description

I have tried to apply this Q&A, "efficient looping logistic regression in R", to my own problem, but I cannot quite make it work. I haven't tried to use apply, but a few people told me that a for loop is best here (if you believe otherwise, please feel free to explain!). I think this problem is fairly generalizable and not too esoteric for the forum.

This is what I want to achieve: I have a dataset with 3 predictor variables (gender, age, race) and a dependent variable (a proportion) at each of 86 genetic positions for several people. I want to run a bivariate linear regression of each position on each predictor (so 86 regressions for each of the 3 predictors). Then I want to output the results in an easily legible format; my idea is a matrix with rows = gender, age, and race, and columns = the 86 positions, with a p-value for each row/column combination. I could then pull out the p-values < 0.1 (or whatever threshold I choose) to see which predictors are significantly associated with the proportion at each position.

This is the code I have so far.
    BB <- seq.csv[,6:91] #the data frame containing the 86 positions
    AA <- seq.csv[,2:4] #the data frame containing the 3 predictor variables
    linreg <- matrix(NA,3,86) #make a results vector and fill it with NA
    for (i in 1:86) #loop over each position variable
    {
      for (j in 1:3) #for each position variable, loop over each predictor
      {
        linreg[i,j] <- lm(BB[,i]~AA[,j]) #bivariate linear regression
      }
    }
No matter how I change this (for example, simplifying it to loop over the positions for only one predictor), I still get an error saying that my matrices are not the same length (number of items to replace is not a multiple of replacement length). In fact, length(linreg)=258 (3*86), length(BB)=86, and length(AA)=3. I know the latter two are data frames, not matrices... but if I convert them to matrices I get an invalid type error (invalid type (list) for variable 'BB[, i]'). I do not know how to resolve this error because I just don't understand R well enough. I've consulted the books Applied Statistical Genetics with R and The Art of R Programming to no avail, and I've been searching Google all day. And I haven't even gotten to the code for outputting the results...

I'd appreciate any debugging tips or some suggestions on a better way to code this! Thank you all in advance.
Solution
Really hard to give a definitive answer without knowing the structure of your data beforehand, but this might work. I'm assuming that your two data frames have the same number of rows (observations):
    # AA (3 predictors) and BB (86 positions) are already subsets of seq.csv,
    # so they can be bound together directly
    df <- cbind( AA , BB )
    # Fit one model per position (columns 4 to 89 of df)
    mods <- apply( as.data.frame( df[ , 4:89 ] ) , 2 , FUN = function(x){
      lm( x ~ df[,1] + df[,2] + df[,3] )
    } )
    # The rows of this matrix will correspond to the intercept, gender, age, race,
    # and the columns are the results for each of your 86 genetic positions
    pvals <- sapply( mods , function(x){
      summary(x)$coefficients[,4]
    } )
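Once a p-value matrix exists, the subset below a threshold could be pulled out along these lines. This is only a sketch: the small `pvals` matrix here is made up for illustration (three predictors by three positions), and `threshold`, `hits`, and `result` are hypothetical names, not part of the original code.

```r
# Hypothetical p-value matrix: rows = predictors, columns = positions
# (matrix() fills column by column, so each line below is one position)
pvals <- matrix(c(0.03, 0.40, 0.75,
                  0.20, 0.08, 0.55,
                  0.91, 0.60, 0.01),
                nrow = 3,
                dimnames = list(c("gender", "age", "race"),
                                c("pos1", "pos2", "pos3")))

threshold <- 0.1

# Row/column indices of every cell below the threshold
hits <- which(pvals < threshold, arr.ind = TRUE)

# One row per predictor/position pair below the threshold
result <- data.frame(
  predictor = rownames(pvals)[hits[, "row"]],
  position  = colnames(pvals)[hits[, "col"]],
  p_value   = pvals[hits]
)
print(result)
```

Using `which(..., arr.ind = TRUE)` keeps the predictor and position labels attached to each p-value, which is usually easier to read than a logical matrix of 3 x 86 cells.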
As to whether or not that is the right thing to do, I will trust your judgement as a genetic epidemiologist!
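If the bivariate for loop from the question is preferred, the error there comes from trying to store a whole lm object (a list with many components) in a single matrix cell, and from indexing the 3 x 86 matrix as [i, j] with i running up to 86. A minimal sketch of one way around both, using simulated data as a stand-in for seq.csv (the data, the two-level factors, and the names `n`, `fit`, and `pvals` are assumptions made for illustration):

```r
# Simulated stand-in for seq.csv (hypothetical data for 50 people)
set.seed(1)
n <- 50
AA <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  age    = round(runif(n, 20, 70)),
  race   = factor(sample(c("A", "B"), n, replace = TRUE))
)
BB <- as.data.frame(matrix(runif(n * 86), nrow = n))
names(BB) <- paste0("pos", 1:86)

# Rows = predictors, columns = positions, as described in the question
pvals <- matrix(NA_real_, nrow = ncol(AA), ncol = ncol(BB),
                dimnames = list(names(AA), names(BB)))

for (i in seq_len(ncol(BB))) {      # each position
  for (j in seq_len(ncol(AA))) {    # each predictor
    fit <- lm(BB[[i]] ~ AA[[j]])    # bivariate linear regression
    # Store a single number, not the model: row 2 of the coefficient
    # table is the slope, column 4 is its p-value, Pr(>|t|)
    pvals[j, i] <- summary(fit)$coefficients[2, 4]
  }
}
```

Taking row 2 of the coefficient table assumes each predictor contributes exactly one coefficient (numeric or a two-level factor). A factor with more levels, which race may well be in real data, yields several rows, so an overall test such as `anova(fit)` might be the better choice there.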
