Variable lengths differ in R (linear modelling with lme4)

Problem description

My input file:


Treat1  Treat2  Batch   gene1    gene2
High    Low     1       92.73    4.00
Low     Low     1       101.85   6.00
High    High    1       136.00   4.00
Low     High    1       104.00   3.00
High    Low     2       308.32   10.00
Low     Low     2       118.93   3.00
High    High    2       144.47   3.00
Low     High    2       189.66   4.00
High    Low     3       95.12    2.00
Low     Low     3       72.08    6.00
High    High    3       108.65   2.00
Low     High    3       75.00    3.00
High    Low     4       111.39   5.00
Low     Low     4       119.80   4.00
High    High    4       466.55   11.00
Low     High    4       125.00   3.00

There are tens of thousands of additional columns, each with a header and a list of numbers, same length as "gene1" column.

My code:

library(lme4)
library(lmerTest)

# Import the data.
mydata <- read.table("input_file", header=TRUE, sep="\t")

# Make batch into a factor
mydata$Batch <- as.factor(mydata$Batch)

# Check structure
str(mydata)

# Get file without the factors, so that names(df) gives gene names.
genefile <- mydata[c(4:2524)]

# Loop through all gene names and run the model once per gene and print to file.
for (i in names(genefile)){
    lmer_results <- lmer(i ~ Treat1*Treat2 + (1|Batch), data=mydata)
    lmer_summary <- summary(lmer_results)
    write(lmer_summary,file="results_file",append=TRUE, sep="\t", quote=FALSE)
}

Structure:

'data.frame':     16 obs. of  2524 variables:
$ Treat1          : Factor w/ 2 levels "High","Low": 1 2 1 2 1 2 1 2 1 2 ...
$ Treat2          : Factor w/ 2 levels "High","Low": 2 2 1 1 2 2 1 1 2 2 ...
$ Batch           : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
$ gene1           : num  92.7 101.8 136 104 308.3 ...
$ gene2           : num  4 6 4 3 10 3 3 4 2 6 ...

My error message:

Error in model.frame.default(data = mydata, drop.unused.levels = TRUE, formula = i ~ :
  variable lengths differ (found for 'Treat1')
Calls: lmer ... -> eval -> eval -> -> model.frame.default
Execution halted

I have tried to examine all objects involved and cannot see any differences in variable lengths and I have also made sure there are no missing data. Running it with na.exclude doesn't change anything.

Any ideas?

Recommended answer

@Roland's diagnosis (lmer is looking for a variable called i, not a variable whose name is i: obligatory Lewis Carroll reference) is correct, I think. The most immediate way to handle this would be with reformulate(), something like:

for (i in names(genefile)){
    form <- reformulate(c("Treat1*Treat2","(1|Batch)"),response=i)
    lmer_results <- lmer(form, data=mydata)
    lmer_summary <- summary(lmer_results)
    write(lmer_summary,file="results_file",
           append=TRUE, sep="\t", quote=FALSE)
}
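
To see why the original loop fails, it may help to reproduce the error in base R without lme4. The sketch below uses the first eight rows of the question's table and calls model.frame() directly, which is the function named in the error message; the object names d, msg and form are illustrative only.

```r
# First eight rows of the question's table (Treat1 and gene1 only).
d <- data.frame(
  Treat1 = factor(rep(c("High", "Low"), times = 4)),
  gene1  = c(92.73, 101.85, 136.00, 104.00, 308.32, 118.93, 144.47, 189.66)
)
i <- "gene1"

# `i ~ Treat1` refers to the object `i` itself: a character vector of
# length 1, which cannot be lined up with the 8 values of Treat1.
msg <- tryCatch(
  {
    model.frame(i ~ Treat1, data = d)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# reformulate() builds the formula from strings, so the column name stored
# in `i` ends up in the formula as a symbol.
form <- reformulate(c("Treat1*Treat2", "(1|Batch)"), response = i)
print(form)
```

The left-hand side of form is now the symbol gene1, so a model-fitting function looks that column up in the data frame instead of evaluating the loop variable.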

On second thought, you should be able to speed up your computations significantly using the built-in refit() method, which refits a model for a new response variable: suppose for simplicity that the first gene is called geneAAA:

wfun <- function(x)  write(summary(x), 
       file="results_file", append=TRUE, sep="\t",quote=FALSE)
mod0 <- lmer(geneAAA ~ Treat1*Treat2 + (1|Batch), data=mydata)
wfun(mod0)
for (i in names(genefile)[-1]) {
    mod1 <- refit(mod0,mydata[[i]])
    wfun(mod1)
}
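
As a rough sanity check of the refit() approach, one can compare a refitted model with a model fitted from scratch on the same response. This is a minimal sketch assuming lme4 is installed; it uses only the 16 rows and two genes shown in the question, leaves out lmerTest, and the names mod_refit and mod_fresh are made up for illustration. With so little data lmer may print singular-fit messages.

```r
library(lme4)

# The 16 rows shown in the question (gene1 and gene2 only).
mydata <- data.frame(
  Treat1 = factor(rep(c("High", "Low"), times = 8)),
  Treat2 = factor(rep(rep(c("Low", "High"), each = 2), times = 4)),
  Batch  = factor(rep(1:4, each = 4)),
  gene1  = c(92.73, 101.85, 136.00, 104.00, 308.32, 118.93, 144.47, 189.66,
             95.12, 72.08, 108.65, 75.00, 111.39, 119.80, 466.55, 125.00),
  gene2  = c(4, 6, 4, 3, 10, 3, 3, 4, 2, 6, 2, 3, 5, 4, 11, 3)
)

# Fit once for the first gene, then reuse the model structure for gene2.
mod0      <- lmer(gene1 ~ Treat1 * Treat2 + (1 | Batch), data = mydata)
mod_refit <- refit(mod0, mydata$gene2)

# Reference: the same model for gene2 fitted from scratch.
mod_fresh <- lmer(gene2 ~ Treat1 * Treat2 + (1 | Batch), data = mydata)

# Compare the fixed-effect estimates of the two fits.
print(all.equal(fixef(mod_refit), fixef(mod_fresh), tolerance = 1e-6))
```

The design here is balanced (every batch contains all four treatment combinations), so the fixed-effect estimates should agree up to numerical tolerance; the speed gain comes from refit() reusing the model matrices that were already built for mod0.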

(By the way, I'm not sure your write() command does anything sensible ...)
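
On the write() point: base R's write() is a thin wrapper around cat(), has no quote argument, and cannot print a summary object, so the call as written would stop with an error. One possible alternative, sketched here under assumptions, is to pull out the coefficient table with coef(summary(model)) and append it with write.table(). The helper name write_coefs and the output layout are hypothetical, and lm() is used as a stand-in on made-up data so that the sketch runs without lme4; coef(summary(.)) also returns the fixed-effects table for an lmer fit.

```r
# Append one tab-separated block of coefficients per model to `file`.
# `write_coefs` and its arguments are illustrative names, not part of lme4.
write_coefs <- function(model, label, file) {
  tab <- coef(summary(model))            # numeric matrix, one row per term
  out <- data.frame(gene = label, term = rownames(tab), tab,
                    check.names = FALSE)
  first <- !file.exists(file)            # write the header only once
  write.table(out, file = file, append = !first, col.names = first,
              row.names = FALSE, sep = "\t", quote = FALSE)
}

# Stand-in demonstration with lm() on made-up numbers.
d <- data.frame(x     = c(1, 2, 3, 4, 5, 6),
                gene1 = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
                gene2 = c(5.0, 4.1, 3.2, 1.9, 1.1, 0.2))
out_file <- tempfile(fileext = ".tsv")
for (g in c("gene1", "gene2")) {
  fit <- lm(reformulate("x", response = g), data = d)
  write_coefs(fit, g, out_file)
}

# Read the file back to check its shape: one row per gene and term.
res <- read.delim(out_file, check.names = FALSE)
print(res[, c("gene", "term", "Estimate")])
```

Keeping one row per gene and term makes the file easy to filter later, for example to pick out the interaction term across all genes.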
