randomForest Error: NA not permitted in predictors (but no NAs in data)

Question

So I am attempting to run the 'genie3' algorithm (ref: http://homepages.inf.ed.ac.uk/vhuynht/software.html) in R, which uses the 'randomForest' method.

I get the following error:

> weight.matrix<-get.weight.matrix(tmpLog2FC, input.idx=1:4551)
Starting RF computations with 1000 trees/target gene,
and 67 candidate input genes/tree node
Computing gene 1/11805
Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors 

So I checked if NAs are present in my data, and there are none:

> NAs<-sapply(tmpLog2FC, function(x) sum(is.na(x)))
> length(which(NAs!=0))
[1] 0
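
A caveat on this check: `is.na()` is TRUE for `NA` and `NaN`, but FALSE for `Inf` and `-Inf`. A log2 fold-change table can contain `-Inf` wherever a value was 0, and such values turn into `NaN` after arithmetic like subtracting a column mean (`-Inf - (-Inf)` is `NaN`). Whether `tmpLog2FC` actually contains infinite values is not stated in the thread; the sketch below is only a broader check to rule it out, using a hypothetical toy data frame in place of the real data.

```r
# Toy stand-in for tmpLog2FC (hypothetical values, not the poster's data)
toy <- data.frame(geneA = c(1.2, -0.5, 0.3),
                  geneB = c(log2(0), 0.1, 2.0),  # log2(0) evaluates to -Inf, not NA
                  geneC = c(0.7, 0.7, 0.7))

# The check from the question: counts NA (and NaN) per column
na_counts <- sapply(toy, function(x) sum(is.na(x)))

# A broader check: !is.finite() is TRUE for NA, NaN, Inf and -Inf
nonfinite_counts <- sapply(toy, function(x) sum(!is.finite(x)))

print(na_counts)         # all zero
print(nonfinite_counts)  # geneB has one non-finite value
```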

I then tried editing the specific 'get.weight.matrix()' function to omit NAs (just in case) by changing this line:

rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, ...)

To:

rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, na.action=na.omit)

I then sourced the code and double-checked that it incorporated the change by calling it on its own (which displays the actual script):

    }
    target.gene.name <- gene.names[target.gene.idx]
    # remove target gene from input genes
    these.input.gene.names <- setdiff(input.gene.names, target.gene.name)
    x <- expr.matrix[,these.input.gene.names]
    y <- expr.matrix[,target.gene.name]
    rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, na.action=na.omit)

However when attempting to re-run, I get the same error:

Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors 
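
A likely reason the edit had no effect, based on our reading of the randomForest package documentation rather than anything confirmed in this thread: `na.action` belongs to the formula interface, `randomForest(formula, data=NULL, ..., subset, na.action=na.fail)`. Calling `randomForest(x, y, ...)` with a predictor matrix dispatches to `randomForest.default()`, which stops with "NA not permitted in predictors" whenever `any(is.na(x))` is TRUE, and an extra `na.action` argument is not used there. The error and traceback above are consistent with that. With the x/y interface, incomplete rows have to be dropped by hand before the call. A minimal base-R sketch, where the toy `x` and `y` are hypothetical stand-ins for the ones built inside `get.weight.matrix()`:

```r
# Hypothetical toy data: rows are samples, columns are input genes
x <- matrix(c(1, 2, NA, 4,
              5, 6, 7, 8),
            ncol = 2,
            dimnames = list(NULL, c("geneA", "geneB")))
y <- c(0.1, 0.2, 0.3, NA)  # target gene expression

# TRUE only for rows with no NA in any predictor column and in y
keep <- complete.cases(x, y)

# drop = FALSE keeps x a matrix even if only one column remains
x_clean <- x[keep, , drop = FALSE]
y_clean <- y[keep]

print(keep)     # TRUE TRUE FALSE FALSE
print(x_clean)

# The randomForest call would then use the cleaned inputs, e.g.:
# rf <- randomForest(x_clean, y_clean, mtry = mtry, ntree = nb.trees, importance = TRUE)
```

Note that in a per-target-gene loop like this one, dropping rows this way means different target genes may be fitted on different sets of samples.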

Has anyone encountered anything similar to this? Any ideas on what I can do?

Thanks in advance.

* As suggested, I re-ran with debug:

> weight.matrix<-get.weight.matrix(tmpLog2FC, input.idx=1:4551)
Starting RF computations with 1000 trees/target gene,
and 67 candidate input genes/tree node
Computing gene 1/11805
Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors
Called from: randomForest(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE, 
na.action = na.omit)
Browse[1]> 
> 

The debug output shows that the line I suspected is the one throwing the error, and it displays that line in its edited form, with 'na.action = na.omit'. I am even more confused: how can a dataset that has no NAs, run with code that allows NAs to be omitted, still produce this error?
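
One way input with no NAs can still yield NA predictors, offered as an assumption to check rather than a confirmed diagnosis: if `get.weight.matrix()` rescales the expression matrix before building `x` (for example, dividing each gene by its standard deviation), any gene whose values are constant gives 0/0, which is `NaN`, and `is.na(NaN)` is TRUE, so randomForest reports it as an NA. The sketch below uses column-wise z-scoring as a hypothetical stand-in for such a step, and shows how to list the offending columns; the same `colSums(is.na(...))` line could be run on `x` at the `Browse[1]>` prompt.

```r
# Hypothetical toy matrix: rows are samples, columns are genes
m <- cbind(geneA = c(1, 2, 3),
           geneB = c(5, 5, 5))  # geneB is constant across samples

# Column-wise z-scoring, standing in for a normalization step
z <- apply(m, 2, function(v) (v - mean(v)) / sd(v))

print(anyNA(m))  # FALSE: the input has no NAs
print(anyNA(z))  # TRUE: sd(geneB) is 0, so (5 - 5) / 0 is NaN

# Names of columns that contain NA/NaN after the transformation
bad_cols <- colnames(z)[colSums(is.na(z)) > 0]
print(bad_cols)  # "geneB"
```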

Answer

You can use the following command to list the rows in which any predictor has a missing value:

data[!complete.cases(data),]

Check those rows carefully. In my case, rows with no values at all (",,,,,,,,," in my file, where the predictor columns were comma-separated) showed up as NA when randomForest ran.

You can then remove those rows.
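
To illustrate how an empty record becomes NA: `read.csv()` reads blank fields in numeric columns as NA, so a line consisting only of commas turns into a row of NAs. A minimal sketch with hypothetical toy input (the column names and values are made up):

```r
# Hypothetical comma-separated input; the third line is an empty record
csv_text <- "geneA,geneB,geneC
1.2,0.4,2.1
,,
0.3,1.5,0.9"

data <- read.csv(text = csv_text)

# Rows in which at least one predictor is missing
print(data[!complete.cases(data), ])

# Keep only complete rows before passing the data to randomForest
data_clean <- data[complete.cases(data), ]
print(nrow(data_clean))  # 2
```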

Thanks
