caret/rfe error: "there should be the same number of samples in x and y"


Problem description

My aim is to perform cross-validation with R. Columns 1-31 are features and column 32 is the output class.
I load the data from an .xls file, but I have severe issues with the rfe function. Please see my code:

install.packages('e1071')
library(e1071)
install.packages('readxl')
library(readxl)
library(rpart)
install.packages('randomForest')
library(randomForest)
install.packages('party')
library(party)
install.packages('mlbench')
library(mlbench)
install.packages('caret')
library(caret)
#----------------------------------------------------------
# Import Data
getwd()
setwd("working_directory_name")
df <- read_excel('test_data.xls')
#----------------------------------------------------------
# Get Information on your data (optional)
str(df)
table(df$F32)
#----------------------------------------------------------
install.packages('XLConnect')
library(XLConnect)
# Recursive Feature Selection Approach
control <- rfeControl(functions=rfFuncs, method="cv", number=5)
#x = as.vector(unlist(df[, 2:29]))
#y = as.vector(unlist(df[, 32])) 
# Run the algorithm (Features, Ground Truth, Testes SetSizes)
#results <- rfe(x, y, sizes=c(1:28), rfeControl=control)
results <- rfe(df[, 2:29], df[, 32], sizes=c(1:28), rfeControl=control)
# Visualize results for set sizes
print(results)
# List chosen features
predictors(results)
# plot the results
plot(results, type=c("g", "o"))

Running the code produces this error:

Fehler in rfe.default(df[, 2:29], df[, 32], sizes = c(1:28), rfeControl = control) : there should be the same number of samples in x and y

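I have already looked at these pages:
1. http://braziebrazie.blogspot.de/2015/08/caret-r-error-in-rfedefau-should-be.html
2. R rfe function "caret" package error: there should be the same number of samples in x and y
3. R: trying to get caret/rfe to work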

The suggestion from 1. to unlist the vector doesn't work for me. The new error is:

Fehler in if (nrow(x) != length(y)) stop("there should be the same number of samples in x and y") : Argument hat Länge 0

The example in 2. works without any problems:

set.seed(7)
d=data.frame(matrix(rnorm(2901*15,1,.5),ncol=15))
#something like dependent variable
dp=factor(sample(c(1,1,1,1, 1, 1,2,2,2, 3 ,3,3,4, 4, 4),2901,replace = TRUE))
# define the control using a random forest selection function
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
# run the RFE algorithm
sz=50 # Change sz to 2901 for full sample
results <- rfe(d[1:sz, ],   dp[1:sz],   sizes=c(1:15), rfeControl=control)
# summarize the results
print(results)
plot(results, type=c("g", "o"))

In 3. it says

y should be a numeric or factor vector

But how do I define y as a numeric or factor vector?

This is the xls file format: xls file format
Maybe the problem lies in the way I load the xls file.

Thanks a lot for your suggestions and recommendations!

Recommended answer

I had the same issue. Converting y to a matrix fixed it:

results <- rfe(df[, 2:29], as.matrix(df[, 32]), sizes=c(1:28), rfeControl=control)
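
A likely reason this works, sketched here with base R only: read_excel returns a tibble, and selecting a single column from a tibble with df[, 32] still gives a one-column data frame, whose length() is the number of columns (1) rather than the number of rows. The check shown in the error message, nrow(x) != length(y), therefore fails. The small data frame below is a made-up stand-in for the spreadsheet, and drop = FALSE is used only to imitate the tibble behaviour without loading extra packages.

# Hypothetical stand-in for the imported sheet: 6 samples, 2 features, 1 class column
df <- data.frame(
  F1  = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.7),
  F2  = c(1.2, 0.8, 1.1, 0.3, 0.6, 0.2),
  F32 = c("a", "b", "a", "b", "a", "b")
)

x <- df[, 1:2]

# Imitates df[, 32] on a tibble: the result is still a data frame
y_df <- df[, 3, drop = FALSE]
n_x    <- nrow(x)          # number of samples in x
len_df <- length(y_df)     # number of COLUMNS of y_df, not the number of rows

# The fix from the answer: a one-column matrix has one element per sample
y_mat   <- as.matrix(y_df)
len_mat <- length(y_mat)

# Alternative: extract the column as a vector and make it a factor
y_fac   <- factor(df[[3]])
len_fac <- length(y_fac)

# Why unlisting x (suggestion 1.) leads to "Argument hat Länge 0":
# a plain vector has no rows, so nrow() returns NULL
x_vec     <- as.vector(unlist(x))
nrow_null <- is.null(nrow(x_vec))

print(c(n_x = n_x, len_df = len_df, len_mat = len_mat, len_fac = len_fac))
print(nrow_null)

Under these assumptions, passing factor(df[[32]]) as y (and, if needed, as.data.frame(df[, 2:29]) as x) should satisfy the same check for the original data, and it also answers the question of how to make y a factor vector. This has not been verified against the asker's file.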
