Number of features of the model must match the input?


Problem description

I'm trying to use a RandomForestClassifier on some data I have. The code is below:

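from sklearn.ensemble import RandomForestClassifier

print(train_data[0, 0:20])   # feature columns 0-19 of the first training row
print(train_data[0, 21::])   # columns 21 onward of the first training row (passed as labels below)
print(test_data[0])          # first test row

print('Training...')
forest = RandomForestClassifier(n_estimators=100)
forest = forest.fit(train_data[0::, 0::20], train_data[0::, 21::])

print('Predicting...')
output = forest.predict(test_data)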

But this produces the following error:

ValueError: Number of features of the model must match the input. Model n_features is 3 and input n_features is 21

The output from the first three print statements is:

[   0.            0.            0.            0.            1.            0.
    0.            0.            0.            0.            1.            0.
    0.            0.            0.           37.7745986  -122.42589168
    0.            0.            0.        ]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  1.  0.]
[   0.            0.            0.            0.            0.            0.
    0.            1.            0.            0.            1.            0.
    0.            0.            0.            0.           37.73505101
 -122.3995877     0.            0.            0.        ]

I had assumed that the data was in the correct format for my fit/predict calls, but it is erroring out on the predict. Can anyone see what I am doing wrong here?

Recommended answer

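The input data used to train the model is train_data[0::,0::20], which I think is a mistake (why skip features in between?). The second colon turns 20 into a step, so 0::20 selects every 20th column (columns 0, 20, 40 and so on) rather than columns 0 to 19; that is why the model reports n_features of 3. Based on the debug prints you did in the beginning, it should be train_data[0::,0:20] instead.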
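
To see why the model ended up with exactly 3 features, here is a small NumPy sketch. It assumes, as the debug prints suggest, that each training row has 59 columns (20 printed feature values, one skipped column at index 20, and 38 label values); the array below is filled with placeholder numbers, not the asker's data.

```python
import numpy as np

# Placeholder training array: 5 rows x 59 columns (assumed shape, see above).
train_data = np.arange(5 * 59, dtype=float).reshape(5, 59)

# 0::20 means "start at column 0, step by 20": only columns 0, 20 and 40.
step_slice = train_data[0::, 0::20]
# 0:20 means "columns 0 up to (not including) 20": the 20 printed feature columns.
range_slice = train_data[0::, 0:20]

print(step_slice.shape)               # (5, 3), matching "Model n_features is 3"
print(range_slice.shape)              # (5, 20)
print(np.arange(59)[0::20].tolist())  # [0, 20, 40]
```

The only difference between the two slices is the extra colon, which is easy to miss when reading the code.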

Also, it seems that the last column represents the labels in both train_data and test_data. When predicting, you might want to pass test_data[:, :20] instead of test_data when calling the predict function, so that the test input has the same 20 columns the model was trained on (test_data has 21).
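
A minimal end-to-end sketch of the suggested fix, using randomly generated data shaped like the question's: 59 training columns (20 features, one unused column at index 20, 38 label columns) and 21 test columns. The shapes, the random data and the explicit width check before predict are illustrative assumptions, not the asker's code; `n_features_in_` is the attribute scikit-learn (0.24 and later) sets on a fitted estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the question's arrays (assumed shapes).
n_train, n_test = 50, 10
train_features = rng.random((n_train, 20))
unused_column = rng.random((n_train, 1))               # column 20, skipped in the question
train_labels = rng.integers(0, 2, size=(n_train, 38))  # 38 binary label columns
train_data = np.hstack([train_features, unused_column, train_labels])
test_data = rng.random((n_test, 21))

# Train on columns 0-19 (a range slice, not a step slice) and the label columns.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(train_data[:, 0:20], train_data[:, 21:])

# Keep only the first 20 test columns so the width matches the training features.
X_test = test_data[:, :20]
if X_test.shape[1] != forest.n_features_in_:
    raise ValueError(
        f"expected {forest.n_features_in_} features, got {X_test.shape[1]}"
    )

# Multi-output prediction: one value per test row and per label column.
output = forest.predict(X_test)
print(output.shape)
```

Checking the width yourself is optional, since scikit-learn raises its own ValueError on a mismatch, but it makes the failure point obvious when the slicing upstream is wrong.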
