Number of features of the model must match the input. Model n_features is 40 and input n_features is 38
Question
I am getting this error. Please give me any suggestions to resolve it. Here is my code. I am taking training data from train.csv and testing data from another file, test.csv. I am new to machine learning, so I cannot work out what the problem is.
import quandl,math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
train = pd.read_csv("train.csv", index_col=None)
test = pd.read_csv("test.csv", index_col=None)
vectorizer = CountVectorizer(min_df=1)
X1 = vectorizer.fit_transform(train['question'])
Y1 = vectorizer.fit_transform(test['testing'])
X=X1.toarray()
Y=Y1.toarray()
#print(Y.shape)
number=LabelEncoder()
train['answer']=number.fit_transform(train['answer'].astype('str'))
features = ['question','answer']
y = train['answer']
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X[:25],y)
predicted_result=clf.predict(Y[17])
p_result=number.inverse_transform(predicted_result)
f = open('output.txt', 'w')
t=str(p_result)
f.write(t)
print(p_result)
Recommended answer
There are multiple problems with your code.
The one related to this question is that you are fitting the CountVectorizer (vectorizer) on both the train and the test data. Each call to fit_transform learns a new vocabulary, so the two matrices end up with different numbers of features: 40 columns for the training data the model was fitted on, and 38 for the test data passed to predict.
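As a rough illustration of the mismatch, the sketch below refits a CountVectorizer on two small made-up lists. The sentences are hypothetical stand-ins for train['question'] and test['testing'], not the asker's data; only the effect matters here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for train['question'] and test['testing'].
train_texts = ["what is your name", "where do you live", "how old are you"]
test_texts = ["what is your age", "where are you"]

vectorizer = CountVectorizer(min_df=1)

# First fit: the vocabulary is learned from the training texts.
X_train = vectorizer.fit_transform(train_texts)
train_vocab = dict(vectorizer.vocabulary_)

# Second fit: the vocabulary is relearned from the test texts,
# replacing the one learned from the training texts.
X_test_wrong = vectorizer.fit_transform(test_texts)
test_vocab = dict(vectorizer.vocabulary_)

# The column counts differ because the two vocabularies differ.
print(X_train.shape, X_test_wrong.shape)
print(sorted(set(train_vocab) - set(test_vocab)))
```

A model fitted on X_train would reject X_test_wrong for the same reason as in the question: the number of columns no longer matches.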
What you should do is:
X1 = vectorizer.fit_transform(train['question'])
# The following line is changed
Y1 = vectorizer.transform(test['testing'])
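For reference, here is a minimal end-to-end sketch of the fit-on-train, transform-on-test pattern. The questions and answers are invented for illustration, and random_state is added only to make the run repeatable. It also passes a single two-dimensional row to predict, on the assumption that a one-dimensional row such as Y[17] would be rejected by current scikit-learn versions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real code reads these columns from CSV files.
train_questions = ["what is your name", "where do you live", "how old are you"]
train_answers = ["alice", "paris", "thirty"]
test_questions = ["what is your age", "where are you"]

vectorizer = CountVectorizer(min_df=1)
# Learn the vocabulary from the training text only.
X_train = vectorizer.fit_transform(train_questions)
# Reuse that vocabulary for the test text; unseen words are ignored.
X_test = vectorizer.transform(test_questions)

encoder = LabelEncoder()
y_train = encoder.fit_transform(train_answers)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Slice with 0:1 so the input stays two-dimensional (one row, all columns).
predicted = clf.predict(X_test[0:1])
labels = encoder.inverse_transform(predicted)
print(X_train.shape[1], X_test.shape[1], labels)
```

Because transform never changes the learned vocabulary, X_train and X_test always have the same number of columns, which is what the classifier requires.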