Error predicting: X has n features per sample, expecting m

Views: 63 Published: 2021/7/16 20:06:14 Tags: python python-3.x scikit-learn tf-idf

Problem description

I have the following code, which converts text to a TF-IDF representation:

...
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    dataset['documents'], dataset['classes'], test_size=test_percentil)
# Term-document matrix: the vocabulary is learned from the training split only
count_vect = CountVectorizer(ngram_range=(1, Ngram), min_df=1, max_features=MaxVocabulary)
x_train_counts = count_vect.fit_transform(x_train)
x_test_counts = count_vect.transform(x_test)
# TF-IDF weighting, fitted on the training counts
tf_transformer = TfidfTransformer(use_idf=True).fit(x_train_counts)
lista = tf_transformer.get_params()
x_train_tf = tf_transformer.transform(x_train_counts)
x_test_tf = tf_transformer.transform(x_test_counts)
...

Then I train a model and save it with pickle. The problem appears when I try to predict new data in another program. Essentially, I have:

count_vect = CountVectorizer(ngram_range=(1, 1), min_df=1, max_features=None)
x_counts = count_vect.fit_transform(dataset['documents'])

#Term Inverse-Frequency
tf_transformer = TfidfTransformer(use_idf=True).fit(x_counts)
x_tf = tf_transformer.transform(x_counts)

model.predict(x_tf)

When I run this code, the output is:

ValueError: X has 8933 features per sample; expecting 7488

I know this is a problem with the TF-IDF representation, and I have read that I need to use the same tf_transformer and vectorizer to get the expected input shape, but I don't know how to achieve this. I could store the other transformers and vectorizers, but I have tried different combinations without success.
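
For illustration, here is a minimal sketch of one way to reuse the same fitted vectorizer and transformer at prediction time. It assumes that bundling them with the classifier in a scikit-learn Pipeline and pickling the whole object is acceptable. The toy documents, the labels, and the choice of LogisticRegression are hypothetical stand-ins, since the question does not show the real dataset or name the model. The point is that the prediction program only calls transform (via predict) on the stored objects and never fits a new vectorizer on the new data.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for dataset['documents'] / dataset['classes'].
train_docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stocks fell sharply today",
    "markets rallied on strong earnings",
]
train_labels = ["pets", "pets", "finance", "finance"]

# --- Training program ---
pipeline = Pipeline([
    ("counts", CountVectorizer(ngram_range=(1, 1), min_df=1)),
    ("tfidf", TfidfTransformer(use_idf=True)),
    ("clf", LogisticRegression()),  # assumed classifier, not named in the question
])
pipeline.fit(train_docs, train_labels)

# Persist the whole fitted pipeline. Bytes are used here to keep the sketch
# self-contained; pickle.dump / pickle.load with a file works the same way.
blob = pickle.dumps(pipeline)

# --- Prediction program ---
loaded = pickle.loads(blob)
new_docs = ["my cat likes a brand new toy", "earnings beat every forecast"]

# predict() runs transform() on the stored vectorizer and TF-IDF transformer,
# so unseen words are ignored and the feature count matches training.
predictions = loaded.predict(new_docs)

# Check that the feature counts agree between training and prediction.
n_train_features = len(pipeline.named_steps["counts"].vocabulary_)
new_counts = loaded.named_steps["counts"].transform(new_docs)
n_new_features = loaded.named_steps["tfidf"].transform(new_counts).shape[1]
print(n_train_features, n_new_features, list(predictions))
```

If the model has already been pickled on its own, the same idea should apply by also pickling count_vect and tf_transformer from the training program and calling only their transform methods in the prediction program.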
