how to force scikit-learn DictVectorizer not to discard features?
Problem Description
I'm trying to use scikit-learn for a classification task. My code extracts features from the data and stores them in a dictionary, like so:
feature_dict['feature_name_1'] = feature_1
feature_dict['feature_name_2'] = feature_2
When I split the data in order to test it using sklearn.cross_validation, everything works as it should. The problem I'm having is when the test data is a new set, not part of the learning set (although it has the exact same features for each sample). After I fit the classifier on the learning set, when I try to call clf.predict I get this error:
ValueError: X has different number of features than during model fitting.
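A minimal sketch of the failure mode described above (the feature dicts, variable names, and classifier choice are illustrative, not from the original post): calling fit_transform a second time refits the DictVectorizer vocabulary on the test dicts, so the resulting matrix can have a different number of columns than the one the classifier was trained on.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [{"a": 1.0, "b": 2.0}, {"a": 0.0, "b": 1.0}]
test = [{"a": 1.0}]  # feature "b" happens not to occur in this sample

vec = DictVectorizer()
clf = LogisticRegression()
clf.fit(vec.fit_transform(train), [0, 1])  # vocabulary: ["a", "b"]

# Bug: refitting the vectorizer on the test dicts rebuilds the
# vocabulary from scratch, yielding a matrix of a different width.
X_test_wrong = vec.fit_transform(test)
print(X_test_wrong.shape)  # (1, 1) instead of (1, 2)
# clf.predict(X_test_wrong)  # would raise the ValueError above
```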
I am assuming this has to do with the following (from the DictVectorizer docs):
Named features not encountered during fit or fit_transform will be silently ignored.
DictVectorizer has removed some of the features, I guess... How do I disable/work around this behavior?
Thanks
=== EDIT ===
The problem was, as larsMans suggested, that I was fitting the DictVectorizer twice.
Recommended Answer
You should use fit_transform on the training set, and only transform on the test set.