scikit multilabel classification: ValueError: bad input shape
Problem description
I believe SGDClassifier() with loss='log' supports multilabel classification, so I do not have to use OneVsRestClassifier. Check this
Now, my dataset is quite big, so I am using HashingVectorizer and passing the result as input to SGDClassifier. My target has 42048 features (columns).
When I run this as follows:
clf.partial_fit(X_train_batch, y)
I get: ValueError: bad input shape (300000, 42048).
I have also passed classes as a parameter, as follows, but I still get the same error.
clf.partial_fit(X_train_batch, y, classes=np.arange(42048))
The documentation of SGDClassifier says y : numpy array of shape [n_samples]
Recommended answer
No, SGDClassifier does not do multilabel classification. It does multiclass classification, which is a different problem, although both are solved using a one-vs-all problem reduction.
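To make the distinction concrete, here is a minimal sketch on made-up toy data (the arrays and variable names are illustrative and not taken from the question). It assumes a reasonably recent scikit-learn, where the wording of the error message may differ from the "bad input shape" text quoted above. A 1-D label array is accepted, while a 2-D target with one column per label is rejected.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy data, invented for illustration: 6 samples, 4 features.
rng = np.random.RandomState(0)
X = rng.rand(6, 4)

# Multiclass target: exactly one label per sample, shape (n_samples,).
y_multiclass = np.array([0, 1, 2, 0, 1, 2])

# Multilabel-style target: one column per label, shape (n_samples, n_labels).
Y_indicator = np.array([
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

classes = np.array([0, 1, 2])

# The 1-D multiclass target is accepted.
clf = SGDClassifier(random_state=0)
clf.partial_fit(X, y_multiclass, classes=classes)

# The 2-D target is rejected with a ValueError, regardless of `classes`.
rejected = False
try:
    SGDClassifier(random_state=0).partial_fit(X, Y_indicator, classes=classes)
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

Passing classes only tells partial_fit which label values can occur. It does not change the requirement that y be one-dimensional.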
Then, neither SGDClassifier nor OneVsRestClassifier.fit will accept a sparse matrix for y. The former wants an array of labels, as you've already found out. The latter wants, for multilabel purposes, a list of lists of labels, e.g.
y = [[1], [2, 3], [1, 3]]
to denote that X[0] has label 1, X[1] has labels {2,3} and X[2] has labels {1,3}.
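As a rough sketch of how this can look end to end: recent scikit-learn releases no longer take the list-of-lists format directly and expect a binary indicator matrix, which MultiLabelBinarizer can build from the same list of lists. The documents, labels and n_features value below are invented for illustration. The loss is left at its default because the name of the logistic loss option differs between scikit-learn versions.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy corpus and label sets, invented for illustration.
docs = [
    "cheap flights to paris",
    "paris hotel deals",
    "python sklearn tutorial",
    "sklearn flights dataset tutorial",
]
labels = [[1], [1, 2], [3], [1, 3]]  # list of lists, one entry per document

# Convert the list of lists into a binary indicator matrix,
# one row per document and one column per distinct label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# HashingVectorizer is stateless, so transform() can be called without fit().
X = HashingVectorizer(n_features=2 ** 10).transform(docs)

# One binary SGDClassifier is trained per label column.
clf = OneVsRestClassifier(SGDClassifier(random_state=0))
clf.fit(X, Y)

# Predictions come back as an indicator matrix; map them back to label tuples.
pred = clf.predict(X)
decoded = mlb.inverse_transform(pred)
print(decoded)
```

With 42048 labels this trains 42048 binary classifiers, so memory use and training time grow with the number of labels.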