scikit multilabel classification: ValueError: bad input shape

Problem description

I believe SGDClassifier() with loss='log' supports multilabel classification and that I do not have to use OneVsRestClassifier. Check this

Now, my dataset is quite big, so I am using HashingVectorizer and passing its output as input to SGDClassifier. My target has 42048 columns (labels).

When I run this, as follows:

clf.partial_fit(X_train_batch, y)

I get: ValueError: bad input shape (300000, 42048).

I have also passed classes as a parameter, as follows, but I still get the same error.

clf.partial_fit(X_train_batch, y, classes=np.arange(42048))

The SGDClassifier documentation says y : numpy array of shape [n_samples].

Recommended answer

No, SGDClassifier does not do multilabel classification; it does multiclass classification, which is a different problem, although both are solved using a one-vs-all problem reduction.

Also, neither SGDClassifier nor OneVsRestClassifier.fit will accept a sparse matrix for y. The former wants an array of labels, as you've already found out. The latter wants, for multilabel purposes, a list of lists of labels, e.g.

y = [[1], [2, 3], [1, 3]]

to denote that X[0] has label 1, X[1] has labels {2,3} and X[2] has labels {1,3}.