SciPy and scikit-learn - ValueError: Dimension mismatch

Views: 312 | Published: 2020/5/18 18:55:38 | Tags: python numpy scipy scikit-learn

Problem description

I use SciPy and scikit-learn to train and apply a Multinomial Naive Bayes Classifier for binary text classification. Precisely, I use the module sklearn.feature_extraction.text.CountVectorizer for creating sparse matrices that hold word feature counts from text and the module sklearn.naive_bayes.MultinomialNB as the classifier implementation for training the classifier on training data and applying it on test data.

The input to the CountVectorizer is a list of text documents represented as unicode strings. The training data is much larger than the test data. My code looks like this (simplified):

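import numpy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = CountVectorizer(**kwargs)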

# sparse matrix with training data
X_train = vectorizer.fit_transform(list_of_documents_for_training)

# vector holding target values (=classes, either -1 or 1) for training documents
# this vector has the same number of elements as the list of documents
y_train = numpy.array([1, 1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1, -1, ...])

# sparse matrix with test data
X_test = vectorizer.fit_transform(list_of_documents_for_testing)

# Training stage of NB classifier
classifier = MultinomialNB()
classifier.fit(X=X_train, y=y_train)

# Prediction of log probabilities on test data
X_log_proba = classifier.predict_log_proba(X_test)

Problem: As soon as MultinomialNB.predict_log_proba() is called, I get ValueError: dimension mismatch. According to the IPython stacktrace below, the error occurs in SciPy:

/path/to/my/code.pyc
--> 177         X_log_proba = classifier.predict_log_proba(X_test)

/.../sklearn/naive_bayes.pyc in predict_log_proba(self, X)
    76             in the model, where classes are ordered arithmetically.
    77         """
--> 78         jll = self._joint_log_likelihood(X)
    79         # normalize by P(x) = P(f_1, ..., f_n)
    80         log_prob_x = logsumexp(jll, axis=1)

/.../sklearn/naive_bayes.pyc in _joint_log_likelihood(self, X)
    345         """Calculate the posterior log probability of the samples X"""
    346         X = atleast2d_or_csr(X)
--> 347         return (safe_sparse_dot(X, self.feature_log_prob_.T)
    348                + self.class_log_prior_)
    349 

/.../sklearn/utils/extmath.pyc in safe_sparse_dot(a, b, dense_output)
    71     from scipy import sparse
    72     if sparse.issparse(a) or sparse.issparse(b):
--> 73         ret = a * b
    74         if dense_output and hasattr(ret, "toarray"):
    75             ret = ret.toarray()

/.../scipy/sparse/base.pyc in __mul__(self, other)
    276 
    277             if other.shape[0] != self.shape[1]:
--> 278                 raise ValueError('dimension mismatch')
    279 
    280             result = self._mul_multivector(np.asarray(other))

I have no idea why this error occurs. Can anybody please explain it to me and provide a solution for this problem? Thanks a lot in advance!
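One likely explanation, assuming the simplified code above mirrors the real code: the second call to fit_transform relearns the vocabulary from the test documents, so X_test ends up with a different number of columns than X_train. The classifier's feature_log_prob_ has one column per training feature, which would trigger the shape check in the traceback. A minimal sketch of the difference between refitting and calling transform, using hypothetical toy documents (train_docs, test_docs and the labels are illustrative, not from the original data):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, standing in for the real document lists.
train_docs = ["good great film", "great acting good plot",
              "bad boring film", "boring plot bad acting"]
y_train = np.array([1, 1, -1, -1])
test_docs = ["good plot", "boring film, sadly"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns the training vocabulary

# Fitting on the test documents builds a different vocabulary, so the
# column count no longer matches the training matrix. A separate
# vectorizer is used here only to leave the fitted one untouched.
X_test_refit = CountVectorizer().fit_transform(test_docs)

# transform() reuses the training vocabulary; words not seen during
# training are ignored, so the column count matches X_train.
X_test = vectorizer.transform(test_docs)

classifier = MultinomialNB()
classifier.fit(X_train, y_train)
log_proba = classifier.predict_log_proba(X_test)

print(X_train.shape, X_test_refit.shape, X_test.shape, log_proba.shape)
```

If this is indeed the cause, replacing vectorizer.fit_transform(list_of_documents_for_testing) with vectorizer.transform(list_of_documents_for_testing) in the original code should make the shapes agree.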
