Python: How to find the accuracy of an SVM text classifier for multilabel classes

Question

I have used the following code, and I need to check its accuracy on X_train and X_test.

The code works for my classification problem over multi-labeled classes:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
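# One list of label indices per training sentence. This sequence-of-sequences
# format is interpreted as multilabel only by older scikit-learn releases.
y_train = [[0], [0], [0], [0],
           [0], [0], [1], [1],
           [1], [1], [1], [1],
           [2], [2]]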
X_test = np.array(['nice day in nyc',
                   'the capital of great britain is london',
                   'i like london better than new york',
                   ])   
target_names = ['Class 1', 'Class 2','Class 3']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1,max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))

Output

nice day in nyc => Class 1
the capital of great britain is london => Class 2
i like london better than new york => Class 3

I would like to check the accuracy on the training and test datasets. The score function doesn't work for me; it raises an error stating that multilabel classifiers are not supported:

>>> classifier.score(X_train, X_test)

NotImplementedError: score is not supported for multilabel classifiers

Kindly help me get accuracy results for training and test data and choose an algorithm for our classification case.

Recommended answer

If you want to get an accuracy score for your test set, you'll need to create an answer key, which you can call y_test. You can't know if your predictions are correct unless you know the correct answers.

Once you have an answer key, you can compute the accuracy. The function you want is sklearn.metrics.accuracy_score.

I have written it out below:

from sklearn.metrics import accuracy_score

# ... everything else the same ...

# create an answer key using the same label indices as y_train
# (0 = Class 1, 1 = Class 2, 2 = Class 3)
y_test = [[0], [1], [2]]

# same as yours...
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# get the accuracy
print(accuracy_score(y_test, predicted))
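For readers on a recent scikit-learn release, the same idea can be sketched with a binary indicator matrix, since newer versions no longer treat a list of label lists as multilabel input. This is only a sketch under assumptions: the use of MultiLabelBinarizer and the names mlb, Y_train and Y_test are not from the original post, and the answer key is inferred from the output shown in the question. For multilabel targets, accuracy_score reports subset accuracy, meaning a sample counts as correct only when its whole label set matches.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X_train = ["new york is a hell of a town",
           "new york was originally dutch",
           "the big apple is great",
           "new york is also called the big apple",
           "nyc is nice",
           "people abbreviate new york city as nyc",
           "the capital of great britain is london",
           "london is in the uk",
           "london is in england",
           "london is in great britain",
           "it rains a lot in london",
           "london hosts the british museum",
           "new york is great and so is london",
           "i like london better than new york"]
y_train = [[0]] * 6 + [[1]] * 6 + [[2]] * 2

X_test = ["nice day in nyc",
          "the capital of great britain is london",
          "i like london better than new york"]
# Assumed answer key, inferred from the output shown in the question
y_test = [[0], [1], [2]]

# Convert the label lists into a 0/1 indicator matrix (one column per class)
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)
Y_test = mlb.transform(y_test)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_df=1, max_df=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, Y_train)

# Subset accuracy: a row is correct only if every label in it matches
train_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_accuracy = accuracy_score(Y_test, classifier.predict(X_test))
print('train accuracy: %.3f' % train_accuracy)
print('test accuracy: %.3f' % test_accuracy)
```

Calling mlb.inverse_transform on the predictions turns the indicator rows back into tuples of label indices, which can then be mapped to target_names as in the question.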

Also, sklearn has several other metrics besides accuracy. See them here: sklearn.metrics
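As a small illustration of those other metrics, the sketch below compares subset accuracy with Hamming loss and micro-averaged F1 on a hand-made pair of indicator matrices. The arrays Y_true and Y_pred are invented for the example and are not results from the classifier above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Hypothetical ground truth and predictions: 4 samples, 3 possible labels
Y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 1]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 1],   # one extra label
                   [1, 0, 0]])  # one missing label

# Fraction of samples whose full label set is predicted exactly (2 of 4)
subset_accuracy = accuracy_score(Y_true, Y_pred)
# Fraction of individual label slots that are wrong (2 of 12)
label_error_rate = hamming_loss(Y_true, Y_pred)
# F1 computed over all label decisions pooled together
micro_f1 = f1_score(Y_true, Y_pred, average='micro')

print(subset_accuracy, label_error_rate, micro_f1)
```

Subset accuracy is strict: the last two rows each have a single wrong label yet count as fully wrong, while Hamming loss and micro F1 give partial credit for the labels that were right.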
