scikit-learn: output metrics.classification_report into CSV/tab-delimited format

Question

I'm doing multiclass text classification in scikit-learn. The model is a Multinomial Naive Bayes classifier trained on a dataset with hundreds of labels. Here's an extract from the scikit-learn script that fits the MNB model:

from __future__ import print_function

# Read file.csv into a pandas DataFrame

import pandas as pd
path = 'data/file.csv'
# on_bad_lines='skip' replaces the error_bad_lines=False flag, which was removed in pandas 2.0
merged = pd.read_csv(path, on_bad_lines='skip', low_memory=False)

# define X and y using the original DataFrame
X = merged.text
y = merged.grid

# split X and y into training and testing sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# import and instantiate CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

# create document-term matrices using CountVectorizer
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

# import and instantiate MultinomialNB
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)

# make class predictions
y_pred_class = nb.predict(X_test_dtm)

# generate classification report
from sklearn import metrics
print(metrics.classification_report(y_test, y_pred_class))

A simplified version of the metrics.classification_report output on the command line looks like this:

             precision  recall   f1-score   support
     12       0.84      0.48      0.61      2843
     13       0.00      0.00      0.00        69
     15       1.00      0.19      0.32       232
     16       0.75      0.02      0.05       965
     33       1.00      0.04      0.07       155
      4       0.59      0.34      0.43      5600
     41       0.63      0.49      0.55      6218
     42       0.00      0.00      0.00       102
     49       0.00      0.00      0.00        11
      5       0.90      0.06      0.12      2010
     50       0.00      0.00      0.00         5
     51       0.96      0.07      0.13      1267
     58       1.00      0.01      0.02       180
     59       0.37      0.80      0.51      8127
      7       0.91      0.05      0.10       579
      8       0.50      0.56      0.53      7555      
    avg/total 0.59      0.48      0.45     35919

I was wondering if there is any way to get the report output into a standard CSV file with regular column headers.

When I send the command line output to a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps the results into a single column.

Help appreciated. Thanks!

Recommended answer

As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:

report = classification_report(y_test, y_pred, output_dict=True)

Then construct a DataFrame and transpose it:

df = pandas.DataFrame(report).transpose()
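
As a minimal sketch of these two steps together, the snippet below runs them on a tiny made-up label set. The `y_true` and `y_pred` lists are illustrative assumptions, not data from the question. After the transpose, each label becomes a row and the metrics become columns. The summary rows that follow the labels (averages and, in newer releases, accuracy) depend on the installed scikit-learn version.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, made up purely for illustration
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

# output_dict=True (scikit-learn >= 0.20) returns a nested dict
# of the form {label: {metric: value}} instead of a formatted string
report = classification_report(y_true, y_pred, output_dict=True)

# Without the transpose, labels would be columns and metrics rows;
# transposing gives one row per label, one column per metric
df = pd.DataFrame(report).transpose()

print(df)
```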

From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).
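
For the CSV and tab-delimited formats the question asks about, something like the following should work. It is a sketch under assumptions: the `report` dict is hand-written to mimic the `output_dict=True` shape, using the rows for labels 12 and 13 from the question's output. The index name `label` and the output file name are hypothetical choices.

```python
import pandas as pd

# Hand-written stand-in for classification_report(..., output_dict=True),
# using two rows copied from the report shown in the question
report = {
    "12": {"precision": 0.84, "recall": 0.48, "f1-score": 0.61, "support": 2843},
    "13": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 69},
}

df = pd.DataFrame(report).transpose()
df.index.name = "label"  # hypothetical header for the first column

# The transpose leaves support as float; cast it back to int for cleaner output
df["support"] = df["support"].astype(int)

# With no path argument, to_csv returns the text instead of writing a file
csv_text = df.to_csv()
tsv_text = df.to_csv(sep="\t")

# To write a file instead (file name is an assumption):
# df.to_csv("classification_report.csv")

print(csv_text)
print(tsv_text)
```

Opening either result in a spreadsheet should give one column per metric instead of a single lumped column, since each field is now separated by an explicit delimiter.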

See also the documentation at https://scikit-learn.org/0.20/modules/generated/sklearn.metrics.classification_report.html
