Creating a confusion matrix from multiple .csv files


Problem description


I have a lot of .csv files with the following format.

338,800
338,550
339,670
340,600 
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283,290


Starting with column 1, I want to read the current row and compare it with the value in the previous row. If it is greater than or equal to the previous value, I keep comparing; if the current value is smaller than the previous one, I divide the current value by the previous value and proceed. For example, in the table above, the first value in column 1 that meets this condition is 327 (because 327 is smaller than the previous value, 340), so we divide 327 by 340 and get 0.96. My Python script should exit right after printing the category (A) as given below.
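The comparison rule described above can be sketched as a small standalone function. This is a minimal sketch for illustration: `drop_ratios` is a hypothetical name, and it assumes column 1 has already been parsed into integers and that Python 3 (true division) is used.

```python
def drop_ratios(values):
    """Return current / previous for every row whose value is smaller than the row before it."""
    ratios = []
    previous = None
    for value in values:
        # Only a decrease produces a ratio; equal or larger values just become the new "previous"
        if previous is not None and value < previous:
            ratios.append(value / previous)
        previous = value
    return ratios


# Column 1 of the example file above
column1 = [338, 338, 339, 340, 327, 301, 299, 284, 284, 283,
           283, 283, 282, 282, 282, 283]
ratios = drop_ratios(column1)
print(round(ratios[0], 2))  # first drop, 327 / 340; prints 0.96
```

For the example data this yields six ratios (340 to 327, 327 to 301, 301 to 299, 299 to 284, 284 to 283 and 283 to 282), all below 1 by construction.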

from __future__ import division  # true division on Python 2; no effect on Python 3
import csv


def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"


with open("test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    results = []
    previous_value = 0
    for col1, col2 in ff:
        # Skip header or other non-numeric rows
        if not col1.isdigit():
            continue
        value = int(col1)
        if value < previous_value:
            # Current value is smaller than the previous row: divide current by previous
            result = value / previous_value
            results.append(result)
            print(category(result))
        previous_value = value

print(results)
print(sum(results))
print(category(sum(results) / len(results)))


Finally, I want to run my script on all the .csv files in the current directory and build a confusion matrix. Say A1.csv, A2.csv and A3.csv are supposed (or predicted) to print A; B1.csv, B2.csv and B3.csv are supposed to print B; C1.csv, C2.csv and C3.csv are supposed to print C, and so on. How can I automatically create such a confusion matrix from multiple .csv files using Python?


In the intended matrix, the colored blocks (row labels) show the number of true values for A, B, C and so on, and the column labels are the outcomes of the if-else control logic in category() above (A, B, C, D and E).
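To make the requested matrix concrete: each cell counts the files whose expected label (row) and printed category (column) form that pair. A minimal pure-Python sketch, assuming the expected label of each file is already known; `confusion_matrix`, `print_matrix` and the `pairs` list are hypothetical, made-up illustration data, not results from the files above.

```python
from collections import Counter

LABELS = ["A", "B", "C", "D", "E"]


def confusion_matrix(pairs, labels=LABELS):
    """Count (expected, predicted) pairs into a nested list: rows = expected, columns = predicted."""
    counts = Counter(pairs)
    return [[counts[(row, col)] for col in labels] for row in labels]


def print_matrix(matrix, labels=LABELS):
    """Print the matrix with the predicted labels as a header row."""
    print("expected\\predicted " + " ".join(labels))
    for label, row in zip(labels, matrix):
        print("{:<19}".format(label) + " ".join(str(n) for n in row))


# Hypothetical results: expected label from the filename, predicted label from category()
pairs = [("A", "A"), ("A", "A"), ("A", "D"),
         ("B", "B"), ("B", "C"), ("C", "C")]
matrix = confusion_matrix(pairs)
print_matrix(matrix)
```

Listing all five labels explicitly keeps rows and columns for categories that never occur (here E), which matches the fixed A to E layout the question asks for.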

Recommended answer

Add a get_predict(filename) helper that derives the label from the filename:

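def get_predict(filename):
    # Files whose name contains 'Alex' belong to the 'Alexander' class;
    # for every other file the label is the first character of the name
    if 'Alex' in filename:
        return 'Alexander'
    return filename[0]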
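If the files follow a `<label><number>.csv` naming pattern (an assumption; the answer only shows the `Alex` special case), the label could instead be taken from the leading letters of the name rather than hard-coding each case. `label_from_filename` is a hypothetical helper, not part of the original answer; note that it returns `Alex` for `Alex1.csv`, unlike `get_predict`.

```python
import os
import re


def label_from_filename(filename):
    """Return the leading letters of the file name, e.g. 'A1.csv' -> 'A'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    match = re.match(r"[A-Za-z]+", stem)
    # Fall back to the first character when the name does not start with a letter
    return match.group(0) if match else stem[:1]


print(label_from_filename("A1.csv"))          # A
print(label_from_filename("Alexander2.csv"))  # Alexander
```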




Read the n files and compute the confusion matrix with pandas crosstab:

import csv
import os

import pandas as pd


def get_category(filepath):
    def category(val):
        if 0.8 < val <= 0.9:
            return "A"
        if abs(val - 0.7) < 1e-10:
            return "B"
        if 0.5 < val < 0.7:
            return "C"
        if abs(val - 0.5) < 1e-10:
            return "E"
        return "D"

    with open(filepath, "r") as csvfile:
        ff = csv.reader(csvfile)

        results = []
        previous_value = 0
        for row in ff:
            # Skip blank lines and non-numeric rows such as headers
            if not row or not row[0].strip().isdigit():
                continue
            value = int(row[0])
            if value < previous_value:
                # Value dropped below the previous row: store current / previous
                results.append(value / previous_value)
            previous_value = value

    # Categorize the mean of all drop ratios in this file
    return category(sum(results) / len(results))


matrix = {'actual': [], 'predict': []}
path = 'test/confusion'
for filename in sorted(os.listdir(path)):
    if not filename.endswith('.csv'):
        continue
    # The first character of the filename is the expected label
    # (use get_predict(filename) for multi-letter labels)
    matrix['predict'].append(filename[0])
    matrix['actual'].append(get_category(os.path.join(path, filename)))

df = pd.crosstab(pd.Series(matrix['actual'], name='Actual'),
                 pd.Series(matrix['predict'], name='Predicted'))
print(df)
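One detail worth knowing: `pd.crosstab` only creates rows and columns for labels that actually occur, so a category that no file produced (for example E) is missing from the table. A sketch, assuming the full label set is A to E as in `category()`, is to reindex the result so every label appears; the `actual` and `predicted` lists below are made-up illustration data standing in for `matrix['actual']` and `matrix['predict']`.

```python
import pandas as pd

LABELS = ["A", "B", "C", "D", "E"]

# Hypothetical label lists standing in for matrix['actual'] and matrix['predict']
actual = ["A", "A", "B", "C", "D"]
predicted = ["A", "A", "B", "C", "C"]

df = pd.crosstab(pd.Series(actual, name="Actual"),
                 pd.Series(predicted, name="Predicted"))

# crosstab only has rows/columns for observed labels; reindex adds the rest as zeros
df = df.reindex(index=LABELS, columns=LABELS, fill_value=0)
print(df)
```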


Output (reading "A.csv", "B.csv" and "C.csv" with the given example data, three times):

Predicted  A  B  C
Actual            
A          3  0  0
B          0  3  0
C          0  0  3


Tested with Python 3.4.2 and pandas 0.19.2.
