How to compare 2 CSV files in Python value by value and print the difference?


Problem description

I have 2 CSV files of the same dimensions. In the example below the dimensions are 3*3 (3 comma-separated values and 3 rows), but the files could be as large as 100*10000.

File1.csv:

Name, ID, Profession
Tom, 1, Teacher
Dick, 2, Actor

File2.csv:

Name, ID, Profession
Dick, 2, Actor
Tom, 1, Police

I want to compare the files element-wise (e.g. Teacher == Police).

It would be great if I could compare the rows using a primary key (ID), in case the two files are not in the same order. I would like the output to look something like this:

Profession does not match for ID = 1, i.e. Teacher <> Police

ID is the primary key.

Note: the files may be very large (100 columns * 10000 records).

Below is the code I used to get the lists A and B from the 2 CSV files. It is very tedious, and even with this much code I only get a few lines of each file.

source_file = open('File1.csv', 'r')
file_one_line_1 = source_file.readline()
file_one_line_1_str = str(file_one_line_1)
file_one_line_1_str_replace = file_one_line_1_str.replace('\n', '')
file_one_line_1_list = list(file_one_line_1_str_replace.split(','))
file_one_line_2 = source_file.readline()
file_one_line_2_str = str(file_one_line_2)
file_one_line_2_str_replace = file_one_line_2_str.replace('\n', '')
file_one_line_2_list = list(file_one_line_2_str_replace.split(','))
file_one_line_3 = source_file.readline()
file_one_line_3_str = str(file_one_line_3)
file_one_line_3_str_replace = file_one_line_3_str.replace('\n', '')
file_one_line_3_list = list(file_one_line_3_str_replace.split(','))
A = [file_one_line_1_list, file_one_line_2_list, file_one_line_3_list]


target_file = open('File2.csv', 'r')
file_two_line_1 = target_file.readline()
file_two_line_1_str = str(file_two_line_1)
file_two_line_1_str_replace = file_two_line_1_str.replace('\n', '')
file_two_line_1_list = list(file_two_line_1_str_replace.split(','))
file_two_line_2 = target_file.readline()
file_two_line_2_str = str(file_two_line_2)
file_two_line_2_str_replace = file_two_line_2_str.replace('\n', '')
file_two_line_2_list = list(file_two_line_2_str_replace.split(','))
file_two_line_3 = target_file.readline()
file_two_line_3_str = str(file_two_line_3)
file_two_line_3_str_replace = file_two_line_3_str.replace('\n', '')
file_two_line_3_list = list(file_two_line_3_str_replace.split(','))
B = [file_two_line_1_list, file_two_line_2_list, file_two_line_3_list]

I used the code below and it works smoothly:


import csv

source_file = 'Book1.csv'
target_file = 'Book2.csv'
# must match the header text of the key column exactly
primary_key = 'id'

# read source and target files
with open(source_file, 'r', newline='') as f:
    reader = csv.reader(f)
    A = list(reader)
with open(target_file, 'r', newline='') as f:
    reader = csv.reader(f)
    B = list(reader)

# get the index of the primary key column
column_names = A[0]
column_id = column_names.index(primary_key)

# get the column names without the primary key
values_name = column_names[0:column_id] + column_names[column_id + 1:]

# create a dictionary with keys in column `column_id`
# and values the list of the other column values
A_dict = {a[column_id]: a[0:column_id] + a[column_id + 1:] for a in A}
B_dict = {b[column_id]: b[0:column_id] + b[column_id + 1:] for b in B}

# iterate on the keys and on the other columns and print the differences
# (raises KeyError if a key from the source file is missing in the target file)
for row_id in A_dict.keys():
    for column in range(len(column_names) - 1):
        if A_dict[row_id][column] != B_dict[row_id][column]:
            print(f"{primary_key} = {row_id}\t{values_name[column]}: {A_dict[row_id][column]} != {B_dict[row_id][column]}")

Thanks.

Recommended answer

For reading a CSV file and storing its contents as nested lists, see https://stackoverflow.com/a/35340988/12669658
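As a rough illustration of the reading step, here is a minimal sketch using the standard csv module. The helper name `read_rows` is made up for this example and is not taken from the linked answer, and `skipinitialspace=True` is an assumption that the files put a blank after each comma, as the samples in the question do.

```python
import csv


def read_rows(path):
    """Return the CSV file at `path` as a list of rows, each row a list of strings.

    `read_rows` is a hypothetical helper name used only for this sketch.
    """
    # newline='' is what the csv module documentation recommends for files given to csv.reader
    with open(path, newline='') as f:
        # skipinitialspace=True drops the blank after each comma,
        # so the line "Tom, 1, Teacher" becomes ['Tom', '1', 'Teacher']
        return list(csv.reader(f, skipinitialspace=True))
```

With this helper, `A = read_rows('File1.csv')` and `B = read_rows('File2.csv')` would replace the repeated `readline()` calls, whatever the number of rows.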

For comparing the lists element-wise, refer to your dedicated question: https://stackoverflow.com/a/59633822/12669658
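One possible way to do the key-based comparison is sketched below. It is not the code from the linked answer: the helper names `index_by_key` and `diff_by_key` are hypothetical, the message wording only imitates the output asked for in the question, and the sketch assumes the key values are unique within each file (a repeated key would silently keep only the last row).

```python
import csv


def index_by_key(path, key):
    """Map each row's `key` value to the row, stored as a dict of column name -> value."""
    with open(path, newline='') as f:
        # DictReader takes the column names from the first line of the file
        reader = csv.DictReader(f, skipinitialspace=True)
        return {row[key]: row for row in reader}


def diff_by_key(source, target, key):
    """Return one message per differing cell, plus one per key present in only one file."""
    messages = []
    for k, src_row in source.items():
        tgt_row = target.get(k)
        if tgt_row is None:
            messages.append(f"{key} = {k} is missing from the target file")
            continue
        for column, src_value in src_row.items():
            if column == key:
                continue  # the key itself matches by construction
            tgt_value = tgt_row.get(column)
            if src_value != tgt_value:
                messages.append(
                    f"{column} does not match for {key} = {k}: {src_value} <> {tgt_value}"
                )
    # keys that appear only in the target file
    for k in target:
        if k not in source:
            messages.append(f"{key} = {k} is missing from the source file")
    return messages
```

For the two sample files, `diff_by_key(index_by_key('File1.csv', 'ID'), index_by_key('File2.csv', 'ID'), 'ID')` returns the single message `Profession does not match for ID = 1: Teacher <> Police`, regardless of row order. Both files are held in memory, which should be unproblematic at the 100 columns * 10000 records mentioned in the question.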
