Issue computing difference between two csv files
Problem description
I'm trying to obtain the difference between two csv files, A.csv and B.csv, in order to get the new rows added in the second file. A.csv has the following data.
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
B.csv has the following data.
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
To write the newly added rows to an output file, I'm using the following script.
input_file1 = "A.csv"
input_file2 = "B.csv"
output_path = "out.csv"
with open(input_file1, 'r') as t1:
    fileone = set(t1)
with open(input_file2, 'r') as t2, open(output_path, 'w') as outFile:
    for line in t2:
        if line not in fileone:
            outFile.write(line)
Expected output is:
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
Output obtained through the above script is:
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
I'm not sure where I'm making a mistake. I tried debugging it, but made no progress.
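Recommended answer
You need to be careful with trailing newlines. For example, if the last line of A.csv has no trailing newline, it will not compare equal to the same row in B.csv, which does end with one. It is therefore safer to remove the newlines before comparing and then add them back when writing: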
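input_file1 = "A.csv"
input_file2 = "B.csv"
output_path = "out.csv"

with open(input_file1, 'r') as t1:
    # splitlines() drops the line endings, so the last line matches
    # whether or not the file ends with a newline
    fileone = set(t1.read().splitlines())

with open(input_file2, 'r') as t2, open(output_path, 'w') as outFile:
    for line in t2:
        # remove only the line ending, to stay consistent with splitlines()
        line = line.rstrip('\n')
        if line not in fileone:
            outFile.write(line + '\n')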
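To see why the original script reports both rows, here is a small in-memory sketch. It assumes that A.csv ends without a trailing newline, which would explain the symptom but is not confirmed by the question. The helper name new_lines and the variables a_text and b_text are made up for illustration.

```python
import io

def new_lines(old_text, new_text):
    """Return lines of new_text that are not in old_text, ignoring line endings."""
    seen = {line.rstrip("\n") for line in io.StringIO(old_text)}
    result = []
    for line in io.StringIO(new_text):
        stripped = line.rstrip("\n")
        if stripped not in seen:
            result.append(stripped)
    return result

row1 = "acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK"
row2 = "acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89"

# Assumption: A.csv has no newline after its last (and only) row.
a_text = row1
b_text = row1 + "\n" + row2 + "\n"

# Raw comparison, as in the question: lines keep their line endings,
# so row1 (no newline) and row1 + "\n" count as different strings.
raw_a = set(io.StringIO(a_text))
raw_diff = [line for line in io.StringIO(b_text) if line not in raw_a]

print(len(raw_diff))              # both rows of B are reported as new
print(new_lines(a_text, b_text))  # only the DT/89 row remains
```

If the files might also differ in trailing spaces, stripping whitespace on both sides (rather than only the line ending) would be another option, at the cost of treating rows that differ only in whitespace as equal.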