Compare 2 separate csv files and write difference to a new csv file - Python 2.7
Problem description
I am trying to compare two csv files in Python and save the difference to a third csv file, using Python 2.7.
import csv
f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
set1 = tuple(oldList1)
set2 = tuple(oldList2)
print oldList2.difference(oldList1)
I get the following error message:
Traceback (most recent call last):
File "compare.py", line 21, in <module>
print oldList2.difference(oldList1)
AttributeError: 'list' object has no attribute 'difference'
I am new to Python, and to coding in general, and I am not done with this code yet: I still have to store the differences in a variable and write them to a new csv file. I have been trying to solve this all day and I simply can't. Your help would be greatly appreciated.
Recommended answer
What do you mean by "difference"?
If a row is considered the same only when all of its columns are the same, you can get your answer with the following code:
import csv
f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
# Rows of file1 that do not appear anywhere in file2
print [row for row in oldList1 if row not in oldList2]
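The comprehension above only lists rows of file1 that are missing from file2; rows that were added in file2 need the reverse check. A minimal sketch, assuming oldList1 and oldList2 have already been read as in the code above (the function name row_changes is hypothetical). It builds sets of tuples for the lookups, because the rows returned by csv.reader are lists and lists are not hashable, but it still iterates over the original lists so the output keeps the file order.

```python
def row_changes(old_rows, new_rows):
    """Return (removed, added) for two lists of csv rows.

    Hypothetical helper: a row counts as unchanged only if every column
    matches. removed = rows only in old_rows, added = rows only in new_rows.
    """
    # Lists are unhashable, so convert each row to a tuple for set lookups.
    old_set = set(tuple(row) for row in old_rows)
    new_set = set(tuple(row) for row in new_rows)
    # Iterate over the original lists to preserve the order of the files.
    removed = [row for row in old_rows if tuple(row) not in new_set]
    added = [row for row in new_rows if tuple(row) not in old_set]
    return removed, added
```

Calling removed, added = row_changes(oldList1, oldList2) gives both directions at once.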
However, if two rows are considered the same whenever a certain key field (i.e. column) is the same, the following code will give you your answer:
import csv
f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
keyfield = 0 # Change this for choosing the column number
oldList2keys = [row[keyfield] for row in oldList2]
print [row for row in oldList1 if row[keyfield] not in oldList2keys]
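The same key-field check can store the keys in a set instead of the list oldList2keys, so each lookup is an average constant-time hash lookup rather than a scan of the whole list. A minimal sketch; rows_with_missing_keys is a hypothetical name, and keyfield plays the same role as above (the column index, 0 by default).

```python
def rows_with_missing_keys(old_rows, new_rows, keyfield=0):
    """Rows of old_rows whose value in column keyfield never occurs in new_rows.

    Hypothetical helper; rows are lists of strings as produced by csv.reader.
    """
    # Key values are strings, which are hashable, so a set works directly.
    new_keys = set(row[keyfield] for row in new_rows)
    return [row for row in old_rows if row[keyfield] not in new_keys]
```

With the lists from the answer, rows_with_missing_keys(oldList1, oldList2, keyfield) returns the same rows as the comprehension above.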
Note: the code above may run slowly on very large files, because each "not in" check on a list scans the whole list. If you want to speed it up through hashing, convert the oldLists to sets (each row becomes a tuple, since lists are not hashable) using the following code:
set1 = set(tuple(row) for row in oldList1)
set2 = set(tuple(row) for row in oldList2)
After that, you can use set1.difference(set2) to get the rows of file1 that are not in file2.
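The question also asks how to write the difference to a third csv file. A minimal sketch under a few assumptions: the function name write_missing_rows is hypothetical, and it takes already-open file objects so the caller controls the open mode (in Python 2.7 the csv documentation asks for binary mode, 'rb' and 'wb'; in Python 3, open the files with newline=''). Because a set has no order, the sketch uses the set only for lookups and iterates over the old file itself, so the output keeps file1's row order.

```python
import csv


def write_missing_rows(old_file, new_file, out_file):
    """Copy rows of old_file that do not appear in new_file into out_file.

    Hypothetical helper: all three arguments are open file objects.
    Returns the list of rows that were written.
    """
    # csv.reader yields lists, which are unhashable; store tuples in a set.
    new_rows = set(tuple(row) for row in csv.reader(new_file))
    # Walk old_file in order instead of iterating over an unordered set.
    missing = [row for row in csv.reader(old_file) if tuple(row) not in new_rows]
    csv.writer(out_file).writerows(missing)
    return missing
```

For example, in Python 2.7 you could open olddata/file1.csv and newdata/file2.csv with mode 'rb', open a hypothetical difference.csv with mode 'wb', and call write_missing_rows(f1, f2, out). The returned list also gives you the differences in a variable, as the question wanted.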