Compare 2 separate csv files and write the difference to a new csv file - Python 2.7

Problem description

I am trying to compare two csv files in Python 2.7 and save the difference to a third csv file.

import csv

f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

set1 = tuple(oldList1)
set2 = tuple(oldList2)

print oldList2.difference(oldList1)

I get this error message:

Traceback (most recent call last):
  File "compare.py", line 21, in <module>
    print oldList2.difference(oldList1)
AttributeError: 'list' object has no attribute 'difference'

I am new to Python, and to coding in general, and I am not done with this code yet (I still have to store the differences in a variable and write them to a new csv file). I have been trying to solve this all day and simply can't. Your help would be greatly appreciated.

Recommended answer

What do you mean by "difference"?

If a row counts as the same only when all of its columns are the same, you can get your answer with the following code:

import csv

# Read every row of both files into lists; each row is a list of strings.
# In Python 2.7 the csv module expects files opened in binary mode ("rb").
with open("olddata/file1.csv", "rb") as f1:
    oldList1 = list(csv.reader(f1))

with open("newdata/file2.csv", "rb") as f2:
    oldList2 = list(csv.reader(f2))

# Rows of file1 that do not appear anywhere in file2 (all columns compared)
print [row for row in oldList1 if row not in oldList2]
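The question also asks for the difference to be written to a new csv file, but the answer only prints it. Below is a minimal sketch of that last step. It assumes the rows have already been read into lists as shown above. The names rows_only_in_first and write_rows are hypothetical helpers, not part of the csv module. The helpers avoid the Python 2 print statement, so they should also run on Python 3.

```python
import csv


def rows_only_in_first(old_rows, new_rows):
    """Return the rows of old_rows that do not appear anywhere in new_rows.

    A row matches only if every column is equal, as in the list
    comprehension above. The order of old_rows is kept.
    """
    return [row for row in old_rows if row not in new_rows]


def write_rows(rows, out_file):
    """Write a list of rows (lists or tuples of strings) as CSV to an open file."""
    writer = csv.writer(out_file)
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the contents of the two files
    old = [["id", "name"], ["1", "alice"], ["2", "bob"]]
    new = [["id", "name"], ["1", "alice"], ["2", "robert"]]
    print(rows_only_in_first(old, new))  # prints [['2', 'bob']]
```

To save the result, open the output file and pass it to write_rows together with the list of differing rows. On Python 2.7 open it in binary mode, for example open("diff.csv", "wb"), as the csv documentation asks. On Python 3 open it with newline="" instead. The file name diff.csv is only an example.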

However, if two rows count as the same whenever a certain key field (i.e. column) is the same, the following code will give you your answer:

import csv

# Read both files into lists of rows (Python 2.7: open csv files in binary mode)
with open("olddata/file1.csv", "rb") as f1:
    oldList1 = list(csv.reader(f1))

with open("newdata/file2.csv", "rb") as f2:
    oldList2 = list(csv.reader(f2))

keyfield = 0 # Change this to choose the key column (zero-based index)

# Collect file2's key column, then keep the file1 rows whose key is missing
oldList2keys = [row[keyfield] for row in oldList2]
print [row for row in oldList1 if row[keyfield] not in oldList2keys]
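Here is a sketch of the same key-field comparison with the hashing idea from the note applied to the key column. Collecting file2's keys into a set turns each lookup into a hash check instead of a scan of a list. The name rows_with_missing_keys is hypothetical. The keyfield parameter keeps the answer's meaning: a zero-based column index.

```python
def rows_with_missing_keys(old_rows, new_rows, keyfield=0):
    """Return the rows of old_rows whose key column does not occur in new_rows.

    keyfield is the zero-based index of the key column. Other columns are
    ignored, so a row whose key exists in new_rows is treated as unchanged
    even if its remaining columns differ.
    """
    new_keys = set(row[keyfield] for row in new_rows)
    return [row for row in old_rows if row[keyfield] not in new_keys]


if __name__ == "__main__":
    old = [["1", "alice"], ["2", "bob"], ["3", "carol"]]
    new = [["1", "alice"], ["3", "caroline"]]
    # Row 3 changed its name but keeps its key, so only row 2 is reported
    print(rows_with_missing_keys(old, new))  # prints [['2', 'bob']]
```

Like the original, this raises IndexError if a row has fewer than keyfield + 1 columns. Note that csv.reader returns an empty list for a blank line, so filter such rows out first if your files contain them.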

Note: the code above can run slowly on very large files, because every "not in" check scans a whole list. To speed it up through hashing, convert each row to a tuple and build sets from the results. The conversion is needed because lists are unhashable and cannot be put in a set:

# Lists are unhashable, so turn each row into a tuple before putting it in a set
set1 = set(tuple(row) for row in oldList1)
set2 = set(tuple(row) for row in oldList2)

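After that, you can use set1.difference(set2) to get the rows of file1 that are not in file2. Keep in mind that a set does not preserve row order and collapses duplicate rows.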
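The set version can be put together as follows, as a sketch. The function set_difference_rows follows the answer directly. The function ordered_difference_rows is a variation that is not from the answer. It hashes only file2's rows and walks through file1 in order, so it keeps the original row order and any duplicate rows while still using hashed lookups. Both function names are hypothetical.

```python
def set_difference_rows(old_rows, new_rows):
    """Rows of old_rows missing from new_rows, computed with set.difference.

    Rows become tuples because lists are unhashable. The result is sorted
    only to make the output deterministic, since sets have no order.
    """
    set1 = set(tuple(row) for row in old_rows)
    set2 = set(tuple(row) for row in new_rows)
    return sorted(set1.difference(set2))


def ordered_difference_rows(old_rows, new_rows):
    """Same comparison, but keeps the order and duplicates of old_rows."""
    new_set = set(tuple(row) for row in new_rows)
    return [row for row in old_rows if tuple(row) not in new_set]


if __name__ == "__main__":
    old = [["3", "carol"], ["1", "alice"], ["2", "bob"], ["3", "carol"]]
    new = [["1", "alice"]]
    print(set_difference_rows(old, new))      # prints [('2', 'bob'), ('3', 'carol')]
    print(ordered_difference_rows(old, new))  # prints [['3', 'carol'], ['2', 'bob'], ['3', 'carol']]
```

csv.writer(...).writerows() accepts both lists and tuples as rows, so either result can be written straight to the new csv file.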
