基于两个列中的特定数据比较两个CSV文件 [英] Comparing two CSV Files Based on Specific Data in two Columns

查看:217
本文介绍了基于两个列中的特定数据比较两个CSV文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我被鼓励离开我的舒适区,使用python与很少或没有经验,现在我被卡住了。我试图比较两个CSV文件(fileA.csv和fileB.csv),并附加任何丢失的用户行到fileA.csv从fileB.csv。我可以比较的唯一字段是用户的名字和姓氏(在这种情况下,它是每个文件的row [0]和row [2])。

I was encouraged to step out of my comfort zone and use python with little to no experience and now I'm stuck. I'm trying to compare two CSV files (fileA.csv and fileB.csv), and append any missing user rows to fileA.csv from fileB.csv. The only fields I can compare with are user's first and last names (in this case, it's row[0] and row[2] from each file).

理解,您不能附加信息到您当前已打开的文件,所以我接受建议,而不必创建第三个文件(如果可能)。下面有我在正确的轨道,但有很多数据,所以我需要一个循环。请帮助。

From my understanding, you cannot append information to a file that you currently have open so I'm open to suggestions without having to create a third file (if possible). Below has me on the right track, but there's a lot of data so I'll need a loop. Please help.

import csv
reader1 = csv.reader(open('fileA', 'rb'), delimiter=',', quotechar='|')
row1 = reader1.next()
reader2 = csv.reader(open('fileB', 'rb'), delimiter=',', quotechar='|')
row2 = reader2.next()


##For Loop...

        if (row1[0] == row2[0]) and (row1[2] == row2[2]):
                ## Compare next 
        else:
                ## Append entire row to fileA.csv






示例FileA.csv:


Example FileA.csv:

John,Thomas,Doe,some,other,stuff
Jane, ,Smith,some,other,stuff

示例FileB.csv:

Example FileB.csv:

John, ,Doe,other,personal,data
Jane,Elizabeth,Smith,other,personal,data
Robin,T,Williams,other,personal,data

应该从FileB附加到FileA的唯一一行是Robin的完整Row,这样FileA看起来像:

The only row that should append from FileB to FileA is Robin's complete Row so that FileA looks like:

DesiredResult_FileA:

DesiredResult_FileA:

John,Thomas,Doe,some,other,stuff
Jane, ,Smith,some,other,stuff
Robin,T,Williams,other,personal,data


推荐答案

将文件A中的信息首先存储在内存中。

Store the information found in file A in memory first, in a set.

然后,重新打开文件A在追加模式,并循环文件B.从B中找不到的任何名称,在集合中,然后可以添加到文件A:

Then, reopen file A in append mode, and loop over file B. Any name from B not found in the set, can then be added to file A:

csv_dialect = dict(delimiter=',', quotechar='|')
names = set()
with open('fileA', 'rb') as file_a:
    reader1 = csv.reader(file_a, **csv_dialect)
    next(reader1)
    for row in reader1:
        names.add((row[0], row[2]))

# `names` is now a set of all names (taken from columns 0 and 2) found in file A.

with open('fileA', 'ab') as file_a, open('fileB', 'rb') as file_b:
    writer = csv.writer(file_a, **csv_dialect)
    reader2 = csv.reader(file_b, **csv_dialect)
    next(reader2)
    for row in reader2:
        if (row[0], row[2]) not in names:
            # This row was not present in file A, add it.
            writer.writerow(row)

code>行需要Python 2.7或更高版本。在早期的Python版本中,简单地嵌入两个语句:

The combined with line requires Python 2.7 or newer. In earlier Python versions, simply nest the two statements:

with open('fileA', 'ab') as file_a:
    with open('fileB', 'rb') as file_b:
        # etc.

这篇关于基于两个列中的特定数据比较两个CSV文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆