Python: Match values between two csv files

Problem description

I am parsing two different csv files and need to match a column between them. Currently, when I run the snippet, it returns no matching values even though there are matching addresses in the two csv files. The problem is the abbreviations in the address field of the OnlineData csv file. For example:

In the Addresses csv                             In the OnlineData csv
  4587 Newton Road                                    4587 Newton Rd
  7854 Food Court                                     7854 Food Ct

How can I tell Python to compare only the number ('4587') and the first word ('Newton') in both csv files when looking for matching values?

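import csv

# Collect the address column (index 1) from the first file.
Addresses = set()
with open('Addresses.csv') as f:
    for row in csv.reader(f):
        Addresses.add(row[1])

# Collect the address column (index 1) from the second file.
OnlineData = set()
with open('C:/Users/OnlineData.csv') as g:
    for row in csv.reader(g):
        OnlineData.add(row[1])

# Set intersection keeps only addresses whose strings are exactly equal.
results = Addresses & OnlineData

print('There are', len(results), 'matching addresses between the two csv files')

for result in sorted(results):
    print(result)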

Recommended answer

Since you are only interested in matching portions of the data, you might as well just load that portion into the set and then perform the intersection.

import csv

Addresses = set()
with open('Addresses.csv') as f:
    for row in csv.reader(f):
        # Drop the last word: loads "4587 Newton" instead of "4587 Newton Road"
        portion = ' '.join(row[1].split()[:-1])
        Addresses.add(portion)

OnlineData = set()
with open('C:/Users/OnlineData.csv') as g:
    for row in csv.reader(g):
        # Same rule here: "4587 Newton Rd" also becomes "4587 Newton"
        portion = ' '.join(row[1].split()[:-1])
        OnlineData.add(portion)

# Both sets now hold comparable keys, so the intersection finds the matches.
results = Addresses & OnlineData

print('There are', len(results), 'matching addresses between the two csv files')

for result in sorted(results):
    print(result)
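
To try the idea without the csv files, here is a minimal Python 3 sketch on in-memory rows. The helper name `address_key` and the extra sample addresses are made up for illustration; only the Newton and Food addresses come from the question. It also keeps the full address next to each shortened key, in case the suffix is needed after matching.

```python
def address_key(address):
    """Return the address without its last word, e.g. '4587 Newton Road' -> '4587 Newton'."""
    return ' '.join(address.split()[:-1])


# Sample values standing in for column 1 of each csv file (partly hypothetical).
addresses = ['4587 Newton Road', '7854 Food Court', '12 Elm Street']
online_data = ['4587 Newton Rd', '7854 Food Ct', '99 Oak Ave']

# Map each shortened key back to the full address it came from.
addresses_by_key = {address_key(a): a for a in addresses}
online_by_key = {address_key(a): a for a in online_data}

# Dictionary key views support set intersection directly.
matches = sorted(addresses_by_key.keys() & online_by_key.keys())

for key in matches:
    print(key, '|', addresses_by_key[key], '|', online_by_key[key])
```

With the real files, the same dictionaries could be filled inside the `csv.reader` loops by using `row[1]` in place of the sample strings.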

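The obvious disadvantage is that you drop the last word of each address from the comparison, although you can still retrieve it from the original rows. It also means that 4587 Newton Road and 4587 Newton Court would count as the same address. Another option is to normalize the input, for example by replacing Rd with Road and Ct with Court wherever they appear, so that equivalent addresses always match.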
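
The normalization option could look roughly like the following Python 3 sketch. The `ABBREVIATIONS` table and the `normalize` helper are assumptions made for illustration, not part of the original answer. The table covers only the two abbreviations from the question, so it would need to be extended for real data.

```python
# Hypothetical abbreviation table; extend it with whatever suffixes the data uses.
ABBREVIATIONS = {
    'rd': 'Road',
    'ct': 'Court',
}


def normalize(address):
    """Expand known abbreviations word by word, ignoring case and a trailing period."""
    words = []
    for word in address.split():
        lookup = word.rstrip('.').lower()
        # Unknown words are kept unchanged.
        words.append(ABBREVIATIONS.get(lookup, word))
    return ' '.join(words)


# Sample values standing in for column 1 of each csv file.
addresses = {normalize(a) for a in ['4587 Newton Road', '7854 Food Court']}
online_data = {normalize(a) for a in ['4587 Newton Rd', '7854 Food Ct.']}

results = sorted(addresses & online_data)
print(results)
```

In the csv version, the only change would be to add `normalize(row[1])` to each set instead of `row[1]`. Unlike dropping the last word, this keeps Newton Road and Newton Court distinct.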
