Python: Match values between two csv files
Problem description
I am parsing two different csv files and need to match a column between them. Currently, when I run the snippet, it returns no matching values even though there are matching addresses between the two csv files. The problem is that the address field in the OnlineData csv file uses abbreviations. For example:
In the Addresses csv      In the OnlineData csv
4587 Newton Road          4587 Newton Rd
7854 Food Court           7854 Food Ct
How can I tell Python to compare only the number ('4587') and the first word ('Newton') in both csv files when looking for matching values?
import csv

Addresses = set()
with open('Addresses.csv') as f:
    for row in csv.reader(f):
        Addresses.add(row[1])

OnlineData = set()
with open('C:/Users/OnlineData.csv') as g:
    for row in csv.reader(g):
        OnlineData.add(row[1])

results = Addresses & OnlineData
print('There are', len(results), 'matching addresses between the two csv files')
for result in sorted(results):
    print(result)
Recommended answer
Since you are only interested in matching a portion of the data, you might as well load just that portion into each set and then perform the intersection.
import csv

Addresses = set()
with open('Addresses.csv') as f:
    for row in csv.reader(f):
        # Drop the last word: keeps "4587 Newton" instead of "4587 Newton Road"
        portion = ' '.join(row[1].split()[:-1])
        Addresses.add(portion)

OnlineData = set()
with open('C:/Users/OnlineData.csv') as g:
    for row in csv.reader(g):
        # Same truncation, so "4587 Newton Rd" also becomes "4587 Newton"
        portion = ' '.join(row[1].split()[:-1])
        OnlineData.add(portion)

results = Addresses & OnlineData
print('There are', len(results), 'matching addresses between the two csv files')
for result in sorted(results):
    print(result)
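Note that `split()[:-1]` drops only the last word, so an address such as "4587 Newton Park Road" would keep "4587 Newton Park". If the goal is literally the number plus the first word, as the question asks, a key function along these lines might work. This is a minimal sketch: `match_key` is a hypothetical helper name, the sample lists are illustrative only, and it assumes every address starts with the house number.

```python
def match_key(address):
    """Return the first two tokens, e.g. '4587 Newton' for '4587 Newton Road'."""
    tokens = address.split()
    return ' '.join(tokens[:2])


# Illustrative in-memory data standing in for column 1 of each csv file.
addresses = {match_key(a) for a in ['4587 Newton Road', '7854 Food Court']}
online = {match_key(a) for a in ['4587 Newton Rd', '7854 Food Ct', '12 Elm St']}

for key in sorted(addresses & online):
    print(key)
```

The same function could be applied to `row[1]` inside each `csv.reader` loop in place of the `[:-1]` slice.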
The obvious disadvantage is that you lose that bit of information, although it can still be retrieved. Another option would be to normalize the input: replace Rd with Road and Ct with Court wherever they appear, so that the addresses always match.
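The normalization idea could be sketched roughly as follows. The `ABBREVIATIONS` table and the `normalize` helper are assumptions made for illustration; a real table would need to cover whatever abbreviations actually occur in the data.

```python
# Hypothetical abbreviation table; extend it to cover the data at hand.
ABBREVIATIONS = {'Rd': 'Road', 'Ct': 'Court'}


def normalize(address):
    """Expand known abbreviations word by word, e.g. 'Rd' -> 'Road'."""
    words = address.split()
    # A trailing period ("Ct.") is ignored when looking a word up in the table.
    return ' '.join(ABBREVIATIONS.get(w.rstrip('.'), w) for w in words)


# Illustrative in-memory data standing in for column 1 of each csv file.
addresses = {'4587 Newton Road', '7854 Food Court'}
online = {normalize(a) for a in ['4587 Newton Rd', '7854 Food Ct.']}

print(sorted(addresses & online))
```

Because whole words are replaced rather than substrings, a street name that merely contains "Rd" or "Ct" is left alone, and the full address is kept in the result.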