Looping through and comparing lines of two unequal length dictionaries
Problem description
I have 2 files of unequal length that each contain a column of names. I would like to use fuzzywuzzy to compare these names and identify matches. However, using the script below, instead of comparing every value in the name column of file1 with every value in the name column of file2, it only compares the first line of file1 with all the lines of file2. Can someone please help with a script that does all pairwise comparisons? Thanks!
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import csv

file1_loc = 'file1.csv'
file2_loc = 'file2.csv'
file1 = csv.DictReader(open(file1_loc, 'rb'), delimiter=',', quotechar='"')
file2 = csv.DictReader(open(file2_loc, 'rb'), delimiter=',', quotechar='"')

# Result lists (used but never initialized in the original snippet)
bus_name = []
pnode_name = []
score_50_plus = []

for line in file1:
    for line2 in file2:
        partial_ratio = fuzz.partial_ratio(str(line['NAME']), str(line2['PNODENAME']))
        if partial_ratio > 60:
            bus_name.append(line['NAME'])
            pnode_name.append(line2['PNODENAME'])
            score_50_plus.append(partial_ratio)
            print partial_ratio
            print line['NAME']
            print line2['PNODENAME']
Edit: To clarify, I have a list of 218 names I'll call list1 and a list of 1172 names I'll call list2. I think the names in list1 correspond to some of the names in list2, but they aren't exact matches, so I can't do something roughly like:
matches = []
for line in list1:
    if line in list2:
        matches.append(line)
Instead, I'd like to get the fuzz.partial_ratio of each name in list1 to each name in list2. Something like:
for line in list1:
    partial_ratio = fuzz.partial_ratio(line, list2[0])
for line in list1:
    partial_ratio = fuzz.partial_ratio(line, list2[1])
for line in list1:
    partial_ratio = fuzz.partial_ratio(line, list2[2])
# ...and so on, one loop per name in list2
But without having to write 1172 for loops (or 218 if I reversed it).
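The hand-written loops above collapse into two nested loops over the lists. A minimal sketch, assuming both name lists are already in memory as plain Python lists. To stay self-contained it uses `difflib.SequenceMatcher(...).ratio()` scaled to 0-100 as a stand-in for `fuzz.partial_ratio`; the two scores are not the same (`partial_ratio` scores the best-matching substring), and `score` and `pairwise_matches` are hypothetical names:

```python
from difflib import SequenceMatcher
from itertools import product


def score(a, b):
    # Hypothetical stand-in for fuzz.partial_ratio: whole-string similarity
    # from difflib, scaled to 0-100 (partial_ratio instead scores the
    # best-matching substring, so its numbers will differ)
    return round(100 * SequenceMatcher(None, a, b).ratio())


def pairwise_matches(list1, list2, threshold=60):
    """Score every name in list1 against every name in list2."""
    matches = []
    # product(list1, list2) yields every (name1, name2) pair, exactly what
    # two nested for loops would visit: len(list1) * len(list2) pairs
    for name1, name2 in product(list1, list2):
        s = score(name1, name2)
        if s > threshold:
            matches.append((name1, name2, s))
    return matches


print(pairwise_matches(["ALPHA"], ["ALPHA", "ZZZZZ"]))  # [('ALPHA', 'ALPHA', 100)]
```

With fuzzywuzzy installed, `score` can be replaced by `fuzz.partial_ratio`. For 218 × 1172 names this performs 255,496 comparisons. Because `itertools.product` reads its inputs into memory before it starts iterating, it would also accept a `csv.DictReader` directly.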
Recommended answer
The real issue was that the iterator for file2 (a csv.DictReader) was consumed by the first pass of the outer loop over file1, so every later row of file1 was compared against nothing. Recreating the file2 iterator for each iteration over file1 fixed the issue.
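A minimal Python 3 sketch of the failure and two ways around it, using in-memory `io.StringIO` buffers as hypothetical stand-ins for file1.csv and file2.csv. `pairs_recreated` follows the answer's fix (a fresh reader per outer row); `pairs_cached` is an alternative not in the original answer that reads file2 into a list once:

```python
import csv
import io

# Hypothetical in-memory stand-ins for file1.csv and file2.csv
FILE1 = "NAME\nALPHA\nBETA\n"
FILE2 = "PNODENAME\nALPHA_1\nBETA_2\nGAMMA_3\n"


def pairs_buggy():
    file1 = csv.DictReader(io.StringIO(FILE1))
    file2 = csv.DictReader(io.StringIO(FILE2))
    pairs = []
    for line in file1:
        # file2 is a one-shot iterator: it is exhausted after the first outer row
        for line2 in file2:
            pairs.append((line["NAME"], line2["PNODENAME"]))
    return pairs


def pairs_recreated():
    file1 = csv.DictReader(io.StringIO(FILE1))
    pairs = []
    for line in file1:
        # The answer's fix: build a fresh reader for every outer row
        file2 = csv.DictReader(io.StringIO(FILE2))
        for line2 in file2:
            pairs.append((line["NAME"], line2["PNODENAME"]))
    return pairs


def pairs_cached():
    file1 = csv.DictReader(io.StringIO(FILE1))
    # Alternative: read file2 once into a list, which can be iterated repeatedly
    rows2 = list(csv.DictReader(io.StringIO(FILE2)))
    pairs = []
    for line in file1:
        for line2 in rows2:
            pairs.append((line["NAME"], line2["PNODENAME"]))
    return pairs


print(len(pairs_buggy()))      # 3
print(len(pairs_recreated()))  # 6
print(len(pairs_cached()))     # 6
```

With real files, recreating the reader means reopening file2 (or calling `seek(0)` on it and building a new `DictReader`) for each of the 218 rows of file1; caching the rows in a list reads the file only once, at the cost of holding its 1172 rows in memory.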
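If you instead want a line-by-line comparison (row 1 of file1 against row 1 of file2, row 2 against row 2, and so on), nesting the loops will not help; the two iterators have to "move together". We can use izip_longest from itertools (renamed zip_longest in Python 3) for this purpose: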
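from itertools import izip_longest  # Python 2; Python 3 calls it zip_longest
from fuzzywuzzy import fuzz

# Walk both readers in lockstep; izip_longest pads the shorter one with None
for l1, l2 in izip_longest(file1, file2):
    if all((l1, l2)):  # skip rows where one file has already run out
        partial_ratio = fuzz.partial_ratio(str(l1['NAME']), str(l2['PNODENAME']))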
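You would replace the loops in your script above with this izip_longest construct. Because izip_longest pads the shorter file with None, the all((l1, l2)) check skips the rows that have no partner.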
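A small Python 3 sketch of the lockstep pairing on plain lists, to show which rows get compared. `lockstep_pairs` is a hypothetical helper name; it keeps the answer's `all((l1, l2))` test, which also skips empty rows, not just the `None` padding:

```python
from itertools import zip_longest  # Python 3 name for izip_longest


def lockstep_pairs(rows1, rows2):
    """Pair row i of rows1 with row i of rows2 (line-by-line, not all pairs)."""
    pairs = []
    for l1, l2 in zip_longest(rows1, rows2):
        # zip_longest pads the shorter input with None; skip those pairs.
        # Note: all() would also skip falsy rows such as empty strings.
        if all((l1, l2)):
            pairs.append((l1, l2))
    return pairs


print(lockstep_pairs(["a", "b", "c"], ["x", "y"]))  # [('a', 'x'), ('b', 'y')]
```

Only min(len(rows1), len(rows2)) pairs are produced, so with 218 and 1172 names this compares at most 218 pairs. That is far fewer than the 255,496 an all-pairs comparison performs, which is why it only suits files whose rows are already aligned.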