Looping through and comparing lines of two unequal length dictionaries

Problem description

I have 2 files of unequal length that each contain a column of names. I would like to use fuzzywuzzy to compare these names and identify matches. However, with the script below, instead of comparing every value in the name column of file1 to every value in the name column of file2, it only compares the first line of file1 to all the lines of file2. Can someone please help with a script that does all pairwise comparisons? Thanks!

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import csv

file1_loc = 'file1.csv'
file2_loc = 'file2.csv'

file1 = csv.DictReader(open(file1_loc, 'rb'), delimiter=',', quotechar='"')
file2 = csv.DictReader(open(file2_loc, 'rb'), delimiter=',', quotechar='"')

# Lists collecting the matched names and their scores
bus_name = []
pnode_name = []
score_50_plus = []

for line in file1:
    for line2 in file2:
        partial_ratio = fuzz.partial_ratio(str(line['NAME']), str(line2['PNODENAME']))
        if partial_ratio > 60:
            bus_name.append(line['NAME'])
            pnode_name.append(line2['PNODENAME'])
            score_50_plus.append(partial_ratio)
            print partial_ratio
            print line['NAME']
            print line2['PNODENAME']

Edit: To clarify, I have a list of 218 names, which I'll call list1, and a list of 1172 names, which I'll call list2. I think the names in list1 correspond to some of the names in list2, but they aren't exact matches, so I can't do something roughly like:

matches = []
for line in list1:
    if line in list2:
        matches.append(line)

Instead, I'd like to get the fuzz.partial_ratio of each name in list1 against each name in list2. Something like:

for line in list1:
    partial_ratio = fuzz.partial_ratio(line, list2[0])
for line in list1:
    partial_ratio = fuzz.partial_ratio(line, list2[1])
for line in list1:
    partial_ratio = fuzz.partial_ratio(line, list2[2])

But without having to write 1172 for loops (or 218 if I reversed it).

Recommended answer

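The real issue was that the iterator for file2 was consumed by the first iteration through file1. A csv.DictReader reads its file in a single pass, so once the inner loop has reached the end of file2, every later pass over it yields nothing. Recreating the iterator for each iteration of file1 fixed the issue.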
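A minimal sketch of one way to avoid the exhausted iterator: read the second file into a list once, then nest the loops over that list. This is written for Python 3, not the Python 2 of the question. The helper names `similarity` and `pairwise_matches` are hypothetical, the CSV contents are made up, and the `difflib`-based scorer is only a stand-in so the example runs without fuzzywuzzy installed; `fuzz.partial_ratio` can be passed as `scorer` instead.

```python
import csv
import io
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer (0-100) built on difflib; NOT fuzzywuzzy's algorithm."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def pairwise_matches(rows1, rows2, threshold=60, scorer=similarity):
    """Compare every NAME in rows1 with every PNODENAME in rows2."""
    rows2 = list(rows2)  # materialize once so the inner loop can restart
    matches = []
    for row1 in rows1:
        for row2 in rows2:
            score = scorer(row1['NAME'], row2['PNODENAME'])
            if score > threshold:
                matches.append((row1['NAME'], row2['PNODENAME'], score))
    return matches


# In-memory stand-ins for file1.csv and file2.csv (made-up data).
file1 = csv.DictReader(io.StringIO("NAME\nALPHA\nBRAVO\n"))
file2 = csv.DictReader(io.StringIO("PNODENAME\nALPHA_1\nBRAVO_2\nCHARLIE\n"))

matches = pairwise_matches(file1, file2)
for name, pnode, score in matches:
    print(name, pnode, score)
```

With 218 and 1172 names this performs about 255,000 comparisons, and holding 1172 rows in a list is cheap. Re-opening file2 inside the outer loop would also work, at the cost of re-reading the file 218 times.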

For a line-by-line comparison (row n of file1 against row n of file2), nesting the loops will not help: the two iterators have to "move together". We can use izip_longest from itertools for this purpose:

from itertools import izip_longest
for l1, l2 in izip_longest(file1, file2):
    if all((l1, l2)):
        partial_ratio = fuzz.partial_ratio(str(l1['NAME']), str(l2['PNODENAME']))

You would replace your loops above with this izip_longest construct.
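To see what the lockstep iteration does with inputs of unequal length, here is a small sketch in Python 3, where the function is named `zip_longest` (`izip_longest` is the Python 2 name). The two name lists are made-up data.

```python
from itertools import zip_longest  # called izip_longest in Python 2

names1 = ["ALPHA", "BRAVO", "CHARLIE"]  # made-up data
names2 = ["ALPHA_1", "BRAVO_2"]

# Rows are paired by position; the shorter input is padded with None.
pairs = list(zip_longest(names1, names2))

# Keep only positions where both sides have a value, which is what the
# all((l1, l2)) check above does for the CSV rows.
complete = [(a, b) for a, b in pairs if a is not None and b is not None]

print(pairs)
print(complete)
```

Note that this visits each position once, so three and two names give three tuples, not the six combinations that an all-pairs comparison would cover.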
