Python: Find tuples from a list of tuples having duplicate data in the 0th element (of the tuple)

Question

I have a list of tuples containing a filename and a filepath. I want to find duplicate filenames (where the filepath may be different), i.e. tuples whose filename is the same but whose filepath may differ.

Example list of tuples:

file_info = [('foo1.txt','/home/fold1'), ('foo2.txt','/home/fold2'), ('foo1.txt','/home/fold3')]

I want to find the duplicate filename, i.e. file_info[2] in the above case, print it and delete it. I could check iteratively, something like:

count = 0
for (filename, filepath) in file_info:
    count = count + 1
    for (filename1, filepath1) in file_info[count:]:
        if filename == filename1:
            print(filename1, filepath1)
            # note: removing from file_info while iterating over it can skip items
            file_info.remove((filename1, filepath1))

But is there a more efficient/shorter/more correct/Pythonic way of accomplishing the same task? Thank you.

Answer

Using a set lets you avoid a double loop; add items you haven't seen yet to a new list, to avoid altering the list you are looping over (which would lead to skipped items):

seen = set()
keep = []
for filename, filepath in file_info:
    if filename in seen:
        print(filename, filepath)
    else:
        seen.add(filename)
        keep.append((filename, filepath))
file_info = keep
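With the sample list from the question, a run of that loop looks like this (a quick sketch using the question's sample data, not part of the original answer):

```python
file_info = [('foo1.txt', '/home/fold1'),
             ('foo2.txt', '/home/fold2'),
             ('foo1.txt', '/home/fold3')]

seen = set()
keep = []
for filename, filepath in file_info:
    if filename in seen:
        # only the second foo1.txt entry reaches this branch
        print(filename, filepath)  # foo1.txt /home/fold3
    else:
        seen.add(filename)
        keep.append((filename, filepath))

file_info = keep
# file_info == [('foo1.txt', '/home/fold1'), ('foo2.txt', '/home/fold2')]
```

Because membership tests on a set are O(1), this is a single pass over the list instead of the nested loop from the question.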

If order doesn't matter and you don't need to print the items you removed, another approach is to use a dictionary:

file_info = list(dict(reversed(file_info)).items())

Reversing the input list ensures that the first entry for each filename is kept rather than the last.
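A quick check of that behavior with the question's sample data (a sketch, not part of the original answer): the dict constructor keeps the *last* value assigned to a key, so reversing first flips which duplicate survives.

```python
file_info = [('foo1.txt', '/home/fold1'),
             ('foo2.txt', '/home/fold2'),
             ('foo1.txt', '/home/fold3')]

# without reversed(), later duplicates overwrite earlier ones:
assert dict(file_info)['foo1.txt'] == '/home/fold3'

# with reversed(), the first entry in the original order wins:
deduped = dict(reversed(file_info))
assert deduped['foo1.txt'] == '/home/fold1'
```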

If you need all the full paths for files with duplicates, build a dictionary with lists as values, then remove anything that has only one element:

filename_to_paths = {}
for filename, filepath in file_info:
    filename_to_paths.setdefault(filename, []).append(filepath)
duplicates = {filename: paths
              for filename, paths in filename_to_paths.items()
              if len(paths) > 1}

The duplicates dictionary now contains only the filenames that have more than one path in the file_info list.
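Putting that together with the sample list gives the following (a sketch; the expected duplicates value simply follows from grouping the sample data):

```python
file_info = [('foo1.txt', '/home/fold1'),
             ('foo2.txt', '/home/fold2'),
             ('foo1.txt', '/home/fold3')]

filename_to_paths = {}
for filename, filepath in file_info:
    # collect every path seen for a given filename
    filename_to_paths.setdefault(filename, []).append(filepath)

duplicates = {filename: paths
              for filename, paths in filename_to_paths.items()
              if len(paths) > 1}

print(duplicates)  # {'foo1.txt': ['/home/fold1', '/home/fold3']}
```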
