Find and update duplicates in a list of lists

Question

I am looking for a Pythonic way to solve the following problem. I have (what I think is) a working solution, but it has complicated flow control and just isn't "pretty". (Basically, it is a C++ solution.)

I have a list of lists. Each list contains multiple items of varying types (maybe 10 items per list). The overall order of the lists is not relevant, but the order of the items in any individual list is important (i.e. I can't change it).

I am looking to "tag" duplicates by adding an extra field to the end of each individual list. However, in this case a "duplicate" list is one that has equal values in several preselected fields, but not in all fields (there are no "true" duplicates).

For example, if this were the original data from a 5-item list of lists, and a duplicate is defined as having equal values in the first and third fields:

['apple', 'window', 'pear', 2, 1.55, 'banana']
['apple', 'orange', 'kiwi', 3, 1.80, 'banana']
['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana']
['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup'] 
['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup']

The first, fourth and fifth lists would be duplicates, and therefore all lists should be updated as follows:

['apple', 'window', 'pear', 2, 1.55, 'banana', 1]
['apple', 'orange', 'kiwi', 3, 1.80, 'banana', 0]
['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana', 0]
['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup', 2]
['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup', 3]

Thanks for any help or direction. I think this may be getting beyond the Learning Python book.

Recommended answer

from collections import defaultdict

lists = [['apple', 'window', 'pear', 2, 1.55, 'banana'],
['apple', 'orange', 'kiwi', 3, 1.80, 'banana'],
['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana'],
['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup'],
['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup']]

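# Running count of how often each (first, third) key has been seen so far.
dic = defaultdict(int)
# Keys that occur more than once, i.e. the duplicated keys.
fts = set()
for lst in lists:
    first_third = lst[0], lst[2]
    dic[first_third] += 1
    if dic[first_third] == 2:
        fts.add(first_third)
    # Tag the list with its running count for this key (1, 2, 3, ...).
    lst.append(dic[first_third])

# A list whose key occurs only once is not a duplicate: reset its tag from 1 to 0.
for lst in lists:
    if (lst[0], lst[2]) not in fts:
        lst[-1] -= 1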

print(lists)

Edit: Thanks, utdemir. first_third = lst[0], lst[2] is correct, not first_third = lst[0] + lst[2]
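
To illustrate why the tuple matters, here is a small sketch (the sample rows are made up for illustration and are not from the question). Concatenating fields can make different field pairs look identical, and it fails outright when the fields have different types, whereas a tuple keeps each field separate.

```python
# Hypothetical rows, chosen only to show the collision.
row_a = ['ab', 'x', 'c']
row_b = ['a', 'x', 'bc']

# Concatenation merges the two fields, so distinct pairs collide.
concat_a = row_a[0] + row_a[2]   # 'abc'
concat_b = row_b[0] + row_b[2]   # 'abc'
print(concat_a == concat_b)      # True

# A tuple keeps the fields separate, so the keys stay distinct.
tuple_a = row_a[0], row_a[2]     # ('ab', 'c')
tuple_b = row_b[0], row_b[2]     # ('a', 'bc')
print(tuple_a == tuple_b)        # False

# A tuple also works for mixed types, where + would raise TypeError.
mixed = ['apple', 'x', 2]
mixed_key = mixed[0], mixed[2]   # ('apple', 2)
```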

Edit2: Changed variable names for clarity.

Edit3: Changed to reflect what the original poster really wanted, and his updated list. Not pretty any more; the desired changes are just tacked on.
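
For readers who want something tidier, one possible sketch follows. It is not the answerer's code: it assumes a two-pass approach that first counts every key with collections.Counter and then tags each list. The function name tag_duplicates and the key_fields parameter are hypothetical names introduced here for illustration, and the key fields are assumed to hold hashable values.

```python
from collections import Counter, defaultdict
from operator import itemgetter


def tag_duplicates(rows, key_fields=(0, 2)):
    """Append a tag to each row in place: 0 if its key is unique,
    otherwise a running number 1, 2, 3, ... within its duplicate group."""
    get_key = itemgetter(*key_fields)  # yields a tuple for two or more fields
    # Pass 1: count how often each key occurs overall.
    totals = Counter(get_key(row) for row in rows)
    # Pass 2: tag each row, numbering only the keys that repeat.
    seen = defaultdict(int)
    for row in rows:
        key = get_key(row)
        if totals[key] > 1:
            seen[key] += 1
            row.append(seen[key])
        else:
            row.append(0)
    return rows


rows = [['apple', 'window', 'pear', 2, 1.55, 'banana'],
        ['apple', 'orange', 'kiwi', 3, 1.80, 'banana'],
        ['apple', 'envelope', 'star_fruit', 2, 1.55, 'banana'],
        ['apple', 'orange', 'pear', 2, 0.80, 'coffee_cup'],
        ['apple', 'orange', 'pear', 2, 3.80, 'coffee_cup']]

tag_duplicates(rows)
print([row[-1] for row in rows])  # [1, 0, 0, 2, 3]
```

Counting first means the second loop already knows whether a key repeats, so no tag has to be corrected afterwards.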
