Remove duplicates in a list of objects with Python

Problem description

I've got a list of objects and I've got a db table full of records. My list of objects has a title attribute and I want to remove any objects with duplicate titles from the list (leaving the original).

Then I want to check whether my list of objects duplicates any records in the database and, if so, remove those items from the list before adding them to the database.

I have seen solutions for removing duplicates from a list like this: myList = list(set(myList)), but I'm not sure how to do that with a list of objects.

I need to maintain the order of my list of objects too. I was also thinking maybe I could use difflib to check for differences in the titles.

Solution

set(list_of_objects) removes duplicates only if Python can tell what a duplicate is; that is, you need to define what makes an object unique.

To do that, you need to make the object hashable, which means defining both the __hash__ and __eq__ methods. See:

http://docs.python.org/glossary.html#term-hashable

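Though, if you only compare objects with == or test membership in a list with in, defining __eq__ alone is enough. For set() you need both: in Python 3, a class that defines __eq__ without __hash__ becomes unhashable.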
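As a rough illustration of what __eq__ alone gives you in Python 3, here is a small sketch. The Title class and its text attribute are hypothetical stand-ins, not part of the original question.

```python
class Title:
    """Hypothetical example class that defines __eq__ but not __hash__."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Title):
            return NotImplemented
        return self.text == other.text


a, b = Title("The Shining"), Title("The Shining")

equal = a == b      # comparison goes through __eq__
found = b in [a]    # list membership also goes through __eq__

try:
    set([a, b])
    hashable = True
except TypeError:
    # Python 3 sets __hash__ to None when a class defines
    # __eq__ without also defining __hash__
    hashable = False
```

Here equal and found are both True, while hashable ends up False, so set() cannot be used on these objects until __hash__ is defined as well.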

EDIT: How to implement the __eq__ method:

As I mentioned, you need to know what makes your object unique. Suppose we have a Book with attributes author_name and title whose combination is unique (so we can have many books authored by Stephen King, and many books named The Shining, but only one book named The Shining by Stephen King). Then the implementation is as follows:

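def __eq__(self, other):
    # Books are equal when both author and title match.
    if not isinstance(other, Book):
        return NotImplemented
    return (self.author_name == other.author_name
            and self.title == other.title)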

Similarly, this is how I sometimes implement the __hash__ method:

def __hash__(self):
    return hash(('title', self.title,
                 'author_name', self.author_name))
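Putting the two methods together, a minimal sketch of a complete class might look like the following. The __init__ signature and the sample books are assumptions made for illustration; only the author_name and title attributes come from the answer.

```python
class Book:
    def __init__(self, author_name, title):
        # Assumed constructor; the answer only names the attributes
        self.author_name = author_name
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author_name == other.author_name
                and self.title == other.title)

    def __hash__(self):
        # Must agree with __eq__: equal books produce equal hashes
        return hash(('title', self.title,
                     'author_name', self.author_name))


first = Book("Stephen King", "The Shining")
second = Book("Stephen King", "The Shining")
other = Book("Stephen King", "It")

unique = set([first, second, other])

print(first == second)   # True: equal by value
print(first is second)   # False: still two distinct objects
print(len(unique))       # 2: the duplicate is dropped
```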

You can check that if you create a list of two books with the same author and title, the book objects will be equal (with the == operator). Also, when set() is used, it will remove one book.

EDIT: This is an old answer of mine, and an earlier version of the paragraph above also claimed that the two books would be the same object (with the is operator). That was wrong: objects with the same hash() won't give True when compared with is. Hashability does matter, however, if you intend to use the objects as elements of a set or as keys in a dictionary.
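Note that set() does not preserve list order, which the question asks for. One possible approach, sketched here under assumptions, is to track the titles seen so far and keep only the first object for each title. The Item class, the dedupe_by_title helper, and the titles_in_db set are all hypothetical; how the existing titles are fetched from the database is left out.

```python
class Item:
    """Hypothetical stand-in for the objects in the question."""

    def __init__(self, title):
        self.title = title


def dedupe_by_title(items, existing_titles=()):
    """Keep the first item per title, in order, skipping known titles."""
    seen = set(existing_titles)
    result = []
    for item in items:
        if item.title not in seen:
            seen.add(item.title)
            result.append(item)
    return result


items = [Item("A"), Item("B"), Item("A"), Item("C")]
titles_in_db = {"C"}  # assumed to have been loaded from the database

kept = dedupe_by_title(items, titles_in_db)
print([item.title for item in kept])  # ['A', 'B']
```

This variant compares titles directly, so the objects themselves do not need __eq__ or __hash__; only the title values have to be hashable.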
