Python: Retrieve items from a set
Question
In general, Python sets don't seem to be designed for retrieving items by key. That's obviously what dictionaries are for. But is there any way, given a key, to retrieve the instance in a set that is equal to that key?
Again, I know this is exactly what dictionaries are for, but as far as I can see, there are legitimate reasons to want to do this with a set. Suppose you have a class defined something like:
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age
Now, suppose I am going to be creating a large number of Person objects, and each time I create a Person object I need to make sure it is not a duplicate of a previous Person object. A Person is considered a duplicate of another Person if they have the same firstname, regardless of other instance variables. So the obvious thing to do is insert all Person objects into a set, and define __hash__ and __eq__ methods so that Person objects are compared by their firstname.
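A minimal sketch of the set-based approach described above, assuming equality and hashing depend on firstname alone. The sample names and ages are made up for illustration.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __eq__(self, other):
        # Two Person objects are duplicates if their firstname matches.
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname

    def __hash__(self):
        # Must be consistent with __eq__: equal objects hash equally.
        return hash(self.firstname)


people = set()
first = Person("Ada", "Lovelace", 36)    # illustrative data
duplicate = Person("Ada", "Byron", 20)   # same firstname, other fields differ

people.add(first)
people.add(duplicate)  # no effect: an equal element is already present

is_duplicate = duplicate in people
```

The membership test detects the duplicate, but the set offers no direct lookup that hands back the stored `first` object, which is the gap the question is about.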
An alternative would be to create a dictionary of Person objects, and use a separately created firstname string as the key. The drawback here is that I'd be duplicating the firstname string. This isn't really a problem in most cases, but what if I have 10,000,000 Person objects? The redundant string storage could really start adding up in terms of memory usage.
But if two Person objects compare equal, I need to be able to retrieve the original object so that the additional instance variables (aside from firstname) can be merged in the way the business logic requires. Which brings me back to my problem: I need some way to retrieve instances from a set.
Is there any way to do this? Or is using a dictionary the only real option here?
Recommended answer
I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it; the dictionary will simply hold a reference to the same string object. I doubt a dictionary will use significantly more memory than a set.
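As a rough sketch of the dictionary approach, the snippet below keys each Person by its own firstname attribute and returns the stored original when a duplicate arrives. The helper name `add_or_get` and the sample data are hypothetical, chosen only for illustration.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


people = {}


def add_or_get(person):
    """Store person under its firstname; return the instance already stored, if any."""
    # setdefault inserts only when the key is missing and always
    # returns the value that ends up in the dictionary.
    return people.setdefault(person.firstname, person)


first = Person("Ada", "Lovelace", 36)    # illustrative data
duplicate = Person("Ada", "Byron", 20)   # same firstname, other fields differ

add_or_get(first)
original = add_or_get(duplicate)  # hands back `first`, ready for merging

# The dictionary key is the very same string object as the attribute, not a copy.
key = next(iter(people))
same_object = key is first.firstname
```

Because the key is just another reference to the existing string, the extra cost per entry is the dictionary slot itself rather than a second copy of the name.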
To actually save memory, add a __slots__ attribute to your classes. This will prevent each of your 10,000,000 instances from having a __dict__ attribute, which will save much more memory than the potential overhead of a dict over a set.
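To illustrate what __slots__ changes, here is a small sketch comparing two otherwise identical classes. The class names `WithDict` and `WithSlots` are invented for this example.

```python
class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithSlots:
    __slots__ = ("x", "y")  # fixed attribute set; no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = WithDict("foo", "bar")
b = WithSlots("foo", "bar")

has_dict_a = hasattr(a, "__dict__")
has_dict_b = hasattr(b, "__dict__")

# Assigning an attribute that is not listed in __slots__ fails.
try:
    b.z = 1
    slot_error = False
except AttributeError:
    slot_error = True
```

The trade-off is flexibility: instances of the slotted class cannot grow new attributes at runtime.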
Edit: Some numbers to back my claims. I defined a stupid example class storing pairs of random strings:
import random

def rand_str():
    # Random lowercase ASCII string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for i in range(random.randrange(3, 16)))

class A(object):
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()
    def __hash__(self):
        # Hash and compare on x only, so x plays the role of the key.
        return hash(self.x)
    def __eq__(self, other):
        return self.x == other.x
The amount of memory used by a set of 1,000,000 instances of this class
random.seed(42)
s = set(A() for i in range(1000000))  # the original used Python 2's xrange
is 240 MB on my machine. If I add
__slots__ = ("x", "y")
to the class, this goes down to 112 MB. If I store the same data in a dictionary
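def key_value():
    # The key is the instance's own x attribute, so no string is copied.
    a = A()
    return a.x, a

random.seed(42)
d = dict(key_value() for i in range(1000000))  # the original used Python 2's xrange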
this uses 249 MB without __slots__ and 121 MB with __slots__.
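The answer does not say how these figures were measured. One way to reproduce the comparison in spirit is the standard-library tracemalloc module, sketched below with a smaller instance count. The `measure` helper and the class names are assumptions for this example, and the absolute numbers will differ from those quoted above depending on Python version and platform.

```python
import random
import tracemalloc


def rand_str():
    return "".join(chr(random.randrange(97, 123))
                   for _ in range(random.randrange(3, 16)))


class Plain:
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()


class Slotted:
    __slots__ = ("x", "y")

    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()


def measure(cls, n=20_000):
    """Return the bytes still allocated after building n instances of cls."""
    random.seed(42)  # same strings for both classes
    tracemalloc.start()
    objs = [cls() for _ in range(n)]  # kept alive while we read the counter
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


plain_bytes = measure(Plain)
slotted_bytes = measure(Slotted)
print(f"plain: {plain_bytes / 1e6:.1f} MB, slotted: {slotted_bytes / 1e6:.1f} MB")
```

Since both runs generate identical strings, the difference between the two totals reflects the per-instance overhead that __slots__ removes.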