Python: Retrieve items from a set


Question

In general, Python sets don't seem to be designed for retrieving items by key. That's obviously what dictionaries are for. But is there any way, given a key, to retrieve the instance in a set that is equal to that key?

Again, I know this is exactly what dictionaries are for, but as far as I can see, there are legitimate reasons to want to do this with a set. Suppose you have a class defined something like:

class Person:
   def __init__(self, firstname, lastname, age):
      self.firstname = firstname
      self.lastname = lastname
      self.age = age

Now, suppose I am going to be creating a large number of Person objects, and each time I create a Person object I need to make sure it is not a duplicate of a previous Person object. A Person is considered a duplicate of another Person if they have the same firstname, regardless of other instance variables. So the obvious thing to do is insert all Person objects into a set, and define __hash__ and __eq__ methods so that Person objects are compared by their firstname.
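As a minimal sketch of the setup described above: the __hash__ and __eq__ bodies below are an assumption about what the asker has in mind (the question does not show them), and the sample names are made up for illustration.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    # Assumed implementation: identity is decided by firstname only.
    def __hash__(self):
        return hash(self.firstname)

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname


people = set()
people.add(Person("Ada", "Lovelace", 36))

duplicate = Person("Ada", "Byron", 20)
is_duplicate = duplicate in people   # membership test uses __hash__ and __eq__
people.add(duplicate)                # no effect: an equal element is already stored
stored = next(iter(people))          # the only element is still the first Person
```

The membership test detects the duplicate, but the set offers no direct lookup that hands back the stored object for a given key, which is the gap the question is about.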

An alternative would be to create a dictionary of Person objects, and use a separately created firstname string as the key. The drawback here is that I'd be duplicating the firstname string. This isn't really a problem in most cases, but what if I have 10,000,000 Person objects? The redundant string storage could really start adding up in terms of memory usage.

But if two Person objects compare equal, I need to be able to retrieve the original object so that the additional instance variables (aside from firstname) can be merged in a way required by the business logic. Which brings me back to my problem: I need some way to retrieve instances from a set.

Is there any way to do this? Or is using a dictionary the only real option here?

Recommended answer

I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it -- the dictionary will simply use the same object. I doubt a dictionary will use significantly more memory than a set.
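A small sketch of that dictionary approach follows. The `registry` dict and the `register` helper are hypothetical names introduced here for illustration, not part of the original answer; the first name is built at runtime so that the identity check is not just an effect of string-literal interning.

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


registry = {}  # hypothetical: maps firstname -> first Person seen with that name


def register(person):
    """Return the stored Person for this firstname, storing `person` if it is new."""
    return registry.setdefault(person.firstname, person)


runtime_name = "".join(["A", "da"])          # string object created at runtime
first = register(Person(runtime_name, "Lovelace", 36))
original = register(Person("Ada", "Byron", 20))  # duplicate: returns the stored object

key = next(iter(registry))
same_object = key is first.firstname         # the dict key is the attribute's own object
```

Because `register` returns the stored instance, the caller can merge the remaining fields into `original` however the business logic requires.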

To actually save memory, add a __slots__ attribute to your class. This prevents each of your 10,000,000 instances from having a __dict__ attribute, which saves much more memory than the potential overhead of a dict over a set.
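To illustrate the effect of __slots__, here is a short sketch. The class names `PlainPerson` and `SlottedPerson` are made up for the comparison.

```python
class PlainPerson:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


class SlottedPerson:
    # Fixed attribute layout: instances get no per-instance __dict__.
    __slots__ = ("firstname", "lastname", "age")

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


plain = PlainPerson("Ada", "Lovelace", 36)
slotted = SlottedPerson("Ada", "Lovelace", 36)

plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    slotted.nickname = "Countess"
    can_add_attribute = True
except AttributeError:
    can_add_attribute = False
```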

Edit: Some numbers to back my claims. I defined a stupid example class storing pairs of random strings:

import random

def rand_str():
    # Random lowercase ASCII string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for i in range(random.randrange(3, 16)))

class A(object):
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()
    def __hash__(self):
        return hash(self.x)
    def __eq__(self, other):
        return self.x == other.x

The amount of memory used by a set of 1,000,000 instances of this class

random.seed(42)
s = set(A() for i in range(1000000))

is 240 MB on my machine. If I add

    __slots__ = ("x", "y")

to the class, this goes down to 112 MB. If I store the same data in a dictionary

def key_value():
    a = A()
    return a.x, a

random.seed(42)
d = dict(key_value() for i in range(1000000))

this uses 249 MB without __slots__ and 121 MB with __slots__.
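The answer does not say how these figures were measured. One way to run a similar comparison yourself is sketched below using the standard tracemalloc module; this is an assumption, not the author's method. The `measure` helper and the smaller instance count are chosen here for illustration, and absolute numbers will differ by Python version and platform.

```python
import random
import tracemalloc


def rand_str():
    return "".join(chr(random.randrange(97, 123))
                   for i in range(random.randrange(3, 16)))


class A:
    __slots__ = ("x", "y")  # remove this line to measure the variant without slots

    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x


def key_value():
    a = A()
    return a.x, a


def measure(build, n):
    """Return (bytes still allocated after building, number of elements)."""
    random.seed(42)
    tracemalloc.start()
    container = build(n)  # kept alive until the measurement is taken
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, len(container)


N = 10000  # far smaller than the 1,000,000 in the answer, to keep the run short
set_bytes, set_count = measure(lambda n: set(A() for i in range(n)), N)
dict_bytes, dict_count = measure(lambda n: dict(key_value() for i in range(n)), N)
print("set: ", set_bytes, "bytes for", set_count, "items")
print("dict:", dict_bytes, "bytes for", dict_count, "items")
```

Both containers are built from the same seeded random sequence, so they hold the same number of distinct items and the byte counts are directly comparable.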
