Python: Retrieve items from a set

Published: 2021/7/23 19:14:49. Tags: python, python-3.x, set


Question

In general, Python sets don't seem to be designed for retrieving items by key; that is obviously what dictionaries are for. But is there any way, given a key, to retrieve the instance in a set that is equal to that key?

Again, I know this is exactly what dictionaries are for, but as far as I can see there are legitimate reasons to want to do this with a set. Suppose you have a class defined something like:

class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

Now, suppose I am going to create a large number of Person objects, and each time I create one I need to make sure it is not a duplicate of a previous Person object. A Person is considered a duplicate of another Person if they have the same firstname, regardless of the other instance variables. So the obvious thing to do is insert all Person objects into a set, and define __hash__ and __eq__ methods so that Person objects are compared by their firstname.
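
As a minimal sketch of the setup described above (the sample names and ages are made up for illustration), hashing and equality can be defined on firstname alone. The set then rejects duplicates, but it has no direct way to hand back the element it already stores:

```python
class Person:
    """Person whose hash and equality depend on firstname only."""

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __hash__(self):
        return hash(self.firstname)

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname


people = set()
first = Person("Ada", "Lovelace", 36)
duplicate = Person("Ada", "Byron", 20)

people.add(first)
people.add(duplicate)  # no effect: an equal element is already in the set

# Membership testing works with any equal object...
found = duplicate in people
# ...but the set API offers no lookup that returns the stored element itself.
```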

An alternative would be to create a dictionary of Person objects and use a separately created firstname string as the key. The drawback is that I'd be duplicating the firstname string. That isn't a problem in most cases, but what if I have 10,000,000 Person objects? The redundant string storage could really start to add up in terms of memory usage.

But if two Person objects compare equal, I need to be able to retrieve the original object so that the additional instance variables (aside from firstname) can be merged in the way the business logic requires. Which brings me back to my problem: I need some way to retrieve instances from a set.

Is there any way to do this? Or is using a dictionary the only real option here?

Recommended answer

I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it; the dictionary simply holds a reference to the same string object. I doubt a dictionary will use significantly more memory than a set.
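
A rough sketch of that approach follows. The helper add_or_merge and its "keep the larger age" merge rule are hypothetical, chosen only to illustrate retrieving the original object. The first name is built at runtime so that the identity check is not simply an artifact of string-literal interning:

```python
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


def add_or_merge(registry, person):
    """Store person under its firstname, or merge it into the one already stored.

    Returns the object that ends up in the registry (the original on a duplicate).
    """
    original = registry.setdefault(person.firstname, person)
    if original is not person:
        # Hypothetical business rule: keep the larger of the two ages.
        original.age = max(original.age, person.age)
    return original


registry = {}
name = "".join(["A", "da"])  # built at runtime, not a shared literal
ada = add_or_merge(registry, Person(name, "Lovelace", 36))
same = add_or_merge(registry, Person("".join(["A", "da"]), "Byron", 40))

# The dictionary key is the very string object held by the instance, not a copy.
key = next(iter(registry))
shares_string = key is ada.firstname
```

Because dict.setdefault returns the existing value when the key is already present, a single lookup both detects the duplicate and retrieves the original for merging.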

To actually save memory, add a __slots__ attribute to your classes. This prevents each of your 10,000,000 instances from having a __dict__ attribute, which will save much more memory than the potential overhead of a dict over a set.
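
For illustration, here is a small sketch of what __slots__ changes. The class names PlainPerson and SlottedPerson are made up for this comparison:

```python
class PlainPerson:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


class SlottedPerson:
    # Only these attribute names are allowed; no per-instance __dict__ is created.
    __slots__ = ("firstname", "lastname", "age")

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


plain = PlainPerson("Ada", "Lovelace", 36)
slotted = SlottedPerson("Ada", "Lovelace", 36)

plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    slotted.nickname = "Countess"
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
```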

Edit: Some numbers to back up my claims. I defined a silly example class that stores pairs of random strings:

import random


def rand_str():
    # Random lowercase ASCII string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for _ in range(random.randrange(3, 16)))


class A(object):
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()

    # Hash and compare on x only, so x plays the role of the key.
    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x

The amount of memory used by a set of 1,000,000 instances of this class

random.seed(42)
s = set(A() for _ in range(1000000))

is 240 MB on my machine. If I add

    __slots__ = ("x", "y")

to the class, this goes down to 112 MB. If I store the same data in a dictionary

def key_value():
    # The key is the instance's own x string, so no string is duplicated.
    a = A()
    return a.x, a

random.seed(42)
d = dict(key_value() for _ in range(1000000))

this uses 249 MB without __slots__ and 121 MB with __slots__.
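
The answer does not say how these figures were measured. One possible way to reproduce the comparison is the sketch below, which assumes the standard-library tracemalloc module and a smaller instance count so that it runs quickly. The helper names (measure, Base, Plain, Slotted) are illustrative, and absolute numbers will differ by Python version and platform:

```python
import random
import tracemalloc


def rand_str():
    return "".join(chr(random.randrange(97, 123))
                   for _ in range(random.randrange(3, 16)))


class Base:
    __slots__ = ()  # empty, so subclasses decide whether they get a __dict__

    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x


class Plain(Base):
    pass  # no __slots__ here: instances get a __dict__


class Slotted(Base):
    __slots__ = ("x", "y")  # instances have no __dict__


def measure(build, n=20_000, seed=42):
    """Return the bytes tracemalloc reports as still allocated after build(n)."""
    random.seed(seed)
    tracemalloc.start()
    container = build(n)  # kept alive until the measurement is taken
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del container
    return current


results = {
    "set, plain": measure(lambda n: {Plain() for _ in range(n)}),
    "set, slotted": measure(lambda n: {Slotted() for _ in range(n)}),
    "dict, plain": measure(lambda n: {a.x: a for a in (Plain() for _ in range(n))}),
    "dict, slotted": measure(lambda n: {a.x: a for a in (Slotted() for _ in range(n))}),
}

for label, size in results.items():
    print(f"{label}: {size / 1e6:.1f} MB")
```

Note that tracemalloc only counts allocations made through Python's memory allocators, so its totals need not match process-level figures such as the ones quoted above.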
