Python set of objects with custom hash behavior

Problem description

I would like to use a set to manage a collection of 'myItem' instances. The myItem class has its own hash function. The hash is based on some, but not all, of the data in each item; for simplicity, in the example below the data is a dictionary r. The hash takes into account two keys, hk1 and hk2, and there is a third key, 'sad', that is not considered in the hash calculation.

class myItem():

    def __init__(self, r):
        # r is a dict holding information about the instance
        # of course r has to have certain keys...
        self.r = r

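    def _key(self):
        # Only hk1 and hk2 identify an item; 'sad' is deliberately ignored.
        # (Sorting the two values fails on Python 3 when their types differ,
        # e.g. str and int, so they are kept in a fixed order here.)
        return (self.r['hk1'], self.r['hk2'])

    def __hash__(self):
        """Override the default hash behavior"""
        return hash(self._key())

    def __eq__(self, other):
        """checking equality"""
        if isinstance(other, self.__class__):
            # Compare the key fields, not the hashes: distinct keys can collide.
            return self._key() == other._key()
        return NotImplemented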

    def __ne__(self, other):
        """checking inequality"""
        if isinstance(other, self.__class__):
            return not self.__eq__(other)
        return NotImplemented

    def __repr__(self):
        return str(self.r)

The expected behavior is confirmed by the short unit test below.

import unittest

class testMySet(unittest.TestCase):

    def testMyItemstuff(self):

        m1 = myItem({'hk1':'val1', 'hk2': 100, 'sad': 'other stuff'})
        m2 = myItem({'hk1': 'val1', 'hk2': 100, 'sad': 'different other stuff'})

        self.assertEqual(m1, m2)
        self.assertNotEqual(m1.r['sad'], m2.r['sad'])

        s = { m1 }
        # add m2 to s
        s.add(m2)
        # m2 compares equal to m1 (same hash, __eq__ is True), so it is not added
        self.assertEqual(len(s), 1)
        # set contains the original object, not the last one added
        self.assertNotEqual(s.pop().r['sad'], 'different other stuff')

My question is, how can I modify the behavior such that adding a new object whose hash coincides with an existing one ends up replacing the original one, with minimal performance impact?

Recommended answer

Whether defining your hash that way makes sense for your application is really for you to decide, but it does seem unlikely.

In any case, I can think of two options that will be "as fast as" a set (O(1) per insertion rather than O(n)); both rely on the hash function you describe:

First, boil down your class and create instances:

class Item():
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __hash__(self):
        return hash(self.a)

    def __eq__(self,other):
        if isinstance(other, self.__class__):
            # Ignoring .b attribute
            return self.a == other.a
        else:
            return NotImplemented

    def __repr__(self):
        return "Item(%s, %s)" % (self.a, self.b)

i1 = Item(1,2)
i2 = Item(3,4)
i3 = Item(1,5)


print(i1 == i2)             # False (.a's don't match)
print(i1 == i3)             # True  (.a's match)

Approach 1: dict values

# Using a dict
updating_set = {}
updating_set[i1] = i1       # .values(): [Item(1, 2)]
updating_set[i2] = i2       # .values(): [Item(1, 2), Item(3, 4)]
updating_set[i3] = i3       # .values(): [Item(1, 5), Item(3, 4)]

print(list(updating_set.values()))
# [Item(1, 5), Item(3, 4)]
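
One detail of this approach is worth knowing: assigning through an equal key replaces the stored value but keeps the original key object, so the items should be read from .values() rather than from the keys. A minimal sketch, restating the Item class from above so that it runs on its own:

```python
class Item:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __hash__(self):
        return hash(self.a)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.a == other.a
        return NotImplemented

    def __repr__(self):
        return "Item(%s, %s)" % (self.a, self.b)


i1 = Item(1, 2)
i3 = Item(1, 5)

updating_set = {}
updating_set[i1] = i1
updating_set[i3] = i3   # i3 == i1, so this overwrites the value stored under i1

keys = list(updating_set.keys())
values = list(updating_set.values())

print(keys)     # [Item(1, 2)]  the key is still the first object
print(values)   # [Item(1, 5)]  the value is the most recent object
```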

Approach 2: Using a set subclass

# Using a set subclass
class UpdatingSet(set):
    def add(self, item):
        if item in self: super().remove(item)
        super().add(item)

uset = UpdatingSet()
uset.add(i1)                # UpdatingSet({Item(1, 2)})
uset.add(i2)                # UpdatingSet({Item(1, 2), Item(3, 4)})
uset.add(i3)                # UpdatingSet({Item(1, 5), Item(3, 4)})
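
Note that overriding add() only affects direct calls to add(). In CPython, the built-in set constructor, update() and |= do not go through an overridden add(), so items inserted that way still follow the usual keep-the-first behavior. A possible sketch that also covers the constructor and update() is shown below; the class name UpdatingSet2 is made up for this illustration, and Item is restated so that the block runs on its own:

```python
class Item:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __hash__(self):
        return hash(self.a)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.a == other.a
        return NotImplemented

    def __repr__(self):
        return "Item(%s, %s)" % (self.a, self.b)


class UpdatingSet2(set):
    """Hypothetical variant: the most recently added equal item wins."""

    def __init__(self, iterable=()):
        super().__init__()
        self.update(iterable)

    def add(self, item):
        # discard() is a no-op when no equal item is present
        super().discard(item)
        super().add(item)

    def update(self, *iterables):
        # Route every element through the overridden add()
        for iterable in iterables:
            for item in iterable:
                self.add(item)


# A plain set keeps the first of two equal items ...
plain = set([Item(1, 2)])
plain.update([Item(1, 5)])
plain_b = [item.b for item in plain]
print(plain_b)   # [2]

# ... while the subclass keeps the last one, including via update()
uset = UpdatingSet2([Item(1, 2), Item(3, 4)])
uset.update([Item(1, 5)])
result = sorted(item.b for item in uset)
print(result)    # [4, 5]
```

Operators such as |= are still not covered by this sketch; if those are needed, the dict-based approaches are probably simpler to get right.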

Bonus Approach 3: Does not require a special hash function

class NewItem():
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return "Item(%s, %s)" % (self.a, self.b)

class ItemSet():
    def __init__(self):
        self.items = {}

    def add(self, item):
        item_hash = item.a
        self.items[item_hash] = item

    def values(self):
        return self.items.values()

i1 = NewItem(1,2)
i2 = NewItem(3,4)
i3 = NewItem(1,5)

iset = ItemSet()
iset.add(i1)                # .values(): [Item(1, 2)]
iset.add(i2)                # .values(): [Item(1, 2), Item(3, 4)]
iset.add(i3)                # .values(): [Item(1, 5), Item(3, 4)]

print(list(iset.values()))  # [Item(1, 5), Item(3, 4)]

This third approach doesn't require you to implement __hash__ (which could cause unexpected side effects), but mimics the hashing process inside ItemSet.add(), using the "hash function" as the dictionary key.

This is probably your best bet, unless you really want to implement hash and know the extent of the effects of that decision.
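
Applied to the data from the question, the same idea could look like the sketch below. The name KeyedSet, its key parameter and the identity() helper are assumptions made for this illustration, not part of the original answer. The key function returns the fields that identify an item (hk1 and hk2); everything else, such as 'sad', is ignored.

```python
class KeyedSet:
    """Set-like container in which a newly added item replaces an equal-keyed one."""

    def __init__(self, key):
        self._key = key      # callable mapping an item to a hashable identity
        self._items = {}     # identity -> most recently added item

    def add(self, item):
        # Plain dict assignment: O(1) on average, and the new item always wins
        self._items[self._key(item)] = item

    def __contains__(self, item):
        return self._key(item) in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items.values())


def identity(r):
    # Hypothetical key function: only hk1 and hk2 identify a record
    return (r['hk1'], r['hk2'])


s = KeyedSet(key=identity)
s.add({'hk1': 'val1', 'hk2': 100, 'sad': 'other stuff'})
s.add({'hk1': 'val1', 'hk2': 100, 'sad': 'different other stuff'})

print(len(s))               # 1
print(list(s)[0]['sad'])    # different other stuff
```

Because the identity lives in the key function rather than in __hash__ and __eq__, the stored items can be plain dicts and keep their normal equality semantics everywhere else in the program.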
