Storing calculated values in an object

Question

Recently I've been writing a bunch of code like this:

class A:
  def __init__(self, x):
    self.x = x
    self._y = None

  def y(self):
    if self._y is None:
      self._y = big_scary_function(self.x)

    return self._y

  def z(self, i):
    return nice_easy_function(self.y(), i)

In a given class I may have a number of things working like this y, and I may have other things that use the stored pre-calculated values. Is this the best way to do things or would you recommend something different?

Note that I don't pre-calculate here because you might use an instance of A without making use of y.

I've written the sample code in Python, but I'd be interested in answers specific to other languages if relevant. Conversely I'd like to hear from Pythonistas about whether they feel this code is Pythonic or not.

Answer

First thing: this is a very common pattern in Python (there's even a cached_property descriptor class somewhere - in Django IIRC).
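For reference, since Python 3.8 the standard library also ships `functools.cached_property`, and Django provides `django.utils.functional.cached_property`. Below is a minimal sketch of the question's class using the standard-library version. The body of `big_scary_function` and its `calls` counter are stand-ins invented here for illustration only.

```python
from functools import cached_property


def big_scary_function(x):
    # Hypothetical stand-in for the expensive computation in the question.
    big_scary_function.calls += 1
    return x * x


big_scary_function.calls = 0  # counts how often the computation really runs


class A:
    def __init__(self, x):
        self.x = x

    @cached_property
    def y(self):
        # Runs once per instance; the result is then stored in the
        # instance's __dict__ under the name "y".
        return big_scary_function(self.x)


a = A(3)
print(a.y, a.y)                  # second access reuses the stored value
print(big_scary_function.calls)  # 1
del a.y                          # drops the cached value; the next access recomputes
```

Note that this version has no automatic invalidation: rebinding `a.x` leaves the stored `y` untouched until it is deleted explicitly.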

That said, there are at least two potential issues here.

The first one is common to all "cached property" implementations: one usually doesn't expect an attribute access to trigger some heavy computation. Whether it's really an issue depends on the context (and on the near-religious opinions of the reader...)

The second issue - more specific to your example - is the traditional cache invalidation / state consistency problem: Here you have y as a function of x - or at least that's what one would expect - but rebinding x will not update y accordingly. This can be easily solved in this case by making x a property too and invalidating _y on the setter, but then you have even more unexpected heavy computation happening.

In this case (and depending on the context and computation cost) I'd probably keep memoization (with invalidation) but provide a more explicit getter to make clear we might have some computation going on.
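A rough sketch of what this could look like, assuming the question's class `A`: `x` becomes a property whose setter resets `_y`, and the getter is renamed to make the possible cost visible. The name `compute_y` and the body of `big_scary_function` are illustrative choices, not part of the original code.

```python
def big_scary_function(x):
    # Hypothetical stand-in for the expensive computation.
    return sum(i * i for i in range(x))


class A:
    def __init__(self, x):
        self._x = x
        self._y = None  # None means "not computed yet"

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value
        self._y = None  # invalidate: y depends on x

    def compute_y(self):
        # An explicit method name signals that a call may be expensive.
        if self._y is None:
            self._y = big_scary_function(self._x)
        return self._y


a = A(4)
first = a.compute_y()   # computed for x == 4
a.x = 5                 # rebinding x clears the cached value
second = a.compute_y()  # recomputed for x == 5
```

If the computation can legitimately return `None`, a dedicated sentinel object would be needed instead of `None` to mark the empty cache.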

Edit: I misread your code and imagined a property decorator on y - which shows how common this pattern is ;). But my remarks still make sense, especially when a "self-proclaimed pythonista" posts an answer in favour of a computed attribute.

Edit: if you want a more or less generic "cached property with cache invalidation", here's a possible implementation (might need more testing etc):

class cached_property(object):
    """
    Descriptor that converts a method with a single self argument
    into a property cached on the instance.

    It also has a hook to allow another property's setter to
    invalidate the cache, cf. the `Square` class below for
    an example.
    """
    def __init__(self, func):
        self.func = func
        self.__doc__ = getattr(func, '__doc__', None)
        # Key under which the computed value is stored in instance.__dict__.
        self.name = self.encode_name(func.__name__)

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return the descriptor itself.
            return self
        if self.name not in instance.__dict__:
            instance.__dict__[self.name] = self.func(instance)
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # Defining __set__ makes this a data descriptor, so direct
        # assignment is rejected instead of shadowing the property.
        raise AttributeError("attribute is read-only")

    @classmethod
    def encode_name(cls, name):
        return "_p_cached_{}".format(name)

    @classmethod
    def clear_cached(cls, instance, *names):
        # Drop the cached values for the given property names, if any.
        for name in names:
            cached = cls.encode_name(name)
            if cached in instance.__dict__:
                del instance.__dict__[cached]

    @classmethod
    def invalidate(cls, *names):
        # Decorator for a property setter: clears the named cached
        # properties before delegating to the wrapped setter.
        def _invalidate(setter):
            def _setter(instance, value):
                cls.clear_cached(instance, *names)
                return setter(instance, value)
            _setter.__name__ = setter.__name__
            _setter.__doc__ = getattr(setter, '__doc__', None)
            return _setter
        return _invalidate



class Square(object):
    def __init__(self, size):
        self._size = size

    @cached_property
    def area(self):
        return self.size * self.size

    @property
    def size(self):
        return self._size

    @size.setter
    @cached_property.invalidate("area")
    def size(self, size):
        self._size = size

Not that I think the added cognitive overhead is actually worth the price - more often than not a plain inline implementation makes the code easier to understand and maintain (and doesn't require many more lines of code) - but it still might be useful if a package requires a lot of cached properties and cache invalidation.
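For comparison, here is one way the "plain inline implementation" might look for the same `Square` example. This is only a sketch; the `_area` attribute name is an assumption made here.

```python
class Square:
    def __init__(self, size):
        self._size = size
        self._area = None  # cache slot; None means "not computed yet"

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, size):
        self._size = size
        self._area = None  # invalidate the cached area

    @property
    def area(self):
        if self._area is None:
            self._area = self._size * self._size
        return self._area


sq = Square(3)
print(sq.area)  # computed on first access
sq.size = 4     # setter clears the cache
print(sq.area)  # recomputed for the new size
```

The caching and invalidation logic sits right next to the attributes it affects, at the cost of repeating the same few lines for every cached value.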
