Memoization when arguments can be very large


Problem Description

Let's say I have a referentially transparent function. It is very easy to memoize it; for example:

import functools

kwd_mark = object()  # sentinel separating positional from keyword arguments

def memoize(obj):
  memo = {}
  @functools.wraps(obj)
  def memoizer(*args, **kwargs):
    # Build a hashable key from the positional and (sorted) keyword arguments.
    combined_args = args + (kwd_mark,) + tuple(sorted(kwargs.items()))
    if combined_args not in memo:
      memo[combined_args] = obj(*args, **kwargs)
    return memo[combined_args]
  return memoizer

@memoize
def my_function(data, alpha, beta):
  # ...

Now suppose that the data argument to my_function is huge; say, it's a frozenset with millions of elements. In this case, the cost of memoization is prohibitive: every time, we'd have to calculate hash(data) as part of the dictionary lookup.

I could make the memo dictionary an attribute of data instead of an object inside the memoize decorator. That way I could skip the data argument entirely during the cache lookup, since the chance that another huge frozenset would be equal to it is negligible. However, this approach ends up polluting an argument passed to my_function. Worse, if I have two or more large arguments, it doesn't help at all (I can only attach the memo to one argument).
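For illustration, here is a minimal sketch of that idea (MemoSet, memoize_on_data, and _memo are hypothetical names; note that a plain frozenset has no __dict__, so attaching anything to it requires a subclass, which is itself part of the pollution described above):

import functools

kwd_mark = object()  # sentinel separating positional from keyword arguments

class MemoSet(frozenset):
  # A plain frozenset rejects attribute assignment; a subclass gains a
  # __dict__ and so can carry a per-instance memo dictionary.
  pass

def memoize_on_data(obj):
  @functools.wraps(obj)
  def memoizer(data, *args, **kwargs):
    # Keep the cache on the data object itself and key it only by the
    # remaining (small) arguments, skipping data in the lookup entirely.
    if not hasattr(data, '_memo'):
      data._memo = {}
    key = args + (kwd_mark,) + tuple(sorted(kwargs.items()))
    if key not in data._memo:
      data._memo[key] = obj(data, *args, **kwargs)
    return data._memo[key]
  return memoizer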

Is there anything else that can be done?

Solution

Well, you can use hash there without fear. CPython computes a frozenset's hash at most once: the first time hash() is called on it, the result is cached on the object, and every later call simply returns the cached value - check the timings:

>>> from timeit import timeit
>>> timeit("frozenset(a)", "a=range(100)")
3.26825213432312
>>> timeit("hash(a)", "a=frozenset(range(100))")
0.08160710334777832
>>> timeit("(lambda x:x)(a)", "a=hash(frozenset(range(100)))")
0.1994171142578125

Don't forget that Python's hash builtin calls the object's __hash__ method, and for built-in types such as frozenset and str the result is cached on the object after the first computation. Above you can see that calling an identity lambda function is more than twice as slow as calling hash(a).
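To confirm this at the scale from the question, here is a minimal sketch (hypothetical variable names; it relies on CPython caching a frozenset's hash after the first computation):

import timeit

big = frozenset(range(10**6))  # millions of elements, as in the question

# The first call has to walk the whole set to compute its hash...
first = timeit.timeit(lambda: hash(big), number=1)
# ...while every later call just returns the value cached on the object.
later = timeit.timeit(lambda: hash(big), number=1_000_000) / 1_000_000

print(f"first hash: {first:.6f} s, each later hash: {later:.9f} s")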

So, if all your arguments are hashable, just add their hashes when creating combined_args - otherwise, write the key construction with a conditional so that frozenset (and maybe other) arguments are replaced by their hash.
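A minimal sketch of that suggestion, using the hypothetical names memoize_by_hash and _key_part (note that keying on a hash alone means two distinct frozensets with equal hashes would share a cache entry - a collision risk this sketch accepts in exchange for small keys):

import functools

kwd_mark = object()  # sentinel separating positional from keyword arguments

def _key_part(value):
  # Hypothetical helper: collapse hash-caching container types to a small
  # tagged hash so the memo key stays tiny; leave everything else as-is so
  # ordinary equality still distinguishes values.
  if isinstance(value, frozenset):
    return ('frozenset-hash', hash(value))
  return value

def memoize_by_hash(obj):
  memo = {}
  @functools.wraps(obj)
  def memoizer(*args, **kwargs):
    key = (tuple(_key_part(a) for a in args)
           + (kwd_mark,)
           + tuple((k, _key_part(v)) for k, v in sorted(kwargs.items())))
    if key not in memo:
      memo[key] = obj(*args, **kwargs)
    return memo[key]
  return memoizer

With this decorator, a huge frozenset argument contributes only a cached integer to the memo key, so the memo neither compares large sets on lookup nor holds a reference to the set as a key.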
