Why are explicit calls to magic methods slower than "sugared" syntax?

Question

I was messing around with a small custom data object that needs to be hashable, comparable, and fast, when I ran into an odd-looking set of timing results. Some of the comparisons (and the hashing method) for this object simply delegate to an attribute, so I was using something like:

def __hash__(self):
    return self.foo.__hash__()

However upon testing, I discovered that hash(self.foo) is noticeably faster. Curious, I tested __eq__, __ne__, and the other magic comparisons, only to discover that all of them ran faster if I used the sugary forms (==, !=, <, etc.). Why is this? I assumed the sugared form would have to make the same function call under the hood, but perhaps this isn't the case?

Timeit results

Setups: thin wrappers around an instance attribute that controls all the comparisons.

Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> 
>>> sugar_setup = '''\
... import datetime
... class Thin(object):
...     def __init__(self, f):
...             self._foo = f
...     def __hash__(self):
...             return hash(self._foo)
...     def __eq__(self, other):
...             return self._foo == other._foo
...     def __ne__(self, other):
...             return self._foo != other._foo
...     def __lt__(self, other):
...             return self._foo < other._foo
...     def __gt__(self, other):
...             return self._foo > other._foo
... '''
>>> explicit_setup = '''\
... import datetime
... class Thin(object):
...     def __init__(self, f):
...             self._foo = f
...     def __hash__(self):
...             return self._foo.__hash__()
...     def __eq__(self, other):
...             return self._foo.__eq__(other._foo)
...     def __ne__(self, other):
...             return self._foo.__ne__(other._foo)
...     def __lt__(self, other):
...             return self._foo.__lt__(other._foo)
...     def __gt__(self, other):
...             return self._foo.__gt__(other._foo)
... '''

Tests

My custom object is wrapping a datetime, so that's what I used, but it shouldn't make any difference. Yes, I'm creating the datetimes within the tests, so there's obviously some associated overhead there, but that overhead is constant from one test to another so it shouldn't make a difference. I've omitted the __ne__ and __gt__ tests for brevity, but those results were essentially identical to the ones shown here.

>>> test_hash = '''\
... for i in range(1, 1000):
...     hash(Thin(datetime.datetime.fromordinal(i)))
... '''
>>> test_eq = '''\
... for i in range(1, 1000):
...     a = Thin(datetime.datetime.fromordinal(i))
...     b = Thin(datetime.datetime.fromordinal(i+1))
...     a == a # True
...     a == b # False
... '''
>>> test_lt = '''\
... for i in range(1, 1000):
...     a = Thin(datetime.datetime.fromordinal(i))
...     b = Thin(datetime.datetime.fromordinal(i+1))
...     a < b # True
...     b < a # False
... '''

Results

>>> min(timeit.repeat(test_hash, explicit_setup, number=1000, repeat=20))
1.0805227295846862
>>> min(timeit.repeat(test_hash, sugar_setup, number=1000, repeat=20))
1.0135617737162192
>>> min(timeit.repeat(test_eq, explicit_setup, number=1000, repeat=20))
2.349765956168767
>>> min(timeit.repeat(test_eq, sugar_setup, number=1000, repeat=20))
2.1486044757355103
>>> min(timeit.repeat(test_lt, explicit_setup, number=500, repeat=20))
1.156479287717275
>>> min(timeit.repeat(test_lt, sugar_setup, number=500, repeat=20))
1.0673696685109917

  • Hash:
    • Explicit: 1.0805227295846862
    • Sugared: 1.0135617737162192
  • Equal:
    • Explicit: 2.349765956168767
    • Sugared: 2.1486044757355103
  • Less Than:
    • Explicit: 1.156479287717275
    • Sugared: 1.0673696685109917

Solution

Two reasons:

  • The special-method lookups consult the type only. They don't look at self.foo.__hash__, they look for type(self.foo).__hash__. That's one less dictionary to search (see the sketch after this list).

  • The C slot lookup is faster than a pure-Python attribute lookup (which goes through __getattribute__); the method object lookup, including the descriptor binding, is done entirely in C, bypassing __getattribute__.
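
A minimal sketch of the first point (my own illustration, not from the original answer, assuming CPython's standard semantics): because hash() and the operators consult the type only, a method attached to the instance is never seen by them, whereas an explicit call goes through normal attribute lookup and does find it.

class Foo:
    def __hash__(self):
        return 42

f = Foo()
f.__hash__ = lambda: 99        # instance attribute, not a slot on the type

print(hash(f))                 # 42 -- hash() consults type(f).__hash__ only
print(f.__hash__())            # 99 -- the explicit call uses ordinary attribute lookup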

So you'd have to cache the type(self._foo).__hash__ lookup locally, and even then the call would not be as fast as the one made from C code. Just stick to the standard library functions if speed is at a premium.
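
For illustration only, a hypothetical variant of that caching (the CachedThin name and the default-argument trick are my own, and it assumes _foo is always a datetime.datetime): the type-level lookup happens once, at function-definition time, yet the call still goes through Python-level machinery rather than the C slot that hash() uses.

import datetime

class CachedThin(object):
    def __init__(self, f):
        self._foo = f

    # datetime.datetime.__hash__ is looked up once and stored as a default
    # argument, so no per-call attribute lookup is needed.
    def __hash__(self, _hash=datetime.datetime.__hash__):
        return _hash(self._foo)

print(hash(CachedThin(datetime.datetime(2014, 2, 10))))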

Another reason to avoid calling the magic methods directly is that the comparison operators do more than just call one magic method; the methods have reflected versions too; for x < y, if x.__lt__ isn't defined or x.__lt__(y) returns the NotImplemented singleton, y.__gt__(x) is consulted as well.
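
A small self-contained illustration of that fallback (my own example, assuming standard Python semantics): the sugared form recovers through the reflected method, while the explicit call simply hands back NotImplemented.

class Left:
    def __lt__(self, other):
        return NotImplemented    # decline the comparison

class Right:
    def __gt__(self, other):
        return True              # the reflected operator handles it

x, y = Left(), Right()

print(x < y)           # True -- x.__lt__(y) returned NotImplemented, so y.__gt__(x) was tried
print(x.__lt__(y))     # NotImplemented -- the explicit call performs no fallback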
