Why is startswith slower than slicing?


Question

Why is the implementation of startswith slower than slicing?

In [1]: x = 'foobar'

In [2]: y = 'foo'

In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop

In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop

Surprisingly, even when the length calculation is included, slicing still appears significantly faster:

In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop
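
For anyone who wants to reproduce these measurements outside IPython, here is a minimal sketch using the standard timeit module. The function name time_prefix_checks and its default arguments are illustrative assumptions, not part of the original question, and the absolute numbers will differ by machine and Python version.

```python
import timeit


def time_prefix_checks(x="foobar", y="foo", number=200_000, repeat=3):
    """Return the best per-call time, in nanoseconds, for each prefix check.

    Hypothetical helper for illustration only.
    """
    statements = {
        "startswith": "x.startswith(y)",
        # The literal 3 matches the question; it is only a valid
        # prefix check when len(y) == 3.
        "slice_literal": "x[:3] == y",
        "slice_len": "x[:len(y)] == y",
    }
    namespace = {"x": x, "y": y}
    results = {}
    for label, stmt in statements.items():
        # Take the minimum over several runs, as %timeit's "best of 3" does.
        best = min(
            timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
        )
        results[label] = best / number * 1e9
    return results


if __name__ == "__main__":
    for label, ns in time_prefix_checks().items():
        print(f"{label:14s} {ns:8.1f} ns per call")
```

Taking the minimum of several repeats reduces the influence of unrelated system noise on such short statements.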

Note: the first part of this behaviour is noted in Python for Data Analysis (Chapter 3), but no explanation for it is offered.

If helpful: the C code for startswith was linked in the original post, and here is the output of dis.dis:

In [6]: import dis

In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))

In [8]: dis_it('x[:3]==y')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (3)
              6 SLICE+2             
              7 LOAD_NAME                1 (y)
             10 COMPARE_OP               2 (==)
             13 RETURN_VALUE        

In [9]: dis_it('x.startswith(y)')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_ATTR                1 (startswith)
              6 LOAD_NAME                2 (y)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE 
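
The disassembly above comes from an older interpreter (SLICE+2 no longer exists in Python 3). As a rough sketch, assuming Python 3.5 or later, the same comparison can be made by listing opcode names with dis.get_instructions. The helper name opnames is an assumption for illustration, and the exact opcodes printed depend on the interpreter version.

```python
import dis


def opnames(expression):
    """Return the opcode names for an expression compiled in 'eval' mode.

    Hypothetical helper; the names x and y in the expression are only
    compiled, never evaluated, so they do not need to exist.
    """
    code = compile(expression, "<none>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    for expr in ("x[:3] == y", "x.startswith(y)"):
        print(expr, "->", opnames(expr))
```

In either version the picture is similar: the slicing form is a subscript followed by a comparison, while the method form is an attribute lookup followed by a call.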

Recommended answer

Some of the performance difference can be explained by the time it takes the . operator to look up the startswith attribute on every call:

>>> x = 'foobar'
>>> y = 'foo'
>>> sw = x.startswith
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 316 ns per loop
>>> %timeit sw(y)
1000000 loops, best of 3: 267 ns per loop
>>> %timeit x[:3] == y
10000000 loops, best of 3: 151 ns per loop

Another portion of the difference can be explained by the fact that startswith is a function, and even no-op function calls take a bit of time:

>>> def f():
...     pass
... 
>>> %timeit f()
10000000 loops, best of 3: 105 ns per loop
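
To separate the cost of the call itself from the cost of the timing loop, one option is to time an empty statement as a baseline. This is a sketch only: the helper name call_vs_baseline_ns is an assumption, and the difference between the two figures is just a rough estimate of call overhead.

```python
import timeit


def f():
    pass


def call_vs_baseline_ns(number=200_000, repeat=3):
    """Return per-iteration times in nanoseconds for f() and for a bare 'pass'.

    Hypothetical helper for illustration only.
    """
    call = min(
        timeit.repeat("f()", globals={"f": f}, number=number, repeat=repeat)
    )
    # 'pass' measures only the overhead of timeit's own loop.
    baseline = min(timeit.repeat("pass", number=number, repeat=repeat))
    return {
        "call": call / number * 1e9,
        "baseline": baseline / number * 1e9,
    }


if __name__ == "__main__":
    times = call_vs_baseline_ns()
    print(f"f():  {times['call']:.1f} ns")
    print(f"pass: {times['baseline']:.1f} ns")
```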

This does not fully explain the difference, since the version using slicing and len also calls a function and is still faster (compare with sw(y) above, at 267 ns):

>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 213 ns per loop

My only guess here is that Python may optimize lookup time for built-in functions, or that len calls are heavily optimized (which is probably true). It might be possible to test that with a custom len function. Or possibly this is where the differences identified by LastCoder kick in. Note also larsmans' results, which indicate that startswith is actually faster for longer strings. The whole line of reasoning above applies only to cases where the overhead I'm talking about actually matters.
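
The two checks suggested above (a custom len function, and longer strings) could be sketched roughly as follows. The names my_len and compare, and the string sizes used, are assumptions made for illustration. No particular outcome is claimed here, since results depend on the Python version and hardware.

```python
import timeit


def my_len(s):
    """Hypothetical pure-Python wrapper around len.

    It adds an ordinary Python-level function call, so it can be compared
    with calling the built-in directly.
    """
    return len(s)


def compare(x, y, number=200_000, repeat=3):
    """Return the best per-call time in nanoseconds for each variant."""
    namespace = {"x": x, "y": y, "my_len": my_len, "sw": x.startswith}
    statements = [
        "x.startswith(y)",
        "sw(y)",                   # bound method, no attribute lookup
        "x[:len(y)] == y",         # built-in len
        "x[:my_len(y)] == y",      # pure-Python wrapper around len
    ]
    results = {}
    for stmt in statements:
        best = min(
            timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
        )
        results[stmt] = best / number * 1e9
    return results


if __name__ == "__main__":
    # Short strings, as in the question.
    for stmt, ns in compare("foobar", "foo").items():
        print(f"short  {stmt:22s} {ns:10.1f} ns")
    # A long prefix: slicing has to copy len(y) characters before comparing.
    long_y = "a" * 100_000
    long_x = long_y + "b"
    for stmt, ns in compare(long_x, long_y, number=2_000).items():
        print(f"long   {stmt:22s} {ns:10.1f} ns")
```

The long-string case is included because slicing builds a new string of len(y) characters before the comparison, whereas startswith can compare in place.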
