Why is startswith slower than slicing?
Question
Why is the implementation of startswith slower than slicing?
In [1]: x = 'foobar'
In [2]: y = 'foo'
In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop
In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop
Surprisingly, even when the length calculation is included, slicing still appears significantly faster:
In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop
Note: the first part of this behaviour is documented in Python for Data Analysis (Chapter 3), but no explanation is offered there.
If helpful: here is the C code for startswith; and here is the output of dis.dis:
In [6]: import dis
In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))
In [8]: dis_it('x[:3]==y')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (3)
              6 SLICE+2
              7 LOAD_NAME                1 (y)
             10 COMPARE_OP               2 (==)
             13 RETURN_VALUE
In [9]: dis_it('x.startswith(y)')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_ATTR                1 (startswith)
              6 LOAD_NAME                2 (y)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE
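The disassembly above comes from an older interpreter (the SLICE+2 opcode only exists in Python 2). As a rough sketch for repeating the comparison on a current interpreter, the snippet below lists just the opcode names for both expressions. The helper name `opnames` is made up for illustration, and the names printed will differ between Python versions, so no expected output is shown.

```python
import dis


def opnames(src):
    """Return the opcode names for an expression compiled in 'eval' mode."""
    code = compile(src, '<none>', 'eval')
    return [ins.opname for ins in dis.get_instructions(code)]


# Same two expressions as in the question.
slice_ops = opnames('x[:3] == y')
call_ops = opnames('x.startswith(y)')

print('slice:', slice_ops)
print('call: ', call_ops)
```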
Recommended answer
Some of the performance difference can be explained by the time it takes the . operator (attribute lookup) to do its thing:
>>> x = 'foobar'
>>> y = 'foo'
>>> sw = x.startswith
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 316 ns per loop
>>> %timeit sw(y)
1000000 loops, best of 3: 267 ns per loop
>>> %timeit x[:3] == y
10000000 loops, best of 3: 151 ns per loop
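Outside IPython, the same three measurements can be sketched with the standard timeit module. This is only an approximation of the session above: the loop count `n` is an arbitrary choice, and absolute numbers will vary with the machine and Python version.

```python
import timeit

x = 'foobar'
y = 'foo'
sw = x.startswith  # bound method looked up once, outside the timed loop

n = 200_000  # arbitrary repetition count
ns = {'x': x, 'y': y, 'sw': sw}

# Each statement is timed in the same namespace so only the expression differs.
t_attr = timeit.timeit('x.startswith(y)', globals=ns, number=n)
t_bound = timeit.timeit('sw(y)', globals=ns, number=n)
t_slice = timeit.timeit('x[:3] == y', globals=ns, number=n)

for label, t in [('x.startswith(y)', t_attr),
                 ('sw(y)', t_bound),
                 ('x[:3] == y', t_slice)]:
    print(f'{label:<20} {t / n * 1e9:8.1f} ns per call')
```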
Another portion of the difference can be explained by the fact that startswith is a function, and even no-op function calls take a bit of time:
>>> def f():
...     pass
...
>>> %timeit f()
10000000 loops, best of 3: 105 ns per loop
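To get a feel for how much of that figure is pure call overhead, one option is to time the no-op call next to an empty statement, which measures only timeit's own loop. A minimal sketch, with an arbitrary loop count `n`:

```python
import timeit


def f():
    pass


n = 200_000  # arbitrary repetition count

# 'pass' costs only the timing loop itself; 'f()' adds one function call.
t_empty = timeit.timeit('pass', number=n)
t_call = timeit.timeit('f()', globals={'f': f}, number=n)

print(f'empty loop: {t_empty / n * 1e9:8.1f} ns per iteration')
print(f'f():        {t_call / n * 1e9:8.1f} ns per iteration')
```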
This does not totally explain the difference, since the version using slicing and len also calls a function and is still faster (compare to sw(y) above, 267 ns):
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 213 ns per loop
My only guess here is that maybe Python optimizes lookup time for built-in functions, or that len calls are heavily optimized (which is probably true). It might be possible to test that with a custom len function. Or possibly this is where the differences identified by LastCoder kick in. Note also larsmans' results, which indicate that startswith is actually faster for longer strings. The whole line of reasoning above applies only to those cases where the overhead I'm talking about actually matters.
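One way to probe both guesses is sketched below: it compares the built-in len against a pure-Python wrapper, and repeats the comparison with a much longer prefix. The names `my_len` and `time_ns`, and the 10,000-character prefix length, are assumptions chosen purely for illustration; the results depend on the interpreter and are not shown here.

```python
import timeit


def my_len(s):
    # Hypothetical pure-Python stand-in for the built-in len.
    return len(s)


def time_ns(stmt, ns, number=100_000):
    """Average time of one execution of stmt, in nanoseconds."""
    return timeit.timeit(stmt, globals=ns, number=number) / number * 1e9


long_prefix = 'a' * 10_000  # arbitrary "long string" size
cases = {
    'short': {'x': 'foobar', 'y': 'foo', 'my_len': my_len},
    'long': {'x': long_prefix + 'tail', 'y': long_prefix, 'my_len': my_len},
}
statements = ('x.startswith(y)', 'x[:len(y)] == y', 'x[:my_len(y)] == y')

for name, ns in cases.items():
    for stmt in statements:
        print(f'{name:<6} {stmt:<22} {time_ns(stmt, ns):10.1f} ns')
```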