Why is slice assignment faster than `list.insert`?
Question
Inspired by this nice answer, here is a benchmark:
import timeit

def test1():
    a = [1, 2, 3]
    a.insert(0, 1)

def test2():
    a = [1, 2, 3]
    a[0:0] = [1]

print(timeit.timeit('test1()', 'from __main__ import test1'))
print(timeit.timeit('test2()', 'from __main__ import test2'))
For me, `test2` is slightly faster (~10%). Why is that the case? I would expect it to be slower, since:
- Slice assignment must be able to accept an iterable of any length, so it has to be more general.
- In slice assignment, we need to build a new list on the right-hand side for it to work.
Can anybody help me understand this?
(Using Python 2.7 on OS X 10.5.8)
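For reference, the two operations being compared really are equivalent: both insert an element at the front of the list. A quick sanity check:

```python
a = [1, 2, 3]
a.insert(0, 1)   # method call

b = [1, 2, 3]
b[0:0] = [1]     # slice assignment

print(a)  # → [1, 1, 2, 3]
print(b)  # → [1, 1, 2, 3]
```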
Answer
Your first test case has to call the method `insert` on the list `a`, whereas all the operations in `test2` are handled directly in bytecode. Note the `CALL_FUNCTION` in the disassembly of `test1` below. Calling functions is moderately expensive in Python: certainly expensive enough to account for a few percent difference in run time.
>>> import dis
>>> dis.dis(test1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)
  3          15 LOAD_FAST                0 (a)
             18 LOAD_ATTR                0 (insert)
             21 LOAD_CONST               4 (0)
             24 LOAD_CONST               1 (1)
             27 CALL_FUNCTION            2
             30 POP_TOP
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE
>>> dis.dis(test2)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)
  3          15 LOAD_CONST               1 (1)
             18 BUILD_LIST               1
             21 LOAD_FAST                0 (a)
             24 LOAD_CONST               4 (0)
             27 LOAD_CONST               4 (0)
             30 STORE_SLICE+3
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE
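On Python 3 the exact opcodes differ from the 2.7 disassembly above (`STORE_SLICE+3` is gone, and method calls use newer call opcodes), but the asymmetry is the same: the `insert` version compiles to a call instruction, while the slice version does not. A small check of this, as a sketch (opcode names vary across 3.x versions, so it only tests for a `CALL*` prefix):

```python
import dis

def test1():
    a = [1, 2, 3]
    a.insert(0, 1)

def test2():
    a = [1, 2, 3]
    a[0:0] = [1]

def has_call(func):
    # True if the compiled function contains any CALL* instruction
    # (CALL_FUNCTION, CALL_METHOD, CALL, ... depending on the version).
    return any(ins.opname.startswith('CALL')
               for ins in dis.get_instructions(func))

print(has_call(test1))  # → True  (a.insert is a method call)
print(has_call(test2))  # → False (slice assignment is a store opcode)
```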
Bad explanation
I posted this first, but after consideration I think it is not correct. The difference I describe here should only make a noticeable difference when there is a lot of data to be moved, which isn't the case in the test here. And even with a lot of data the difference is only a couple of percent:
import timeit

def test1():
    a = range(10000000)
    a.insert(1, 1)

def test2():
    a = range(10000000)
    a[1:1] = [1]
>>> timeit.timeit(test1, number=10)
6.008707046508789
>>> timeit.timeit(test2, number=10)
5.861173868179321
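On Python 3, `range` no longer returns a list, so the setup needs an explicit `list()` call. A sketch of the same comparison for modern interpreters (the list size is reduced here from the original 10,000,000 so the run stays quick; absolute numbers will differ from the ones above):

```python
import timeit

N = 100_000  # smaller than the original benchmark, for a quick run

def test1():
    a = list(range(N))  # list() is required on Python 3, where range() is lazy
    a.insert(1, 1)

def test2():
    a = list(range(N))
    a[1:1] = [1]

print(timeit.timeit(test1, number=10))
print(timeit.timeit(test2, number=10))
```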
The method `list.insert` is implemented by the function `ins1` in `listobject.c`. You'll see that it copies the item references for the tail of the list one by one:
for (i = n; --i >= where; )
    items[i+1] = items[i];
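In Python terms, that C loop shifts each tail reference up one slot, back to front. A simplified model of what `ins1` does (an illustration only, not CPython's actual implementation):

```python
def model_insert(items, where, value):
    # Mimic ins1: grow by one slot, then shift the tail one item at a time.
    items.append(None)                            # make room (a resize in C)
    for i in range(len(items) - 2, where - 1, -1):
        items[i + 1] = items[i]                   # copy references from the back
    items[where] = value

a = [1, 2, 3]
model_insert(a, 0, 1)
print(a)  # → [1, 1, 2, 3], same as [1, 2, 3] after a.insert(0, 1)
```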
On the other hand, slice assignment is implemented by the function `list_ass_slice`, which calls `memmove`:
memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));
So I think the answer to your question is that the C library function `memmove` is better optimized than the simple loop. See here for the glibc implementation of `memmove`: I believe that when called from `list_ass_slice` it eventually ends up calling `_wordcopy_bwd_aligned`, which you can see is heavily hand-optimized.
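The loop-versus-bulk-copy difference can be illustrated in pure Python as an analogy (this is not the C code itself): moving references one at a time versus in a single slice operation, which the interpreter handles in optimized C:

```python
import timeit

src = list(range(100_000))

def loop_copy():
    dst = [None] * len(src)
    for i in range(len(src)):
        dst[i] = src[i]    # one reference at a time, like the ins1 loop
    return dst

def slice_copy():
    dst = [None] * len(src)
    dst[:] = src           # bulk copy done inside the C implementation
    return dst

print(timeit.timeit(loop_copy, number=100))
print(timeit.timeit(slice_copy, number=100))
```

Here the gap is much larger than a few percent because the loop pays Python bytecode overhead per element; in the benchmark above both paths run in C, so only the copy strategy differs.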