Why is slice assignment faster than `list.insert`?

Question

Inspired by this nice answer.

Here is a benchmark:

import timeit

def test1():
    a = [1,2,3]
    a.insert(0,1)

def test2():
    a = [1,2,3]
    a[0:0]=[1]

print (timeit.timeit('test1()','from __main__ import test1'))
print (timeit.timeit('test2()','from __main__ import test2'))

For me, test2 is slightly faster (~10%). Why is that the case? I would expect it to be slower since:

  1. Slice assignment has to accept an iterable of any length, so it must be more general.
  2. In slice assignment we need to build a new list on the right-hand side just to make it work (see the short sketch after this list).
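
A quick sketch of both points (my illustration, not part of the original question; it behaves the same on Python 2.7 and 3):

# Point 1: slice assignment accepts *any* iterable, not just a list.
a = [1, 2, 3]
a[0:0] = (x for x in [0])   # a generator works too, so the code path must be general
print(a)                    # [0, 1, 2, 3]

# Point 2: the benchmarked form first has to build a throwaway one-element list.
b = [1, 2, 3]
b[0:0] = [1]                # the list [1] exists only to feed the slice assignment
print(b)                    # [1, 1, 2, 3]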

Can anybody help me understand this?

(Using Python 2.7 on OS X 10.5.8)

Answer

Your first test case has to call the method insert on the list a, whereas all the operations in test2 are handled directly in byte code. Note the CALL_FUNCTION in the disassembly of test1 below. Calling functions is moderately expensive in Python: certainly expensive enough to account for a few percent difference in run time.

>>> import dis
>>> dis.dis(test1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_FAST                0 (a)
             18 LOAD_ATTR                0 (insert)
             21 LOAD_CONST               4 (0)
             24 LOAD_CONST               1 (1)
             27 CALL_FUNCTION            2
             30 POP_TOP             
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
>>> dis.dis(test2)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_CONST               1 (1)
             18 BUILD_LIST               1
             21 LOAD_FAST                0 (a)
             24 LOAD_CONST               4 (0)
             27 LOAD_CONST               4 (0)
             30 STORE_SLICE+3       
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
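
To get a feel for how much an attribute lookup plus a call costs, here is a rough micro-benchmark of my own (not from the original answer): the same list indexing spelled once as a pure bytecode operation and once as an explicit method call.

import timeit

# a[0] compiles to a single BINARY_SUBSCR opcode, while a.__getitem__(0)
# needs LOAD_ATTR followed by CALL_FUNCTION, just like a.insert(...) above.
print (timeit.timeit('a[0]', 'a = [1, 2, 3]'))              # bytecode only
print (timeit.timeit('a.__getitem__(0)', 'a = [1, 2, 3]'))  # attribute lookup + call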

Bad explanation

I posted this first, but after consideration I think it is not correct. The difference I describe here should only make a noticeable difference when there is a lot of data to be moved, which isn't the case in this test. And even with a lot of data the difference is only a couple of percent:

import timeit

def test1():
    a = range(10000000)   # Python 2: range() returns a plain list
    a.insert(1,1)

def test2():
    a = range(10000000)
    a[1:1]=[1]

>>> timeit.timeit(test1, number=10)
6.008707046508789
>>> timeit.timeit(test2, number=10)
5.861173868179321

The method list.insert is implemented by the function ins1 in listobject.c. You'll see that it copies the item references for the tail of the list one by one:

for (i = n; --i >= where; )
    items[i+1] = items[i];
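
A rough Python rendering of what that loop does (purely illustrative, the function name is mine): walk backwards from the tail and shift each reference one slot to the right, then drop the new item in.

def insert_by_shifting(items, where, value):
    # Illustrative Python version of the ins1 loop: one assignment per element.
    items.append(None)                        # grow by one slot, like list_resize
    for i in range(len(items) - 2, where - 1, -1):
        items[i + 1] = items[i]               # copy the references one by one
    items[where] = value

a = [1, 2, 3]
insert_by_shifting(a, 1, 1)
print (a)   # [1, 1, 2, 3]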

On the other hand, slice assignment is implemented by the function list_ass_slice, which calls memmove:

memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));

So I think the answer to your question is that the C library function memmove is better optimized than the simple loop. See here for the glibc implementation of memmove: I believe that when called from list_ass_slice it eventually ends up calling _wordcopy_bwd_aligned, which you can see is heavily hand-optimized.
