Python list.clear() time and space complexity?


Problem description

I am writing a blog post on the Python list.clear() method, where I also want to mention the time and space complexity of the underlying algorithm. I expected the time complexity to be O(N): iterate over the elements and free the memory. But I found an article where it is mentioned that it is actually an O(1) operation. Then I searched the source code of the method in the CPython implementation and found a function which I believe is the actual internal implementation of list.clear(), although I am not really sure it is. Here's the source code:

static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        /* Because XDECREF can recursively invoke operations on
           this list, we make it empty first. */
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }
    /* Never fails; the return value can be ignored.
       Note that there is no guarantee that the list is actually empty
       at this point, because XDECREF may have populated it again! */
    return 0;
}

I could be wrong, but it does look like O(N) to me. Also, I found a similar question here, but there's no clear answer. I just want to confirm the actual time and space complexity of list.clear(), along with a little explanation supporting the answer. Any help is appreciated. Thanks.

Solution

As you correctly noticed, the CPython implementation of list.clear is O(n). The code iterates over the elements in order to decrease the reference count of each one, and there is no way to avoid that. There is no doubt that it is an O(n) operation and, given a large enough list, you can measure the time spent in clear() as a function of list size:

import time

# Time list.clear() on progressively larger lists; the elapsed time
# should grow roughly linearly with the number of elements.
for size in 1_000_000, 10_000_000, 100_000_000, 1_000_000_000:
    l = [None] * size
    t0 = time.time()
    l.clear()
    t1 = time.time()
    print(size, t1 - t0)

The output shows linear complexity; on my system with Python 3.7 it prints the following:

1000000 0.0023756027221679688
10000000 0.02452826499938965
100000000 0.23625731468200684
1000000000 2.31496524810791

The time per element is of course tiny, because the loop is coded in C and each iteration does very little work. But, as the above measurement shows, even a tiny per-element factor eventually adds up. A small per-element constant is not a reason to ignore the cost of an operation, or the same would apply to the loop that shifts the list elements in l.insert(0, ...), which is also very efficient - and yet few would claim insertion at the beginning to be O(1). (And clear potentially does more work, because a decref will run an arbitrary chain of destructors for any object whose reference count actually reaches zero.)
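
To make that destructor cost concrete, here is a minimal sketch (the Counting class is invented purely for illustration) showing that, on CPython, clear() runs per-element cleanup as it decrefs:

class Counting:
    # Illustration only: count how many instances have been destroyed.
    destroyed = 0
    def __del__(self):
        Counting.destroyed += 1

l = [Counting() for _ in range(1000)]
l.clear()                    # drops the last reference to every element
print(Counting.destroyed)    # prints 1000 on CPython, where refcounting
                             # runs each __del__ during clear()

On an implementation without reference counting, the same print could well show 0 at that point, with the destructors running only when the garbage collector later reclaims the objects.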

On a philosophical level, one could argue that the costs of memory management should be ignored when assessing complexity, because otherwise it would be impossible to analyze anything with certainty, as any operation could trigger a GC. This argument has merit; GC does come occasionally and unpredictably, and its cost can be considered amortized across all allocations. In a similar vein, complexity analysis tends to ignore the complexity of malloc, because the parameters it depends on (such as memory fragmentation) are typically not directly related to the allocation size, or even to the number of already allocated blocks. However, in the case of list.clear there is only one allocated block, no GC is triggered, and the code still visits each and every list element. Even under the assumption of O(1) malloc and amortized O(1) GC, list.clear still takes time proportional to the number of elements in the list.

The article linked from the question is about Python the language and doesn't mention a particular implementation. Python implementations that don't use reference counting, such as Jython or PyPy, are likely to have true O(1) list.clear, and for them the claim from the article would be entirely correct. So, when explaining the Python list on a conceptual level, it is not wrong to say that clearing the list is O(1) - after all, all the object references are in a contiguous array, and you free it only once. This is the point your blog post probably should make, and that is what the linked article is trying to say. Taking the cost of reference counting into account too early might confuse your readers and give them completely wrong ideas about Python's lists (e.g. they could imagine that they are implemented as linked lists).
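
If it helps your readers, that conceptual view can be expressed as a toy model; the ConceptualList class below is invented for illustration and is not how CPython is written:

class ConceptualList:
    # Toy model: the list owns one contiguous array of references,
    # and clearing merely forgets that array in a single step.
    def __init__(self, items):
        self._backing = list(items)   # stands in for the C array ob_item
    def clear(self):
        self._backing = []            # O(1) from the list's perspective;
                                      # reclaiming the old array and its
                                      # elements is the memory manager's job

Whether that hand-off is itself O(1) or O(n) is exactly the implementation detail the rest of this answer is about.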

Finally, at some point one must accept that the memory management strategy does change the complexity of some operations. For example, destroying a linked list in C++ is O(n) from the perspective of the caller; discarding it in Java or Go would be O(1). And not merely in the trivial sense that a garbage-collected language just postpones the same work until later - it is quite possible that a moving collector will only traverse reachable objects and will indeed never visit the elements of the discarded linked list. Reference counting makes discarding large containers algorithmically similar to manual collection, and GC can remove that cost. While CPython's list.clear has to touch every element to avoid a memory leak, it is quite possible that PyPy's garbage collector never needs to do anything of the sort, and thus has a true O(1) list.clear.
