Repeatedly appending to a large list (Python 2.6.6)

Question
I have a project where I am reading ASCII values from a microcontroller through a serial port (they look like this: AA FF BA 11 43 CF, etc.). The input comes in quickly (38 two-character sets per second). I'm taking this input and appending it to a running list of all measurements.
After about 5 hours, my list has grown to ~ 855000 entries.
I'm given to understand that the larger a list becomes, the slower list operations become. My intent is to have this test run for 24 hours, which should yield around 3M results.
Is there a more efficient, faster way to append to a list than list.append()?
Thanks Everyone.
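For reference, the collection pattern described above boils down to a plain append loop. In this sketch, read_pair() is a hypothetical stand-in for the actual serial read (the real code would presumably use something like pyserial):

```python
def read_pair():
    """Hypothetical stand-in for reading one two-character hex value
    from the serial port."""
    return "AA"

# Running list of all measurements, as described in the question.
measurements = []
for _ in range(38 * 5):  # e.g. five seconds' worth at 38 pairs per second
    measurements.append(read_pair())
```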
Solution

"I'm given to understand that the larger a list becomes, the slower list operations become."
That's not true in general. Lists in Python are, despite the name, not linked lists but arrays. There are operations that are O(n) on arrays (copying and searching, for instance), but you don't seem to use any of these. As a rule of thumb: If it's widely used and idiomatic, some smart people went and chose a smart way to do it.
list.append
is a widely-used builtin (and the underlying C function is also used in other places, e.g. list comprehensions). If there were a faster way, it would already be in use. As you will see when you inspect the source code, lists overallocate: when they are resized, they allocate more room than the one new item needs, so the next n items can be appended without another resize (which is O(n)). The growth isn't constant, it is proportional to the list size, so resizing becomes rarer as the list grows larger. Here's the snippet from
listobject.c:list_resize
that determines the overallocation:

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
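The overallocation is easy to observe from Python itself: sys.getsizeof reports the allocated size of the list object, which jumps only at the occasional resize rather than on every append (the exact numbers vary between CPython versions):

```python
import sys

lst = []
sizes = []
for i in range(100):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Only a handful of distinct sizes appear across 100 appends:
# most appends fill already-allocated slots and trigger no resize.
print(len(set(sizes)))
```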
As Mark Ransom points out, older Python versions (< 2.7, 3.0) have a bug that makes the GC sabotage this. If you have such a Python version, you may want to disable the GC. If you can't, because you generate too much garbage (garbage that slips past refcounting), you're out of luck though.
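On an affected interpreter, disabling the collector around the bulk-append phase is straightforward. This is only a sketch (shown in modern syntax) that restores the previous GC state afterwards:

```python
import gc

data = []
was_enabled = gc.isenabled()
gc.disable()  # work around the pre-2.7/3.0 GC slowdown during mass appends
try:
    for i in range(100000):
        data.append(i)
finally:
    if was_enabled:
        gc.enable()  # restore the collector once the appends are done
```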