Repeatedly appending to a large list (Python 2.6.6)


Problem Description


I have a project where I am reading ASCII values from a microcontroller through a serial port (the data looks like this: AA FF BA 11 43 CF, etc.). The input comes in quickly (38 two-character sets per second). I'm taking this input and appending it to a running list of all measurements.

After about 5 hours, my list has grown to ~ 855000 entries.

I'm given to understand that the larger a list becomes, the slower list operations become. My intent is to have this test run for 24 hours, which should yield around 3M results.

Is there a more efficient, faster way to append to a list than list.append()?

Thanks Everyone.

Solution

I'm given to understand that the larger a list becomes, the slower list operations become.

That's not true in general. Lists in Python are, despite the name, not linked lists but arrays. There are operations that are O(n) on arrays (copying and searching, for instance), but you don't seem to use any of these. As a rule of thumb: If it's widely used and idiomatic, some smart people went and chose a smart way to do it. list.append is a widely-used builtin (and the underlying C function is also used in other places, e.g. list comprehensions). If there was a faster way, it would already be in use.
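As a quick sanity check, you can count how often a list actually reallocates during a long run of appends by watching `sys.getsizeof` change (a sketch run on a modern CPython; the exact growth constants differ slightly from 2.6, but the amortization principle is the same):

```python
import sys

lst = []
resizes = 0
last = sys.getsizeof(lst)
for _ in range(10**6):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != last:      # reported size tracks capacity, so a change means a realloc
        resizes += 1
        last = size
print(resizes)
```

A million appends trigger only on the order of a hundred reallocations, so the average append never touches the allocator at all.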

As you will see when you inspect the source code, lists over-allocate: when they are resized, they allocate more than is needed for one item, so the next n items can be appended without another resize (which is O(n)). The growth isn't constant, it is proportional to the list size, so resizing becomes rarer as the list grows larger. Here's the snippet from listobject.c:list_resize that determines the over-allocation:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
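You can reproduce that growth pattern in pure Python by applying the same formula whenever an append would exceed the current capacity (a sketch of the 2.6-era rule; later CPython versions tweaked the constants):

```python
def resized_capacity(newsize):
    # Direct translation of the C expression above.
    new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6)
    return newsize + new_allocated

capacities = []
allocated = 0                       # an empty list owns no item slots
for n in range(1, 89):              # simulate appending the n-th item
    if n > allocated:               # would not fit: a resize is triggered
        allocated = resized_capacity(n)
        capacities.append(allocated)
print(capacities)  # [4, 8, 16, 25, 35, 46, 58, 72, 88]
```

The printed capacities match the growth pattern quoted in the C comment.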

As Mark Ransom points out, older Python versions (<2.7, 3.0) have a bug that makes the GC sabotage this. If you have such a Python version, you may want to disable the gc. If you can't because you generate too much garbage (i.e. garbage that slips past reference counting), you're out of luck though.
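A minimal sketch of that workaround (the loop body is a stand-in for the serial-port read; note that gc.disable only pauses the cyclic collector, while reference counting still reclaims ordinary garbage):

```python
import gc

measurements = []
gc.disable()          # pause the cyclic collector during the bulk appends
try:
    for _ in range(100000):
        measurements.append("AA")   # stand-in for one serial reading
finally:
    gc.enable()       # always restore normal collection afterwards

print(len(measurements))  # 100000
```

Wrapping the enable call in `finally` ensures the collector comes back even if the acquisition loop raises.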
