Repeatedly appending to a large list (Python 2.6.6)


Question


I have a project where I am reading ASCII values from a microcontroller through a serial port (they look like this: AA FF BA 11 43 CF, etc.). The input comes in quickly (38 two-character sets per second). I'm taking this input and appending it to a running list of all measurements.
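The capture loop might look like the following sketch. `read_measurement` is a hypothetical stand-in for the actual serial read (e.g. pyserial's `Serial.read`), simulated here with random bytes so the snippet runs without hardware:

```python
import random

def read_measurement():
    # Hypothetical stand-in for reading one two-character hex value
    # from the serial port (e.g. serial.Serial.read(2) with pyserial).
    return "%02X" % random.randint(0, 255)

measurements = []
for _ in range(1000):  # the real run loops until the 24-hour window elapses
    measurements.append(read_measurement())

print(len(measurements))  # 1000 entries collected
```

At 38 readings per second, 24 hours of this loop produces roughly 3.3 million entries, which is the scale the question is asking about.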


After about 5 hours, my list has grown to ~855,000 entries.


I'm given to understand that the larger a list becomes, the slower list operations become. My intent is to have this test run for 24 hours, which should yield around 3M results.


Is there a more efficient, faster way to append to a list than list.append()?

Thanks, everyone.

Answer


I'm given to understand that the larger a list becomes, the slower list operations become.


That's not true in general. Lists in Python are, despite the name, not linked lists but arrays. There are operations that are O(n) on arrays (copying and searching, for instance), but you don't seem to use any of these. As a rule of thumb: If it's widely used and idiomatic, some smart people went and chose a smart way to do it. list.append is a widely-used builtin (and the underlying C function is also used in other places, e.g. list comprehensions). If there was a faster way, it would already be in use.
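A quick way to convince yourself of this is to time a single append against a short list and against a list the size a 24-hour capture would reach. The snippet below is a rough sketch, not a rigorous benchmark; absolute numbers depend on the machine:

```python
import timeit

small = []
big = [0] * 1000000  # roughly the size a 24-hour run would reach

# If append slowed down as the list grew, t_big would dwarf t_small;
# amortized O(1) keeps the two timings in the same ballpark.
t_small = timeit.timeit(lambda: small.append(0), number=100000)
t_big = timeit.timeit(lambda: big.append(0), number=100000)

print(t_small, t_big)
```

On a typical machine both timings come out within a small constant factor of each other, regardless of the starting list size.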


As you will see when you inspect the source code, lists overallocate: when they are resized, they allocate more than is needed for one item, so the next n items can be appended without another resize (which is O(n)). The growth isn't constant; it is proportional to the list size, so resizing becomes rarer as the list grows larger. Here's the snippet from listobject.c:list_resize that determines the overallocation:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
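The C expression above can be mirrored in Python to reproduce the documented growth pattern. This is a sketch: `list_resize` below is a hypothetical port of the allocation arithmetic, not the real allocator:

```python
def list_resize(newsize):
    """Port of the overallocation arithmetic from CPython 2.6 listobject.c:
    the extra headroom is roughly newsize/8 plus a small constant."""
    new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6)
    return new_allocated + newsize

# Simulate appending one item at a time and record each reallocation.
pattern = []
allocated = 0
for size in range(1, 89):
    if size > allocated:          # the array is full: a real resize happens
        allocated = list_resize(size)
        pattern.append(allocated)

print(pattern)  # [4, 8, 16, 25, 35, 46, 58, 72, 88]
```

Note how the simulation reproduces the comment's growth sequence exactly, and how the gaps between resizes widen as the list grows: appending the 855,000th entry triggers a resize only once every ~100,000 appends at that scale.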


As Mark Ransom points out, older Python versions (<2.7, 3.0) have a bug that makes the GC sabotage this. If you have such a Python version, you may want to disable the gc. If you can't, because you generate too much garbage (that slips refcounting), you're out of luck though.
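On an affected interpreter, the workaround is just a `gc.disable()` around the append-heavy phase. A minimal sketch, assuming reference counting alone can reclaim the program's non-cyclic garbage in the meantime:

```python
import gc

# Disabling the cyclic collector avoids its repeated traversal of the
# growing list on buggy 2.6.x interpreters; reference counting still
# frees ordinary (non-cyclic) garbage while it is off.
gc.disable()
try:
    data = []
    for i in range(1000000):
        data.append(i)
finally:
    gc.enable()  # re-enable once the append-heavy phase is done
```

Wrapping the re-enable in `finally` ensures the collector comes back on even if the capture loop raises.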

