Numpy concatenate is slow: any alternative approach?



I am running the following code:

for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)

The above code is slow. Is there any faster approach?

Solution

This is essentially what happens in any algorithm that grows an array one element at a time.

Each time you change the size of the array, it needs to be resized and every element needs to be copied. That is what is happening here too. (Some implementations reserve empty slots; e.g., they double the internal memory with each growth.)
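As a quick illustration of that copying (array names here are made up), np.concatenate never grows an array in place; it allocates a fresh array and copies both inputs into it:

```python
import numpy as np

a = np.zeros((1, 3))
b = np.ones((1, 3))

# np.concatenate builds a brand-new array and copies every element;
# the result shares no memory with either input.
c = np.concatenate((a, b), axis=0)
print(np.shares_memory(c, a))  # False
```

Doing this inside a loop therefore copies the whole accumulated array on every iteration, which is why the cost grows quadratically.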

  • If you have all your data at np.array creation time, just add it all at once (memory will be allocated only once then!)
  • If not, collect it with something like a linked list (allowing O(1) append operations), then read it into your np.array at once (again, only one memory allocation).
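A minimal sketch of the second pattern, using a made-up new_rows list as a stand-in for New_Rows:

```python
import numpy as np

# Hypothetical input: 1000 rows of length 3.
new_rows = [np.arange(3) for _ in range(1000)]

# Collect the rows in a plain Python list (amortized O(1) per append)...
rows = []
for r in new_rows:
    rows.append(r)

# ...then build the final array with a single allocation.
my_array = np.array(rows)  # or np.vstack(rows)
print(my_array.shape)      # (1000, 3)
```

The loop still runs 1000 times, but the expensive copy into contiguous numpy storage happens exactly once, at the end.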

This is not really a numpy-specific topic, but much more about data structures.

Edit: since this rather vague answer got some upvotes, I feel the need to make clear that my linked-list approach is just one possible example. As indicated in the comments, Python's lists are more array-like (and definitely not linked lists). But the core fact stands: list.append() in Python is fast (amortized O(1)), while that's not true for numpy arrays! There is also a short section about the internals in the docs:

How are lists implemented?

Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.

This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
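That over-allocation can be observed directly with sys.getsizeof; a small illustrative probe (exact byte counts vary by Python version and platform):

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size only jumps occasionally: CPython over-allocates
# when the list grows, so most appends don't trigger a resize.
print(sizes)
```

Twenty appends produce only a handful of distinct sizes, because each resize reserves extra slots for the appends that follow.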

(bold annotations by me)
