Numpy concatenate is slow: any alternative approach?


Problem description

I am running the following code:

for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)

The above code is slow. Is there any faster approach?

Answer

This is basically what is happening in all algorithms based on arrays.

Each time you change the size of the array, it needs to be resized and every element needs to be copied. This is happening here too. (Some implementations reserve extra empty slots; e.g., doubling the internal memory with each resize.)

  • If you have all the data at np.array creation time, just add it all at once (memory will be allocated only once then!)
  • If not, collect the pieces with something like a linked list (allowing O(1) append operations), then read them into an np.array at once (again, only one memory allocation).
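The two strategies above can be sketched as follows. This is a minimal sketch; `new_rows` is a hypothetical stand-in for the question's `New_Rows`:

```python
import numpy as np

# Hypothetical data: 1000 rows of 10 values each, standing in for New_Rows.
new_rows = [np.arange(10.0) for _ in range(1000)]

# Slow: each concatenate reallocates the whole array and copies every element.
slow = np.empty((0, 10))
for row in new_rows:
    slow = np.concatenate((slow, row[None, :]), axis=0)

# Fast: collect in a Python list (amortized O(1) append), convert once.
collected = []
for row in new_rows:
    collected.append(row)
fast = np.array(collected)  # one allocation for the final array

assert np.array_equal(slow, fast)
```

If the final shape is known in advance, preallocating with `np.empty((1000, 10))` and filling rows by index avoids even the intermediate list.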

This is not much of a numpy-specific topic, but much more about data structures.

Edit: as this quite vague answer got some upvotes, I feel the need to make clear that my linked-list approach is one possible example. As indicated in the comments, Python's lists are more array-like (and definitely not linked lists). But the core fact is: list.append() in Python is fast (amortized O(1)) while that's not true for numpy arrays! There is also a small part about the internals in the docs:

How are lists implemented?

Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.

This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.

(emphasis mine)
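The over-allocation described in the quote can be observed directly. The following sketch relies on CPython-specific behaviour of `sys.getsizeof` and watches a list's allocated size as it grows:

```python
import sys

# Append 20 items and record the list's allocated size after each append.
lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Because CPython over-allocates, the reported size stays flat for several
# appends and then jumps: most appends trigger no reallocation at all.
print(sizes)
```

By contrast, a numpy array reserves no such headroom, so every `np.concatenate` call in a loop pays for a full copy of the existing data.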

