Memory-friendly way to add a field to a structured ndarray, without duplicating data?
Question
To add a field to a structured numpy array, it is quite simple to create a new array with a new dtype, copy over the old fields, and add the new field. However, I need to do this for an array that takes a lot of memory, and I would rather not duplicate all of it. Both my own implementation and the (slow) implementation in numpy.lib.recfunctions.append_fields duplicate memory.
Is there a way to add a field to a structured ndarray without duplicating memory? That means either a way that avoids creating a new ndarray, or a way to create a new ndarray that points to the same data as the old one.
Solutions that do duplicate RAM:
There is a similar question where the challenge is to remove, not add, fields. The solution uses a view, which works for a subset of the original data, but I'm not sure whether it can be adapted when I want to add fields instead.
Recommended answer
A structured array is stored, like a regular one, as a contiguous buffer of bytes, one record following the previous. The records are thus a bit like the last dimension of a multidimensional array. You can't add a column to a 2d array without making a new array via concatenation.
Adding a field, say of dtype i4, to a dtype that is, say, 20 bytes long means changing the record (element) length to 24, i.e. inserting 4 bytes into the buffer after every 20 bytes. numpy can't do that without making a new data buffer and copying values into it from the old array (and from the new field).
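The record-length arithmetic above can be checked directly. This is only a minimal sketch: the field names x, y, id and flag are made up for illustration, chosen so that the packed record happens to be 20 bytes long.

```python
import numpy as np

# Hypothetical 20-byte packed record: two float64 fields and one int32 field.
old_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("id", "i4")])
a = np.zeros(5, dtype=old_dtype)

# Appending an i4 field grows every record from 20 to 24 bytes.
new_dtype = np.dtype(old_dtype.descr + [("flag", "i4")])

# The wider records need a fresh buffer; old values are copied field by field.
b = np.empty(a.shape, dtype=new_dtype)
for name in old_dtype.names:
    b[name] = a[name]
b["flag"] = 0

print(old_dtype.itemsize, new_dtype.itemsize)  # record sizes in bytes
print(np.shares_memory(a, b))                  # whether a and b overlap in memory
```

Because the two arrays have different record lengths, b cannot be a view of a; it lives in its own buffer, so peak memory briefly holds both.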
Actually, even if we were talking about adding a new record to the array, i.e. concatenating on a new array, it would still require creating a new data buffer. Arrays have a fixed size.
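The same holds for appending records, as this small sketch suggests. The dtype and values are illustrative assumptions, not taken from the question.

```python
import numpy as np

# Illustrative two-field dtype.
dt = np.dtype([("x", "f8"), ("id", "i4")])
a = np.zeros(3, dtype=dt)
extra = np.array([(1.0, 7)], dtype=dt)

# concatenate allocates a new buffer large enough for both inputs
# and copies the records from each of them into it.
longer = np.concatenate([a, extra])

print(longer.shape)
print(np.shares_memory(a, longer))  # whether the result reuses a's buffer
```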
Fields in a structured array are not like objects in a list or a dictionary. You can't add a field by just adding a pointer to an object elsewhere in memory.
Maybe you should be using a dictionary, with each item being an array. Then you can freely add a key/item without copying the existing ones. But then access by 'rows' will be slow.
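A rough sketch of that dictionary layout follows. The column names and sizes are hypothetical, and this is one possible way to arrange it rather than a drop-in replacement for a structured array.

```python
import numpy as np

n = 1_000

# One independent array per "field" (names are illustrative).
columns = {
    "x": np.arange(n, dtype="f8"),
    "id": np.arange(n, dtype="i4"),
}
x_before = columns["x"]

# Adding a "field" only allocates the new column; existing columns are untouched.
columns["flag"] = np.zeros(n, dtype="i4")

# Row access has to gather one element from every column, which is the slow part.
row = {name: col[10] for name, col in columns.items()}
print(row)
```

The trade-off is the one named above: adding a column costs only that column's memory, while reading a whole row means one lookup per column instead of a single contiguous record.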