Memory-friendly way to add a field to a structured ndarray without duplicating data?


Question

To add a field to a structured numpy array, it is quite simple to create a new array with a new dtype, copy over the old fields, and add the new field. However, I need to do this for an array that takes a lot of memory, and I would rather not duplicate all of it. Both my own implementation and the (slow) implementation in numpy.lib.recfunctions.append_fields duplicate memory.

Is there a way to add a field to a structured ndarray without duplicating memory? That means either a way that avoids creating a new ndarray, or a way to create a new ndarray that points to the same data as the old one.

Solutions that do duplicate RAM:

Adding a field to a structured numpy array (2)

Adding a field to a structured numpy array (3)

There is a similar question where the challenge is to remove, not add, fields. The solution uses a view, which should work for a subset of the original data, but I'm not sure whether it can be adapted when I want to add fields instead.

Recommended answer

A structured array is stored, like a regular one, as a contiguous buffer of bytes, one record following the previous. The records are thus a bit like the last dimension of a multidimensional array. You can't add a column to a 2d array without making a new array via concatenation.

Adding a field, say of i4 dtype, to a dtype that is, say, 20 bytes long means changing the record (element) length to 24, i.e. inserting 4 bytes into the buffer after every 20 bytes. numpy can't do that without making a new data buffer and copying values from the old one (and the new field).
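To make the 20-to-24-byte change concrete, here is a minimal sketch. The field names (x, y, id, flag) and the array length are made up for illustration; only the record sizes come from the text above.

```python
import numpy as np

# Hypothetical 20-byte record: two float64 fields plus one int32 field.
old_fields = [("x", "f8"), ("y", "f8"), ("id", "i4")]
a = np.zeros(5, dtype=old_fields)

# Appending an i4 field gives a 24-byte record, so a new buffer is needed.
b = np.empty(a.shape, dtype=old_fields + [("flag", "i4")])
for name in a.dtype.names:
    b[name] = a[name]   # copy each old field into the new buffer
b["flag"] = 1           # fill the new field

old_itemsize = a.dtype.itemsize   # bytes per record before
new_itemsize = b.dtype.itemsize   # bytes per record after
shares = np.shares_memory(a, b)   # the two arrays use separate buffers
```

The stride between records is baked into the buffer layout, which is why the old buffer cannot simply be reinterpreted with the wider dtype.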

Actually, even if we were talking about adding a new record to the array, i.e. concatenating on a new array, it would still require creating a new data buffer. Arrays are fixed in size.
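The same point can be checked for appended records. This is a small sketch with an invented two-field dtype; it only shows that np.concatenate returns an array with its own buffer.

```python
import numpy as np

# Hypothetical record layout, chosen only for illustration.
a = np.zeros(3, dtype=[("x", "f8"), ("id", "i4")])

# One extra record with the same dtype.
extra = np.zeros(1, dtype=a.dtype)
extra["x"] = 1.5
extra["id"] = 7

# Concatenation allocates a new buffer and copies both inputs into it.
c = np.concatenate([a, extra])
shares = np.shares_memory(a, c)
```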

Fields in a structured array are not like objects in a list or a dictionary. You can't add a field by just adding a pointer to an object elsewhere in memory.

Maybe you should be using a dictionary whose items are arrays. Then you can freely add a key/item without copying the existing ones. But then access by 'rows' will be slow.
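A rough sketch of the dictionary approach follows. The column names and sizes are assumptions for illustration, not part of the original question.

```python
import numpy as np

n = 4
# One independent array per "field"; names are hypothetical.
columns = {
    "x": np.arange(n, dtype="f8"),
    "id": np.arange(n, dtype="i4"),
}

x_before = columns["x"]

# Adding a field only allocates the new column; existing ones are untouched.
columns["flag"] = np.zeros(n, dtype="i4")
same_x = columns["x"] is x_before

# Row access has to gather one element from every column, which is the slow part.
row2 = {name: col[2] for name, col in columns.items()}
```

The trade-off is that a row is no longer a single contiguous record, so per-row access costs one lookup per column.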
