Add and access object-type field of a numpy structured array
Question
I am using numpy 1.16.2.
In brief, I am wondering how to add an object-type field to a structured array. The standard way via the recfunctions module throws an error, and I suppose there is a reason for this. Therefore, I wonder whether there is anything wrong with my workaround. Furthermore, I would like to understand why this workaround is necessary and whether I need to use extra caution when accessing the newly created array.
Now for the details. I have a numpy structured array:
import numpy as np
a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
for i in range(len(a)):
    a[i] = i
I want to add another field "test" of type object to the array a. The standard way of doing this is to use numpy's recfunctions module:
import numpy.lib.recfunctions as rf
b = rf.append_fields(a, "test", [None]*len(a))
This code throws an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-4a7be4f94686> in <module>
----> 1 rf.append_fields(a, "test", [None]*len(a))
D:\_Programme\Anaconda3\lib\site-packages\numpy\lib\recfunctions.py in append_fields(base, names, data, dtypes, fill_value, usemask, asrecarray)
718 if dtypes is None:
719 data = [np.array(a, copy=False, subok=True) for a in data]
--> 720 data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
721 else:
722 if not isinstance(dtypes, (tuple, list)):
D:\_Programme\Anaconda3\lib\site-packages\numpy\lib\recfunctions.py in <listcomp>(.0)
718 if dtypes is None:
719 data = [np.array(a, copy=False, subok=True) for a in data]
--> 720 data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
721 else:
722 if not isinstance(dtypes, (tuple, list)):
D:\_Programme\Anaconda3\lib\site-packages\numpy\core\_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
A similar error has been discussed here, though the issue is old and I do not know whether the behaviour I am observing is actually a bug. Here I am informed that views of structured arrays containing general objects are not supported.
I therefore built a workaround:
b = np.empty(len(a), dtype=a.dtype.descr+[("test", object)])
b[list(a.dtype.names)] = a
This works. Nonetheless, I have the following questions:
Questions
- Why is this workaround necessary? Is this just a bug?
- Working with the new array b seems to be no different from working with a. The variable c = b[["A", "test"]] is clearly a view of the data of b. So why would they say that views on the array b are not supported? Do I have to treat c with extra caution?
Recommended answer
In [161]: a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
     ...: for i in range(len(a)):
     ...:     a[i] = i
     ...:
In [162]: a
Out[162]:
array([(0, 0, 0.), (1, 1, 1.), (2, 2, 2.)],
dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])
Define the new dtype:
In [164]: a.dtype.descr
Out[164]: [('A', '<i8'), ('B', '<i8'), ('C', '<f8')]
In [165]: a.dtype.descr+[('test','O')]
Out[165]: [('A', '<i8'), ('B', '<i8'), ('C', '<f8'), ('test', 'O')]
In [166]: dt= a.dtype.descr+[('test','O')]
Create a new array of the right size and dtype:
In [167]: b = np.empty(a.shape, dt)
Copy values from a to b by field name:
In [168]: for name in a.dtype.names:
     ...:     b[name] = a[name]
     ...:
In [169]: b
Out[169]:
array([(0, 0, 0., None), (1, 1, 1., None), (2, 2, 2., None)],
dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8'), ('test', 'O')])
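The three steps above (new dtype, new array, copy by field name) can be wrapped in a small helper. This is only a sketch: the function name add_object_field and its values parameter are made up for illustration and are not part of numpy, and it assumes a 1-D array with a simple, unpadded dtype so that dtype.descr round-trips cleanly.

```python
import numpy as np

def add_object_field(arr, name, values=None):
    """Return a copy of the 1-D structured array `arr` with an extra object field.

    Hypothetical helper, not a numpy API. Assumes `arr.dtype` has no padding,
    so `arr.dtype.descr` describes it exactly.
    """
    new_dtype = np.dtype(arr.dtype.descr + [(name, object)])
    out = np.empty(arr.shape, dtype=new_dtype)
    # Copy the existing data field by field, as in the steps above.
    for field in arr.dtype.names:
        out[field] = arr[field]
    col = out[name]      # single-field indexing returns a view into `out`
    col[:] = None        # start from a well-defined value
    if values is not None:
        # Element-wise assignment stores each Python object as-is,
        # without numpy trying to turn lists into sub-arrays.
        for i, value in enumerate(values):
            col[i] = value
    return out

a = np.zeros(3, dtype=[('A', int), ('B', int), ('C', float)])
a['A'] = [0, 1, 2]
b = add_object_field(a, 'test', [{'k': 1}, [1, 2], None])
```

Because the copy goes through single-field indexing only, it does not depend on whether multi-field indexing returns a view or a copy in the installed numpy version.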
Many of the rf functions do this field-by-field copy:
rf.recursive_fill_fields(a,b)
rf.append_fields uses this after it initializes its output array.
In earlier versions a multifield index produced a copy, so expressions like b[list(a.dtype.names)] = a would not work.
I don't know if it's worth trying to figure out what rf.append_fields is doing. Those functions are somewhat old and not heavily used (note the special import), so it's entirely likely that they have bugs or edge cases that don't work. The functions that I've examined work much as I demonstrated: make a new dtype and result array, and copy data by field name.
In recent releases there have been changes in how multiple fields are accessed. There are some new functions in recfunctions to facilitate working with structured arrays, such as repack_fields.
https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields
I don't know if any of that applies to the append_fields problem. I see there's also a section about structured arrays with objects, but I haven't studied that:
https://docs.scipy.org/doc/numpy/user/basics.rec.html#viewing-structured-arrays-containing-objects
In order to prevent clobbering object pointers in fields of numpy.object type, numpy currently does not allow views of structured arrays containing objects.
This line apparently refers to the use of the view method. Views created by field indexing, whether with a single name or a multifield list, are not affected.
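A short check of that distinction, as a sketch rather than a definitive test. It assumes numpy 1.16 or later, where a multifield index returns a view rather than a copy; the array contents are made up for illustration.

```python
import numpy as np

b = np.zeros(3, dtype=[('A', 'i8'), ('B', 'i8'), ('C', 'f8'), ('test', object)])

# Multifield indexing: assumed to return a view (numpy >= 1.16).
c = b[['A', 'test']]
c['A'][0] = 42
c['test'][1] = {'k': 'v'}

# If c is a view, both writes are visible through b.
wrote_through = bool(b['A'][0] == 42) and b['test'][1] == {'k': 'v'}

# The .view() method, by contrast, is what the documentation restricts.
try:
    b.view([('AA', 'i8'), ('BB', 'i8'), ('cc', 'f8'), ('d', object)])
    view_error = None
except TypeError as exc:
    view_error = str(exc)
```

With the versions discussed here, wrote_through should be True and view_error should hold the "Cannot change data-type for object array." message, which suggests that c needs no special handling beyond the usual care that writes to a view modify the parent array.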
The error in append_fields comes from this operation:
In [183]: data = np.array([None,None,None])
In [184]: data
Out[184]: array([None, None, None], dtype=object)
In [185]: data.view([('test',object)])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-185-c46c4464b53c> in <module>
----> 1 data.view([('test',object)])
/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
There's no problem creating a compound dtype with object dtypes:
In [186]: np.array([None,None,None], dtype=[('test',object)])
Out[186]: array([(None,), (None,), (None,)], dtype=[('test', 'O')])
But I don't see any recfunctions that are capable of joining a and data.
view can be used to change the field names of a:
In [219]: a.view([('AA',int),('BB',int),('cc',float)])
Out[219]:
array([(0, 0, 0.), (1, 1, 1.), (2, 2, 2.)],
dtype=[('AA', '<i8'), ('BB', '<i8'), ('cc', '<f8')])
but trying to do so for b fails for the same reason:
In [220]: b.view([('AA',int),('BB',int),('cc',float),('d',object)])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-220-ab0a6e4dd57f> in <module>
----> 1 b.view([('AA',int),('BB',int),('cc',float),('d',object)])
/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
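If renamed fields are needed on an array like b, one option that avoids view altogether is the same copy-by-field approach used earlier. The following is a minimal sketch under that assumption; the new field names are taken from the failing call above, and the result is a copy, not a view.

```python
import numpy as np

b = np.zeros(3, dtype=[('A', 'i8'), ('B', 'i8'), ('C', 'f8'), ('test', object)])
b['A'] = [0, 1, 2]

new_names = ['AA', 'BB', 'cc', 'd']

# Build a dtype with the new names and the original per-field dtypes.
new_dtype = np.dtype([(new, b.dtype[old])
                      for old, new in zip(b.dtype.names, new_names)])

renamed = np.empty(b.shape, dtype=new_dtype)
for old, new in zip(b.dtype.names, new_names):
    renamed[new] = b[old]   # object fields copy references, not the objects
```

Note that the object field is copied shallowly: renamed['d'] and b['test'] refer to the same Python objects, even though the two arrays own separate buffers.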
If I start with an object dtype array and try to view it as i8 (a dtype of the same size), I get the same error. So the restriction on view of an object dtype isn't limited to structured arrays. The need for such a restriction when reinterpreting an object pointer as i8 makes sense. The need for it when embedding the object pointer in a compound dtype might not be so compelling. It might even be overkill, or just a case of playing it safe and simple.
In [267]: x.dtype
Out[267]: dtype('O')
In [268]: x.shape
Out[268]: (3,)
In [269]: x.dtype.itemsize
Out[269]: 8
In [270]: x.view('i8')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-270-30c78b13cd10> in <module>
----> 1 x.view('i8')
/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
492
493 if newtype.hasobject or oldtype.hasobject:
--> 494 raise TypeError("Cannot change data-type for object array.")
495 return
496
TypeError: Cannot change data-type for object array.
Note that the test in line 493 checks the hasobject property of both the new and old dtypes. A more nuanced test might check whether both have hasobject set, but I suspect the logic could get quite complex. Sometimes a simple prohibition is safer (and easier) than a complex set of tests.
Further testing shows that rf.structured_to_unstructured works on a:
In [283]: rf.structured_to_unstructured(a)
Out[283]:
array([[ 3., 3., 0.],
[12., 10., 1.],
[ 2., 2., 2.]])
but trying to do the same on b, or even a subset of its fields, produces the familiar error:
rf.structured_to_unstructured(b)
rf.structured_to_unstructured(b[['A','B','C']])
I have to first use repack_fields to make an object-less copy:
rf.structured_to_unstructured(rf.repack_fields(b[['A','B','C']]))
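Spelled out as a self-contained sketch, assuming numpy 1.16 or later (where repack_fields and structured_to_unstructured are available) and a made-up b with the same layout as above:

```python
import numpy as np
import numpy.lib.recfunctions as rf

b = np.zeros(3, dtype=[('A', 'i8'), ('B', 'i8'), ('C', 'f8'), ('test', object)])
for name in ('A', 'B', 'C'):
    b[name] = np.arange(3)

# Select the numeric fields, then repack so the result is a packed copy
# whose dtype no longer carries the object field's slot.
numeric = rf.repack_fields(b[['A', 'B', 'C']])

# With no object field left in the dtype, the conversion goes through.
plain = rf.structured_to_unstructured(numeric)
```

Here numeric is an independent copy, so changes to plain or numeric do not propagate back to b.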