Add and access object-type field of a numpy structured array

Question

I am using numpy 1.16.2.

In brief, I am wondering how to add an object-type field to a structured array. The standard way via the recfunctions module raises an error, and I suppose there is a reason for this. I therefore wonder whether anything is wrong with my workaround. I would also like to understand why the workaround is necessary and whether I need to take extra care when accessing the newly created array.

Now the details. I have a numpy structured array:

import numpy as np
a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
for i in range(len(a)):
    a[i] = i

I want to add another field, "test", of type object to the array a. The standard way to do this is with numpy's recfunctions module:

import numpy.lib.recfunctions as rf
b = rf.append_fields(a, "test", [None]*len(a)) 

This code raises an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-4a7be4f94686> in <module>
----> 1 rf.append_fields(a, "test", [None]*len(a))

D:\_Programme\Anaconda3\lib\site-packages\numpy\lib\recfunctions.py in append_fields(base, names, data, dtypes, fill_value, usemask, asrecarray)
    718     if dtypes is None:
    719         data = [np.array(a, copy=False, subok=True) for a in data]
--> 720         data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
    721     else:
    722         if not isinstance(dtypes, (tuple, list)):

D:\_Programme\Anaconda3\lib\site-packages\numpy\lib\recfunctions.py in <listcomp>(.0)
    718     if dtypes is None:
    719         data = [np.array(a, copy=False, subok=True) for a in data]
--> 720         data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
    721     else:
    722         if not isinstance(dtypes, (tuple, list)):

D:\_Programme\Anaconda3\lib\site-packages\numpy\core\_internal.py in _view_is_safe(oldtype, newtype)
    492 
    493     if newtype.hasobject or oldtype.hasobject:
--> 494         raise TypeError("Cannot change data-type for object array.")
    495     return
    496 

TypeError: Cannot change data-type for object array.

A similar error has been discussed here, though the issue is old and I do not know whether the behaviour I am observing is actually a bug. Here I am informed that views of structured arrays containing general objects are not supported.

I therefore built a workaround:

b = np.empty(len(a), dtype=a.dtype.descr+[("test", object)])
b[list(a.dtype.names)] = a

This works. Nonetheless, I have the following questions:

Questions

  • Why is this workaround necessary? Is this just a bug?
  • Working with the new array b seems to be no different from working with a. The variable c = b[["A", "test"]] is clearly a view of the data of b. So why would they say that views on the array b are not supported? Do I have to treat c with extra caution?

Recommended answer

In [161]: a = np.zeros(3, dtype={'names':['A','B','C'], 'formats':['int','int','float']})
     ...: for i in range(len(a)):
     ...:     a[i] = i
     ...:
In [162]: a                                                                     
Out[162]: 
array([(0, 0, 0.), (1, 1, 1.), (2, 2, 2.)],
      dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8')])

Define a new dtype:

In [164]: a.dtype.descr                                                         
Out[164]: [('A', '<i8'), ('B', '<i8'), ('C', '<f8')]
In [165]: a.dtype.descr+[('test','O')]                                          
Out[165]: [('A', '<i8'), ('B', '<i8'), ('C', '<f8'), ('test', 'O')]
In [166]: dt= a.dtype.descr+[('test','O')]                                      

Make a new array of the right size and dtype:

In [167]: b = np.empty(a.shape, dt)                                             

Copy values from a to b by field name:

In [168]: for name in a.dtype.names: 
     ...:     b[name] = a[name] 
     ...:                                                                       
In [169]: b                                                                     
Out[169]: 
array([(0, 0, 0., None), (1, 1, 1., None), (2, 2, 2., None)],
      dtype=[('A', '<i8'), ('B', '<i8'), ('C', '<f8'), ('test', 'O')])
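
The three steps above can be wrapped in a small helper. This is only a sketch: the name add_object_field is hypothetical (it is not a numpy function), and it builds the new dtype from a.dtype.names instead of a.dtype.descr, an assumption made here because descr can contain unnamed padding entries for dtypes with explicit offsets.

```python
import numpy as np

def add_object_field(a, name):
    """Return a copy of structured array `a` with an extra object field.

    Hypothetical helper, not part of numpy: new dtype, new array,
    then a field-by-field copy.
    """
    # Existing fields in their original order, plus the new object field.
    new_dtype = [(n, a.dtype[n]) for n in a.dtype.names] + [(name, object)]
    b = np.empty(a.shape, dtype=new_dtype)
    for n in a.dtype.names:
        b[n] = a[n]
    return b

a = np.zeros(3, dtype={'names': ['A', 'B', 'C'], 'formats': ['int', 'int', 'float']})
for n in a.dtype.names:
    a[n] = np.arange(len(a))

b = add_object_field(a, 'test')
# The object field can hold any Python object.
b['test'][0] = {'any': 'python object'}
```

The original array a is left untouched; b is an independent copy with one more field.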

Many of the rf functions do this field-by-field copy:

rf.recursive_fill_fields(a,b)
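
A self-contained version of that call might look like the following. It is a sketch under the assumption that the output array is pre-allocated with the extra object field; fields of the output that do not exist in the input (here 'test') are expected to be left alone.

```python
import numpy as np
import numpy.lib.recfunctions as rf

a = np.zeros(3, dtype=[('A', int), ('B', int), ('C', float)])
for name in a.dtype.names:
    a[name] = np.arange(len(a))

# Output array: the fields of `a` plus one object field.
dt = [(n, a.dtype[n]) for n in a.dtype.names] + [('test', object)]
b = np.empty(a.shape, dtype=dt)

# Fills the fields of `b` that also exist in `a`, in place, and returns `b`.
filled = rf.recursive_fill_fields(a, b)
```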

rf.append_fields uses this after it initializes its output array.

In earlier versions a multifield index produced a copy, so expressions like b[list(a.dtype.names)] = a would not work.
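
For comparison, here is the one-line multifield assignment from the question as a runnable sketch. It assumes numpy 1.16 or later, where a multifield index returns a view, so the assignment writes into b itself.

```python
import numpy as np

a = np.zeros(3, dtype=[('A', int), ('B', int), ('C', float)])
for name in a.dtype.names:
    a[name] = np.arange(len(a))

b = np.empty(a.shape, dtype=a.dtype.descr + [('test', object)])

# Multifield index on the left-hand side: values land in b's own buffer.
b[list(a.dtype.names)] = a
```

Assignment between structured arrays is positional rather than by field name, so the order of the names in the index list matters.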

I don't know if it's worth trying to figure out what rf.append_fields is doing. Those functions are somewhat old and not heavily used (note the special import), so it's entirely likely that they have bugs or edge cases that don't work. The functions that I've examined work much as I demonstrated: make a new dtype and result array, then copy data by field name.

In recent releases there have been changes in how multiple fields are accessed. There are some new functions in recfunctions to facilitate working with structured arrays, such as repack_fields.

https://docs.scipy.org/doc/numpy/user/basics.rec.html#accessing-multiple-fields

I don't know if any of that applies to the append_fields problem. I see there's also a section about structured arrays with objects, but I haven't studied that:

https://docs.scipy.org/doc/numpy/user/basics.rec.html#viewing-structured-arrays-containing-objects

In order to prevent clobbering object pointers in fields of numpy.object type, numpy currently does not allow views of structured arrays containing objects.

This line apparently refers to the use of the view method. Views created by field indexing, whether with a single name or a multifield list, are not affected.
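
The distinction can be sketched as follows, assuming numpy 1.16 or later (where multifield indexing returns a view). The exact wording of the error message may differ between numpy versions, so only the exception type is captured.

```python
import numpy as np

b = np.empty(3, dtype=[('A', int), ('B', int), ('C', float), ('test', object)])
b['A'] = [0, 1, 2]

# Field indexing: c shares memory with b, object field included.
c = b[['A', 'test']]
c['A'][0] = 99
c['test'][1] = 'hello'

# The .view method with a different dtype is what numpy refuses.
try:
    b.view([('AA', int), ('BB', int), ('cc', float), ('d', object)])
    view_error = None
except TypeError as exc:
    view_error = exc
```

After this runs, the writes made through c are visible in b, while view_error holds the TypeError raised by the view call.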

The error in append_fields comes from this operation:

In [183]: data = np.array([None,None,None])                                          
In [184]: data                                                                       
Out[184]: array([None, None, None], dtype=object)
In [185]: data.view([('test',object)])                                               
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-185-c46c4464b53c> in <module>
----> 1 data.view([('test',object)])

/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
    492 
    493     if newtype.hasobject or oldtype.hasobject:
--> 494         raise TypeError("Cannot change data-type for object array.")
    495     return
    496 

TypeError: Cannot change data-type for object array.

There's no problem creating a compound dtype with object dtypes:

In [186]: np.array([None,None,None], dtype=[('test',object)])                        
Out[186]: array([(None,), (None,), (None,)], dtype=[('test', 'O')])

But I don't see any recfunctions that are capable of joining a and data.

view can be used to change the field names of a:

In [219]: a.view([('AA',int),('BB',int),('cc',float)])                               
Out[219]: 
array([(0, 0, 0.), (1, 1, 1.), (2, 2, 2.)],
      dtype=[('AA', '<i8'), ('BB', '<i8'), ('cc', '<f8')])

but trying to do so for b fails for the same reason:

In [220]: b.view([('AA',int),('BB',int),('cc',float),('d',object)])                  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-220-ab0a6e4dd57f> in <module>
----> 1 b.view([('AA',int),('BB',int),('cc',float),('d',object)])

/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
    492 
    493     if newtype.hasobject or oldtype.hasobject:
--> 494         raise TypeError("Cannot change data-type for object array.")
    495     return
    496 

TypeError: Cannot change data-type for object array.


If I start with an object dtype array and try to view it as i8 (a dtype of the same size), I get the same error. So the restriction on view of an object dtype isn't limited to structured arrays. The need for such a restriction when viewing object pointers as i8 makes sense. The need for it when embedding the object pointer in a compound dtype might not be so compelling. It might even be overkill, or just a case of playing it safe and simple.

In [267]: x.dtype                                                                    
Out[267]: dtype('O')
In [268]: x.shape                                                                    
Out[268]: (3,)
In [269]: x.dtype.itemsize                                                           
Out[269]: 8
In [270]: x.view('i8')                                                               
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-270-30c78b13cd10> in <module>
----> 1 x.view('i8')

/usr/local/lib/python3.6/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
    492 
    493     if newtype.hasobject or oldtype.hasobject:
--> 494         raise TypeError("Cannot change data-type for object array.")
    495     return
    496 

TypeError: Cannot change data-type for object array.

Note that the test in line 493 checks the hasobject property of both the new and old dtypes. A more nuanced test might check whether both dtypes have hasobject set, but I suspect the logic could get quite complex. Sometimes a simple prohibition is safer (and easier) than a complex set of tests.
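
The hasobject attribute can be inspected directly on any dtype. The helper below is hypothetical and mirrors only the condition quoted from line 493 in the traceback, not the whole of numpy's internal check.

```python
import numpy as np

plain = np.dtype([('A', int), ('B', int), ('C', float)])
with_obj = np.dtype([('A', int), ('B', int), ('C', float), ('test', object)])

def quoted_condition(oldtype, newtype):
    # Hypothetical helper: the `or` test shown at line 493 of the traceback.
    return bool(newtype.hasobject or oldtype.hasobject)

plain_to_plain = quoted_condition(plain, plain)      # no object field anywhere
obj_to_plain = quoted_condition(with_obj, plain)     # old dtype holds objects
plain_to_obj = quoted_condition(plain, with_obj)     # new dtype holds objects
```

Because the condition is an `or`, a single object field on either side is enough to trigger the prohibition.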

Further testing:

In [283]: rf.structured_to_unstructured(a)                                           
Out[283]: 
array([[ 3.,  3.,  0.],
       [12., 10.,  1.],
       [ 2.,  2.,  2.]])

but trying to do the same on b, or even a subset of its fields, produces the familiar error:

rf.structured_to_unstructured(b)
rf.structured_to_unstructured(b[['A','B','C']]) 

I have to first use repack_fields to make an object-less copy:

rf.structured_to_unstructured(rf.repack_fields(b[['A','B','C']])) 
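
Spelled out step by step, the repack route might look like this. It is a sketch with a freshly built b so that it runs on its own; newer numpy releases may not need the repack step at all.

```python
import numpy as np
import numpy.lib.recfunctions as rf

b = np.empty(3, dtype=[('A', int), ('B', int), ('C', float), ('test', object)])
for name in ('A', 'B', 'C'):
    b[name] = np.arange(len(b))

# Select the numeric fields only; the object field is left out.
subset = b[['A', 'B', 'C']]
# Packed copy of those fields, with no leftover space for the object field.
packed = rf.repack_fields(subset)
# Plain 2-D array with one column per field.
plain = rf.structured_to_unstructured(packed)
```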
