numpy custom dtype challenge

Problem description

I have an array of custom dtype my_type which I successfully read from a binary file. The custom dtype has a header section, after which comes the data. The data part consists of np.int16 numbers, so the custom dtype looks like this:

header, imaginary, real, imaginary, real,  ..., imaginary, real 
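To make that layout concrete, here is one hypothetical way to describe such a record as a structured dtype. The 4-byte `V4` header, the field names `header`/`data` and the 4 pairs per record are assumptions for illustration; the question does not specify them.

```python
import numpy as np

# Hypothetical record layout: header size (4 bytes) and pair count (4) are assumptions.
N_PAIRS = 4
my_type = np.dtype([
    ("header", "V4"),                     # opaque header bytes, to be ignored
    ("data", np.int16, (2 * N_PAIRS,)),   # imaginary, real, imaginary, real, ...
])

print(my_type.itemsize)           # 20: 4 header bytes + 8 * 2 data bytes
print(my_type.fields["data"][1])  # 4: the data starts right after the header
```

A file with this layout could then be read with `np.fromfile(path, dtype=my_type)`.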

Now I am looking for a smart way to use Numpy's view to get an np.complex64 array of only the data, without copying/looping etc., considering the following facts:

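  • The header part should be ignored
  • The order must somehow be corrected (i.e. real first, then imaginary)
  • The resulting array should be complex64, not complex32!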

That is, from an array of custom dtype:

[my_type, my_type, ..., my_type] 

I would like to get a much larger array containing:

[complex64, complex64, ..., complex64]

Is it possible to do this in one go using Numpy's view?

Update:

So the solution is copying in memory. Many thanks to the answers below. But because the annoying header appears before every data frame, it seems that even with copying in memory, a loop over all data frames is still necessary. Schematically, I have:

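import numpy as np

a = np.arange(10, dtype=np.float16)   # stand-in for one raw data frame
skip_annoying_header = 2              # number of header elements to drop
r = np.zeros(a.size - skip_annoying_header, np.float16)
# swap each (imaginary, real) pair into (real, imaginary) order
r[0::2], r[1::2] = a[skip_annoying_header + 1::2], a[skip_annoying_header::2]
r = r.astype(np.float32)              # widen to float32 (this is a copy)
r = r.view(np.complex64)              # pairs of float32 -> complex64 (no copy)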

I do this in a for loop for every data frame, and at the end of each iteration I copy the contents of r into a big array.

Can this looping be somehow eliminated?
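One way to drop the per-frame loop, sketched under a hypothetical layout (a 4-byte header followed by 4 interleaved int16 `(imaginary, real)` pairs per frame; `my_type`, its field names and `frames_to_complex64` are made up for illustration): select the `data` field of the whole array at once, which is a strided view that already skips every header, then write the odd and even columns into a single preallocated complex64 array. This still makes one copy, which the answer below explains is unavoidable, but it happens in one vectorized step over all frames.

```python
import numpy as np

# Hypothetical frame layout: 4-byte header, then 4 (imaginary, real) int16 pairs.
my_type = np.dtype([("header", "V4"), ("data", np.int16, (8,))])

def frames_to_complex64(frames):
    """Convert every frame in one vectorized pass (no Python loop over frames)."""
    data = frames["data"]                  # single-field selection: a view that skips the headers
    n_pairs = data.shape[-1] // 2
    out = np.empty(data.shape[:-1] + (n_pairs,), dtype=np.complex64)
    out.real = data[..., 1::2]             # odd positions hold the real parts (int16 -> float32)
    out.imag = data[..., 0::2]             # even positions hold the imaginary parts
    return out.ravel()                     # frame after frame, as one flat complex64 array

# Synthetic input; with a real file this would be np.fromfile(path, dtype=my_type)
frames = np.zeros(2, dtype=my_type)
frames["data"] = np.arange(16, dtype=np.int16).reshape(2, 8)
result = frames_to_complex64(frames)
```

With this synthetic input, `result` holds `1+0j, 3+2j, 5+4j, 7+6j` for the first frame followed by `9+8j, 11+10j, 13+12j, 15+14j` for the second. Assigning to `out.real` and `out.imag` casts the int16 values to float32 on the way in, so no intermediate float16 or float32 buffer is needed.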

Recommended answer

All 3 requirements conflict with a view.

Ignoring the header field requires selecting the other fields. Selecting a single field is clearly a view, but the behavior of multi-field selection is in flux. When I try anything besides simply viewing the values, I get a warning:

In [497]: dt=np.dtype('U10,f,f,f,f')
In [498]: x=np.zeros((5,),dt)

In [505]: x[['f1','f3']].__array_interface__
/usr/bin/ipython3:1: FutureWarning: Numpy has detected that you (may be) writing to an array returned
by numpy.diagonal or by selecting multiple fields in a record
array. This code will likely break in a future numpy release --
see numpy.diagonal or arrays.indexing reference docs for details.
The quick fix is to make an explicit copy (e.g., do
arr.diagonal().copy() or arr[['f0','f1']].copy()).

Remember, the data is laid out element by element, with each element's dtype tuple values stored in a compact block (essentially a compact version of the display). Ignoring the header requires skipping that set of bytes. view can handle skips produced by strides, but not these dtype field skips.

In [533]: x
Out[533]: 
array([('header', 0.0, 5.0, 1.0, 10.0), ('header', 1.0, 4.0, 1.0, 10.0),
       ('header', 2.0, 3.0, 1.0, 10.0), ('header', 3.0, 2.0, 1.0, 10.0),
       ('header', 4.0, 1.0, 1.0, 10.0)], 
      dtype=[('f0', '<U10'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<f4')])
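A small check of the stride point, reusing the toy `dt` from the session above (the `as_strided` step is an illustrative addition, not part of the original answer): a single field is a view whose stride is the full record size, so the header bytes are simply stepped over. Because `f1`..`f4` are adjacent float32 fields in this packed dtype, strides alone can even expose them as a 2-D float32 grid without copying. This does not help with the other two requirements: swapping the pair order and widening to a larger float still need a copy.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

dt = np.dtype('U10,f,f,f,f')     # same toy dtype as in the session above
x = np.zeros((5,), dt)
x['f1'] = np.arange(5)           # fill one field so the view is visibly populated

f1 = x['f1']                     # one field: a view whose stride is the whole record
print(np.shares_memory(f1, x), f1.strides, dt.itemsize)   # True (56,) 56

# Strides alone can step over the 40-byte 'U10' header: read f1..f4 as a (5, 4) grid.
# Assumes f1..f4 are adjacent 4-byte fields, which holds for this packed dtype.
floats = as_strided(f1, shape=(len(x), 4), strides=(dt.itemsize, f1.itemsize))
print(np.shares_memory(floats, x))   # True: still no copy
```

`as_strided` does no bounds checking, so the shape and strides have to be derived carefully from the dtype's itemsize and field offsets.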

To explore reordering the complex fields, let's try a 2d array:

In [509]: y=np.arange(10.).reshape(5,2)  # 2 column float
In [510]: y.view(complex)    # can be viewed as complex
Out[510]: 
array([[ 0.+1.j],
       [ 2.+3.j],
       [ 4.+5.j],
       [ 6.+7.j],
       [ 8.+9.j]])
In [511]: y[:,::-1].view(complex)
...
ValueError: new type not compatible with array.

To switch the real/imaginary columns I have to make a copy: complex requires that the 2 floats be contiguous and in order.

In [512]: y[:,::-1].copy().view(complex)
Out[512]: 
array([[ 1.+0.j],
       [ 3.+2.j],
       [ 5.+4.j],
       [ 7.+6.j],
       [ 9.+8.j]])
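Since a copy is needed anyway, one option (an alternative to `copy().view(complex)`, not taken from the answer) is to write the swapped columns straight into a preallocated complex array. The sketch also shows the negative inner stride of `y[:, ::-1]`, which is what makes the direct `view` fail.

```python
import numpy as np

y = np.arange(10.).reshape(5, 2)   # same 2-column float array as above

rev = y[:, ::-1]
print(rev.strides)                 # (16, -8): the pair runs backwards in memory

# One copy: write imaginary/real columns directly into a fresh complex128 array
z = np.empty(len(y), dtype=complex)
z.real = y[:, 1]
z.imag = y[:, 0]
```

`z` holds the same values as `y[:,::-1].copy().view(complex)`, but as a flat 1-D array and without the intermediate float copy.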

float32 to float64 is clearly not a view change. One uses 4 bytes per number, the other 8. You can't 'view' 4 as 8 without copying.
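The byte counts behind this argument can be checked directly. The sketch below also contrasts `astype`, which converts values into new memory, with `view`, which only reinterprets the same bytes: viewing 4 float32 values as float64 yields 2 meaningless numbers rather than 4 widened ones.

```python
import numpy as np

# Bytes per element for the types involved in the question
sizes = {np.dtype(t).name: np.dtype(t).itemsize
         for t in (np.int16, np.float16, np.float32, np.float64, np.complex64)}
print(sizes)

a = np.arange(4, dtype=np.float32)   # 4 values, 16 bytes
b = a.astype(np.float64)             # real conversion: a new 32-byte buffer
c = a.view(np.float64)               # reinterpretation: same 16 bytes read as 2 float64s
print(np.shares_memory(a, b), np.shares_memory(a, c))   # False True
print(b.shape, c.shape)                                 # (4,) (2,)
```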
