ndarray转换为Structural_array并浮动为int [英] ndarray to structured_array and float to int

查看:129
本文介绍了ndarray转换为Structural_array并浮动为int的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到的问题是,使用ndarray.view(np.dtype)从经典ndarray获取结构化数组似乎会错误地将float转换为int.

The problem I encounter is that, by using ndarray.view(np.dtype) to get a structured array from a classic ndarray seems to miscompute the float to int conversion.

例子讲得更好:

In [12]: B
Out[12]: 
array([[  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
      0.00000000e+00,   4.43600000e+01,   0.00000000e+00],
   [  1.00000000e+00,   2.00000000e+00,   7.10000000e+00,
      1.10000000e+00,   4.43600000e+01,   1.32110000e+02],
   [  1.00000000e+00,   3.00000000e+00,   9.70000000e+00,
      2.10000000e+00,   4.43600000e+01,   2.04660000e+02],
   ..., 
   [  1.28900000e+03,   1.28700000e+03,   0.00000000e+00,
      9.99999000e+05,   4.75600000e+01,   3.55374000e+03],
   [  1.28900000e+03,   1.28800000e+03,   1.29000000e+01,
      5.40000000e+00,   4.19200000e+01,   2.08400000e+02],
   [  1.28900000e+03,   1.28900000e+03,   0.00000000e+00,
      0.00000000e+00,   4.19200000e+01,   0.00000000e+00]])

In [14]: B.view(A.dtype)
Out[14]: 
array([(4607182418800017408, 4607182418800017408, 0.0, 0.0, 44.36, 0.0),
   (4607182418800017408, 4611686018427387904, 7.1, 1.1, 44.36, 132.11),
   (4607182418800017408, 4613937818241073152, 9.7, 2.1, 44.36, 204.66),
   ...,
   (4653383897399164928, 4653375101306142720, 0.0, 999999.0, 47.56, 3553.74),
   (4653383897399164928, 4653379499352653824, 12.9, 5.4, 41.92, 208.4),
   (4653383897399164928, 4653383897399164928, 0.0, 0.0, 41.92, 0.0)], 
  dtype=[('i', '<i8'), ('j', '<i8'), ('tnvtc', '<f8'), ('tvtc', '<f8'), ('tf', '<f8'), ('tvps', '<f8')])

'i'和'j'列是真实的整数:

The 'i' and 'j' columns are true integers:

在这里,我还有两项检查,问题似乎来自ndarray.view(np.int)

Here you have two further check I have done, the problem seems to come from the ndarray.view(np.int)

In [21]: B[:,:2]
Out[21]: 
array([[  1.00000000e+00,   1.00000000e+00],
   [  1.00000000e+00,   2.00000000e+00],
   [  1.00000000e+00,   3.00000000e+00],
   ..., 
   [  1.28900000e+03,   1.28700000e+03],
   [  1.28900000e+03,   1.28800000e+03],
   [  1.28900000e+03,   1.28900000e+03]])

In [22]: B[:,:2].view(np.int)
Out[22]: 
array([[4607182418800017408, 4607182418800017408],
   [4607182418800017408, 4611686018427387904],
   [4607182418800017408, 4613937818241073152],
   ..., 
   [4653383897399164928, 4653375101306142720],
   [4653383897399164928, 4653379499352653824],
   [4653383897399164928, 4653383897399164928]])

In [23]: B[:,:2].astype(np.int)
Out[23]: 
array([[   1,    1],
   [   1,    2],
   [   1,    3],
   ..., 
   [1289, 1287],
   [1289, 1288],
   [1289, 1289]])

我做错了什么?由于numpy分配内存,我不能更改类型吗?还有另一种方法可以做到这一点(fromarrays指责shape mismatch吗?

What am I doing wrong? Can't I change the type due to numpy allocation memory? Is there another way to do this (fromarrays, was accusing a shape mismatch ?

推荐答案

这是执行somearray.view(new_dtype)和调用astype之间的区别.

This is the difference between doing somearray.view(new_dtype) and calling astype.

您所看到的正是预期的行为,这是非常故意的,但这是您第一次碰到它时的起义.

What you're seeing is exactly the expected behavior, and it's very deliberate, but it's uprising the first time you come across it.

具有不同dtype的视图会将数组的底层内存缓冲区解释为给定的dtype.没有副本.它非常强大,但是您必须了解自己在做什么.

A view with a different dtype interprets the underlying memory buffer of the array as the given dtype. No copies are made. It's very powerful, but you have to understand what you're doing.

要记住的关键一件事是,调用view永远不会更改底层的内存缓冲区,就像numpy对其进行查看的方式一样(例如dtype,shape,stride).因此,view 故意避免将数据更改为新类型,而只是将旧位"解释为新dtype.

A key thing to remember is that calling view never alters the underlying memory buffer, just the way that it's viewed by numpy (e.g. dtype, shape, strides). Therefore, view deliberately avoids altering the data to the new type and instead just interprets the "old bits" as the new dtype.

例如:

In [1]: import numpy as np

In [2]: x = np.arange(10)

In [3]: x
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]: x.dtype
Out[4]: dtype('int64')

In [5]: x.view(np.int32)
Out[5]: array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0],
              dtype=int32)

In [6]: x.view(np.float64)
Out[6]:
array([  0.00000000e+000,   4.94065646e-324,   9.88131292e-324,
         1.48219694e-323,   1.97626258e-323,   2.47032823e-323,
         2.96439388e-323,   3.45845952e-323,   3.95252517e-323,
         4.44659081e-323])

如果要使用新的dtype复制数组,请使用astype代替:

If you want to make a copy of the array with a new dtype, use astype instead:

In [7]: x
Out[7]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]: x.astype(np.int32)
Out[8]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [9]: x.astype(float)
Out[9]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])


但是,将astype与结构化数组一起使用可能会使您感到惊讶.结构化数组将输入的每个元素视为类似C的结构.因此,如果您呼叫astype,您将遇到许多惊喜.


However, using astype with structured arrays will probably surprise you. Structured arrays treat each element of the input as a C-like struct. Therefore, if you call astype, you'll run into several suprises.

基本上,您希望这些列具有不同的dtype.在这种情况下,请勿将它们放在同一阵列中.块状阵列有望是同质的.在某些情况下,结构化数组很方便,但是如果您正在寻找可以处理单独的数据列的东西,那么结构化数组可能就不是您想要的.只需将每列用作其自己的数组即可.

Basically, you want the columns to have a different dtype. In that case, don't put them in the same array. Numpy arrays are expected to be homogenous. Structured arrays are handy in certain cases, but they're probably not what you want if you're looking for something to handle separate columns of data. Just use each column as its own array.

更好的是,如果您要处理表格数据,则可能会发现它比直接使用numpy数组更容易使用pandas. pandas面向表格数据(其中的列应具有不同的类型),而numpy面向同质数组.

Better yet, if you're working with tabular data, you'll probably find its easier to use pandas than to use numpy arrays directly. pandas is oriented towards tabular data (where columns are expected to have different types), while numpy is oriented towards homogenous arrays.

这篇关于ndarray转换为Structural_array并浮动为int的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆