Methods of creating a structured array


Question

I have the following information and I can produce a numpy array of the desired structure. Note that the x and y values have to be determined separately, since their ranges may differ, so I cannot use:

xy = np.random.random_integers(0,10,size=(N,2))

The extra list(...) conversion around zip is needed for this to work in Python 3.4; it is unnecessary but harmless in Python 2.7.

The following works:

>>> # attempts to formulate [id,(x,y)] with specified dtype 
>>> N = 10
>>> x = np.random.random_integers(0,10,size=N)
>>> y = np.random.random_integers(0,10,size=N)
>>> id = np.arange(N)
>>> dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
>>> arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
>>> arr
array([(0, [7.0, 7.0]), (1, [7.0, 7.0]), (2, [5.0, 5.0]), (3, [0.0, 0.0]),
       (4, [6.0, 6.0]), (5, [6.0, 6.0]), (6, [7.0, 7.0]),
       (7, [10.0, 10.0]), (8, [3.0, 3.0]), (9, [7.0, 7.0])],
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])

I cleverly thought I could circumvent the above nasty bits by simply creating the array in the desired vertical structure and applying my dtype to it, hoping that it would work. The stacked array is correct in the vertical form:

>>> a = np.vstack((id,x,y)).T
>>> a
array([[ 0,  7,  6],
       [ 1,  7,  7],
       [ 2,  5,  9],
       [ 3,  0,  1],    
       [ 4,  6,  1],
       [ 5,  6,  6],
       [ 6,  7,  6],
       [ 7, 10,  9],
       [ 8,  3,  2],
       [ 9,  7,  8]])

I tried several ways of reformulating the above array so that my dtype would work, and I just can't figure it out (this included vstacking a vstack, etc.). So my question is: how can I take the vstack version and get it into a format that meets my dtype requirements without having to go through the procedure that I did? I am hoping it is obvious, but I have sliced, stacked and ellipsed myself into an endless loop.

Summary

Many thanks to hpaulj. I have included two incarnations based upon his suggestions for others to consider. The pure numpy solution is substantially faster and a lot cleaner.

"""
Script:  pnts_StackExch
Author:  Dan.Patterson@carleton.ca
Modified: 2015-08-24
Purpose: 
    To provide some timing options on point creation in preparation for
    point-to-point distance calculations using einsum.
Reference:
    http://stackoverflow.com/questions/32224220/
    methods-of-creating-a-structured-array
Functions:
    decorators:  delta_time, arg_deco
    main:  pnts_IdShape, alternate
"""
import numpy as np
import random
import time
from functools import wraps

np.set_printoptions(edgeitems=5,linewidth=75,precision=2,suppress=True,threshold=5)

# .... wrapper funcs .............
def delta_time(func):
    """timing decorator function"""
    import time
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("\nTiming function for... {}".format(func.__name__))
        t0 = time.time()                # start time
        result = func(*args, **kwargs)  # ... run the function ...
        t1 = time.time()                # end time
        print("Results for... {}".format(func.__name__))
        print("  time taken ...{:12.9f} sec.".format(t1-t0))
        #print("\n  print results inside wrapper or use <return> ... ")
        return result                   # return the result of the function
    return wrapper

def arg_deco(func):
    """This wrapper just prints some basic function information."""
    @wraps(func)
    def wrapper(*args,**kwargs):
        print("Function... {}".format(func.__name__))
        #print("File....... {}".format(func.__code__.co_filename))
        print("  args.... {}\n  kwargs. {}".format(args,kwargs))
        #print("  docs.... {}\n".format(func.__doc__))
        return func(*args, **kwargs)
    return wrapper

# .... main funcs ................
@delta_time
@arg_deco
def pnts_IdShape(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
    """Make N points from uniformly distributed random integers,
       with optional min/max values for Xs and Ys
    """
    dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))]) 
    IDs = np.arange(0,N)
    Xs = np.random.random_integers(x_min,x_max,size=N) # note below
    Ys = np.random.random_integers(y_min,y_max,size=N)
    a = np.array([(i,j) for i,j in zip(IDs,np.column_stack((Xs,Ys)))],dt)
    return IDs,Xs,Ys,a

@delta_time
@arg_deco
def alternate(N=1000000,x_min=0,x_max=10,y_min=0,y_max=10):
    """ after hpaulj and his mods to the above and this.  See docs
    """
    dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
    IDs = np.arange(0,N)
    Xs = np.random.random_integers(x_min,x_max,size=N)  # honour the min/max arguments
    Ys = np.random.random_integers(y_min,y_max,size=N)
    c_stack = np.column_stack((IDs,Xs,Ys))
    a = np.ones(N, dtype=dt)
    a['ID'] = c_stack[:,0]
    a['Shape'] = c_stack[:,1:]
    return IDs,Xs,Ys,a

if __name__=="__main__":
    """time testing for various methods
    """
    id_1,xs_1,ys_1,a_1 = pnts_IdShape(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
    id_2,xs_2,ys_2,a_2 = alternate(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10) 

Timing results for 1,000,000 points are as follows:

Timing function for... pnts_IdShape
Function... pnts_IdShape
  args.... ()
  kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... pnts_IdShape
  time taken ... 0.680652857 sec.

Timing function for... alternate
Function... alternate
  args.... ()
  kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... alternate
  time taken ... 0.060056925 sec.
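The comparison above can also be reproduced with the standard timeit module, which repeats each run and avoids a hand-written timing decorator. This is only a sketch: the function names by_rows and by_fields, the seed, and the smaller N are illustrative assumptions, and rng.integers(..., endpoint=True) stands in for the older random_integers call used in the script.

```python
import timeit
import numpy as np

N = 10_000  # deliberately smaller than the 1,000,000 used in the post
rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
ids = np.arange(N)
xs = rng.integers(0, 10, size=N, endpoint=True)  # inclusive upper bound
ys = rng.integers(0, 10, size=N, endpoint=True)
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

def by_rows():
    """Build the array from a Python list of (id, [x, y]) tuples."""
    return np.array([(i, j) for i, j in zip(ids, np.column_stack((xs, ys)))], dt)

def by_fields():
    """Allocate once, then fill each field with a whole-array assignment."""
    out = np.ones(N, dtype=dt)
    out['ID'] = ids
    out['Shape'] = np.column_stack((xs, ys))
    return out

# Best of three single runs for each approach.
t_rows = min(timeit.repeat(by_rows, number=1, repeat=3))
t_fields = min(timeit.repeat(by_fields, number=1, repeat=3))
print("by rows   : {:.6f} sec".format(t_rows))
print("by fields : {:.6f} sec".format(t_fields))
```

Absolute times depend on the machine and the NumPy version, so only the ratio between the two is worth reading.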


Recommended answer

There are 2 ways of filling a structured array (http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays): by row (or rows, with a list of tuples), and by field.

To do this by field, create an empty structured array and assign values by field name:

In [19]: a=np.column_stack((id,x,y))
# same as your vstack().T

In [20]: Y=np.zeros(a.shape[0], dtype=dt)
# empty, ones, etc
In [21]: Y['ID'] = a[:,0]
In [22]: Y['Shape'] = a[:,1:]
# (2,) field takes a 2 column array
In [23]: Y
Out[23]: 
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])], 
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
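The same by-field fill can be written as a standalone script that skips the intermediate 3-column array and assigns the ids and the (x, y) pairs directly. This is a minimal sketch; the helper name to_id_shape and the small fixed input values are assumptions made for illustration.

```python
import numpy as np

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

def to_id_shape(ids, x, y, dtype=dt):
    """Hypothetical helper: fill a structured array one field at a time."""
    out = np.zeros(len(ids), dtype=dtype)
    out['ID'] = ids                          # integers are cast to <i4
    out['Shape'] = np.column_stack((x, y))   # an (N, 2) array fills the (2,) field
    return out

# Small fixed inputs so the result is easy to check by eye.
ids = np.arange(5)
x = np.array([8, 8, 6, 8, 3])
y = np.array([8, 0, 2, 8, 2])

arr = to_id_shape(ids, x, y)
print(arr)
```

Each assignment is a single vectorised copy, which is the reason this route avoids building a Python list of tuples.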

On the surface,

arr = np.array(list(zip(id,np.hstack((x,y)))),dt)

looks like an OK way of constructing the list of tuples needed to fill the array. But the result duplicates the values of x instead of using y. I'll have to look at what is wrong.
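The duplication seems to come from the shape that hstack returns. For two 1-D inputs it produces a single 1-D array of length 2N (all x values followed by all y values), so zip pairs each id with one scalar x value, and that scalar is then broadcast across the 2-element Shape field. The sketch below, using small made-up values, contrasts this with column_stack:

```python
import numpy as np

ids = np.arange(3)
x = np.array([7, 5, 0])
y = np.array([6, 9, 1])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

flat = np.hstack((x, y))         # 1-D, length 2*N: x values, then y values
pairs = np.column_stack((x, y))  # 2-D, shape (N, 2): one (x, y) row per point

# zip stops at the shorter input, so only the first N entries of flat
# (the x values) are paired with ids; each scalar fills both Shape slots.
wrong = np.array(list(zip(ids, flat)), dt)

# With column_stack, each id is paired with a 2-element row instead.
right = np.array(list(zip(ids, pairs)), dt)

print(flat.shape, pairs.shape)
print(wrong)
print(right)
```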

You can take a view of an array such as a if the dtype is compatible: the data buffer for 3 int columns is laid out the same way as one with 3 int fields.

a.view('i4,i4,i4')

But your dtype wants 'i4,f8,f8', a mix of 4- and 8-byte fields, and a mix of int and float. The buffer of a will have to be transformed to achieve that; view can't do it. (Don't even ask about .astype.)
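When a conversion rather than a view is needed, one option that is not part of the original answer is the unstructured_to_structured helper in numpy.lib.recfunctions (available in NumPy 1.16 and later). It copies and casts the columns of a plain 2-D array into the elements of a structured dtype, counting a subarray field once per element. A minimal sketch with made-up values:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ids = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])
a = np.column_stack((ids, x, y))  # plain (N, 3) integer array

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# Column 0 goes to 'ID'; columns 1 and 2 go to the two entries of 'Shape'.
# The data is converted (int -> float for 'Shape'), not reinterpreted.
s = rfn.unstructured_to_structured(a, dtype=dt)
print(s)
```

Because the values are cast, the result does not share memory with a, which is the difference from the view approach.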

Corrected list-of-tuples method:

In [35]: np.array([(i,j) for i,j in zip(id,np.column_stack((x,y)))],dt)
Out[35]: 
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
       (4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
       (8, [6.0, 1.0]), (9, [6.0, 6.0])], 
      dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])

The list comprehension produces a list like:

[(0, array([8, 8])),
 (1, array([8, 0])),
 (2, array([6, 2])),
 ....]

For each tuple in the list, element [0] goes in the first field of the dtype, and element [1] (a small array) goes in the second.

The tuples could also be constructed with:

[(i,[j,k]) for i,j,k in zip(id,x,y)]
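As a quick check, the two ways of building the list of tuples can be compared field by field. This is a sketch with small made-up inputs; the variable names rows_a and rows_b are illustrative.

```python
import numpy as np

ids = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# Variant 1: pair each id with a row of the (N, 2) column-stacked array.
rows_a = [(i, xy) for i, xy in zip(ids, np.column_stack((x, y)))]

# Variant 2: zip the three 1-D arrays and build the [x, y] pair by hand.
rows_b = [(i, [j, k]) for i, j, k in zip(ids, x, y)]

arr_a = np.array(rows_a, dt)
arr_b = np.array(rows_b, dt)

same_ids = np.array_equal(arr_a['ID'], arr_b['ID'])
same_shapes = np.array_equal(arr_a['Shape'], arr_b['Shape'])
print(same_ids, same_shapes)
```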







dt1 = np.dtype([('ID','<i4'),('Shape',('<i4',(2,)))])

is a view-compatible dtype (still 3 integers):

In [42]: a.view(dtype=dt1)
Out[42]: 
array([[(0, [8, 8])],
       [(1, [8, 0])],
       [(2, [6, 2])],
       [(3, [8, 8])],
       [(4, [3, 2])],
       [(5, [6, 1])],
       [(6, [5, 6])],
       [(7, [7, 7])],
       [(8, [6, 1])],
       [(9, [6, 6])]], 
      dtype=[('ID', '<i4'), ('Shape', '<i4', (2,))])
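The view only lines up when each row of the plain array occupies exactly the 12 bytes of dt1, which holds when the array is int32. On platforms where the default integer is 64-bit that has to be forced, so the sketch below casts explicitly (the astype call is an assumption added here, not part of the original session). It also shows the extra length-1 axis being dropped and that the view shares memory with the source array.

```python
import numpy as np

ids = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])

# Force 4-byte integers so that one row is 3 * 4 = 12 bytes.
a = np.column_stack((ids, x, y)).astype('<i4')

dt1 = np.dtype([('ID', '<i4'), ('Shape', ('<i4', (2,)))])

v = a.view(dtype=dt1)  # shape (N, 1): one structured element per row
flat = v.ravel()       # drop the length-1 axis, shape (N,)

# A view reinterprets the same buffer, so edits show up in both arrays.
flat['Shape'][0] = [1, 2]
print(v.shape, flat.shape)
print(a[0])
```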
