Methods of creating a structured array
Problem description
I have the following information and I can produce a numpy array of the desired structure. Note that the values x and y have to be determined separately since their ranges may differ so I cannot use:
xy = np.random.random_integers(0,10,size=(N,2))
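If the only obstacle to a single call is that the x and y ranges differ, NumPy's newer Generator API can draw both columns at once, because `Generator.integers` broadcasts array-valued `low`/`high` against `size`. This is not from the question; it is a minimal sketch assuming NumPy >= 1.17, and the ranges below are illustrative values only:

```python
import numpy as np

N = 10
rng = np.random.default_rng(42)  # seeded only so the example is repeatable

# Column 0 (x) is drawn from [0, 10] and column 1 (y) from [5, 20]:
# low/high have shape (2,) and broadcast against size=(N, 2).
# endpoint=True makes high inclusive, like the old random_integers.
xy = rng.integers(low=[0, 5], high=[10, 20], size=(N, 2), endpoint=True)

print(xy.shape)  # (10, 2)
```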
The extra list(...) conversion is necessary for this to work in Python 3.4, where zip returns an iterator rather than a list; it is not necessary, but not harmful, when using Python 2.7.
The following works:
>>> # attempts to formulate [id,(x,y)] with specified dtype
>>> N = 10
>>> x = np.random.random_integers(0,10,size=N)
>>> y = np.random.random_integers(0,10,size=N)
>>> id = np.arange(N)
>>> dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
>>> arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
>>> arr
array([(0, [7.0, 7.0]), (1, [7.0, 7.0]), (2, [5.0, 5.0]), (3, [0.0, 0.0]),
(4, [6.0, 6.0]), (5, [6.0, 6.0]), (6, [7.0, 7.0]),
(7, [10.0, 10.0]), (8, [3.0, 3.0]), (9, [7.0, 7.0])],
dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
I cleverly thought I could circumvent the above nasty bits by simply creating the array in the desired vertical structure and applying my dtype to it, hoping that it would work. The stacked array is correct in the vertical form:
>>> a = np.vstack((id,x,y)).T
>>> a
array([[ 0, 7, 6],
[ 1, 7, 7],
[ 2, 5, 9],
[ 3, 0, 1],
[ 4, 6, 1],
[ 5, 6, 6],
[ 6, 7, 6],
[ 7, 10, 9],
[ 8, 3, 2],
[ 9, 7, 8]])
I tried several ways to reformulate the above array so that my dtype would work, and I just can't figure it out (this included vstacking a vstack, etc.). So my question is: how can I take the vstack version and get it into a format that meets my dtype requirements without having to go through the procedure above? I am hoping it is obvious, but I have sliced, stacked and ellipsed myself into an endless loop.
Summary
Many thanks to hpaulj. I have included two incarnations based upon his suggestions for others to consider. The pure numpy solution is substantially faster and a lot cleaner.
"""
Script: pnts_StackExch
Author: Dan.Patterson@carleton.ca
Modified: 2015-08-24
Purpose:
To provide some timing options on point creation in preparation for
point-to-point distance calculations using einsum.
Reference:
http://stackoverflow.com/questions/32224220/
methods-of-creating-a-structured-array
Functions:
decorators: profile_func, timing, arg_deco
main: make_pnts, einsum_0
"""
import numpy as np
import random
import time
from functools import wraps
np.set_printoptions(edgeitems=5,linewidth=75,precision=2,suppress=True,threshold=5)
# .... wrapper funcs .............
def delta_time(func):
    """timing decorator function"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("\nTiming function for... {}".format(func.__name__))
        t0 = time.time()                # start time
        result = func(*args, **kwargs)  # ... run the function ...
        t1 = time.time()                # end time
        print("Results for... {}".format(func.__name__))
        print(" time taken ...{:12.9f} sec.".format(t1-t0))
        #print("\n print results inside wrapper or use <return> ... ")
        return result                   # return the result of the function
    return wrapper

def arg_deco(func):
    """This wrapper just prints some basic function information."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("Function... {}".format(func.__name__))
        #print("File....... {}".format(func.__code__.co_filename))
        print(" args.... {}\n kwargs. {}".format(args, kwargs))
        #print(" docs.... {}\n".format(func.__doc__))
        return func(*args, **kwargs)
    return wrapper
# .... main funcs ................
@delta_time
@arg_deco
def pnts_IdShape(N=1000000, x_min=0, x_max=10, y_min=0, y_max=10):
    """Make N points with uniformly distributed random integer coordinates,
    with optional min/max values for Xs and Ys
    """
    dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
    IDs = np.arange(0, N)
    # np.random.random_integers is deprecated (NumPy 1.11+);
    # np.random.randint(x_min, x_max + 1, size=N) draws from the same inclusive range
    Xs = np.random.random_integers(x_min, x_max, size=N)  # note below
    Ys = np.random.random_integers(y_min, y_max, size=N)
    a = np.array([(i, j) for i, j in zip(IDs, np.column_stack((Xs, Ys)))], dt)
    return IDs, Xs, Ys, a
@delta_time
@arg_deco
def alternate(N=1000000, x_min=0, x_max=10, y_min=0, y_max=10):
    """After hpaulj's suggestion: fill the structured array field by field
    instead of from a list of tuples.
    """
    dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
    IDs = np.arange(0, N)
    Xs = np.random.random_integers(x_min, x_max, size=N)  # use the x range passed in
    Ys = np.random.random_integers(y_min, y_max, size=N)  # use the y range passed in
    c_stack = np.column_stack((IDs, Xs, Ys))
    a = np.ones(N, dtype=dt)
    a['ID'] = c_stack[:, 0]
    a['Shape'] = c_stack[:, 1:]
    return IDs, Xs, Ys, a
if __name__=="__main__":
"""time testing for various methods
"""
id_1,xs_1,ys_1,a_1 = pnts_IdShape(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
id_2,xs_2,ys_2,a_2 = alternate(N=1000000,x_min=0, x_max=10, y_min=0, y_max=10)
Timing results for 1,000,000 points are as follows
Timing function for... pnts_IdShape
Function... **pnts_IdShape**
args.... ()
kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... pnts_IdShape
time taken ... **0.680652857 sec**.
Timing function for... **alternate**
Function... alternate
args.... ()
kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... alternate
time taken ... **0.060056925 sec**.
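Single wall-clock runs like these can be noisy. Below is a sketch of a repeatable comparison with the standard-library `timeit` module; `by_tuples` and `by_fields` are hypothetical, decorator-free restatements of `pnts_IdShape` and `alternate` that take pre-generated data, so only the array construction is timed. Absolute numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

def by_tuples(ids, xs, ys):
    """List-of-tuples construction, as in pnts_IdShape (hypothetical helper)."""
    return np.array([(i, j) for i, j in zip(ids, np.column_stack((xs, ys)))], dt)

def by_fields(ids, xs, ys):
    """Field-by-field filling, as in alternate (hypothetical helper)."""
    out = np.empty(len(ids), dtype=dt)   # every field is overwritten below
    out['ID'] = ids
    out['Shape'] = np.column_stack((xs, ys))
    return out

N = 100000
rng = np.random.default_rng(1)
ids = np.arange(N)
xs = rng.integers(0, 10, size=N, endpoint=True)
ys = rng.integers(0, 10, size=N, endpoint=True)

for f in (by_tuples, by_fields):
    # best of 3 repeats, each running the construction 5 times
    t = min(timeit.repeat(lambda: f(ids, xs, ys), number=5, repeat=3))
    print("{:10s} {:8.4f} sec per 5 runs".format(f.__name__, t))
```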
Recommended answer
There are 2 ways of filling a structured array (http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays) - by row (or rows with list of tuples), and by field.
To do this by field, create an empty structured array and assign values by field name:
In [19]: a=np.column_stack((id,x,y))
# same as your vstack().T
In [20]: Y=np.zeros(a.shape[0], dtype=dt)
# empty, ones, etc
In [21]: Y['ID'] = a[:,0]
In [22]: Y['Shape'] = a[:,1:]
# (2,) field takes a 2 column array
In [23]: Y
Out[23]:
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
(4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
(8, [6.0, 1.0]), (9, [6.0, 6.0])],
dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
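The by-field approach as a self-contained script. Here `x` and `y` come from `np.random.default_rng` rather than the question's `random_integers` (an assumption made only so the snippet is reproducible), and `id_` replaces `id` to avoid shadowing the builtin; the filling steps mirror In [19] to In [22] above.

```python
import numpy as np

N = 10
rng = np.random.default_rng(0)
id_ = np.arange(N)
x = rng.integers(0, 10, size=N, endpoint=True)
y = rng.integers(0, 10, size=N, endpoint=True)

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

a = np.column_stack((id_, x, y))    # shape (N, 3), same values as vstack(...).T
Y = np.zeros(a.shape[0], dtype=dt)  # np.empty would also do; every field is overwritten
Y['ID'] = a[:, 0]                   # 1-D column -> scalar int field (cast to i4)
Y['Shape'] = a[:, 1:]               # (N, 2) block -> (2,) float subarray field

print(Y[:3])
```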
On the surface,
arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
looks like an OK way of constructing the list of tuples needed to fill the array. But the result duplicates the values of x instead of using y. I'll have to look at what is wrong.
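The duplication comes from the shapes. `np.hstack((x, y))` joins the two 1-D arrays end to end into one array of length 2N, so `zip` pairs each ID with a single scalar from `x` (it stops after N pairs, so the `y` half is never reached), and that scalar is broadcast into both slots of the `(2,)` field. `np.column_stack` gives the intended (N, 2) array, one (x, y) row per ID. A small sketch with made-up values:

```python
import numpy as np

id_ = np.arange(3)
x = np.array([7, 5, 0])
y = np.array([6, 9, 1])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

print(np.hstack((x, y)).shape)        # (6,)   one long 1-D array
print(np.column_stack((x, y)).shape)  # (3, 2) one (x, y) row per ID

# hstack: each tuple is (id, scalar x); the scalar fills both Shape slots
bad = np.array(list(zip(id_, np.hstack((x, y)))), dt)
# column_stack: each tuple is (id, array([x, y]))
good = np.array(list(zip(id_, np.column_stack((x, y)))), dt)

# bad['Shape'] rows:  [7, 7], [5, 5], [0, 0]  (x repeated)
# good['Shape'] rows: [7, 6], [5, 9], [0, 1]
print(bad['Shape'])
print(good['Shape'])
```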
You can take a view of an array like a if the dtype is compatible: the data buffer for 3 int columns is laid out the same way as one with 3 int fields. (This assumes a holds 4-byte integers; where the default integer is 8 bytes, the matching view would be 'i8,i8,i8'.)
a.view('i4,i4,i4')
But your dtype wants 'i4,f8,f8', a mix of 4- and 8-byte fields, and a mix of int and float. The a buffer has to be transformed to achieve that; view can't do it. (Don't even ask about .astype.)
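For NumPy 1.16 and later, `numpy.lib.recfunctions.unstructured_to_structured` performs this kind of transformation: it treats the last axis of a plain array as the sequence of scalar fields (a subarray field consumes one column per element) and casts the values into a new buffer. This function is not part of the original answer; a sketch assuming a NumPy version that provides it:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# Plain (N, 3) integer array laid out as ID, x, y columns
a = np.column_stack((np.arange(4), [7, 5, 0, 6], [6, 9, 1, 1]))

# The 3 columns map onto ID (1 scalar) + Shape (2 elements);
# values are cast (int -> i4, int -> f8) into a new buffer, not viewed.
Y = rfn.unstructured_to_structured(a, dtype=dt)
print(Y)
```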
Corrected list-of-tuples approach:
In [35]: np.array([(i,j) for i,j in zip(id,np.column_stack((x,y)))],dt)
Out[35]:
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
(4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
(8, [6.0, 1.0]), (9, [6.0, 6.0])],
dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
The list comprehension produces a list like:
[(0, array([8, 8])),
(1, array([8, 0])),
(2, array([6, 2])),
....]
For each tuple in the list, the [0] element goes in the first field of the dtype, and the [1] element (a small array) goes in the second.
The tuples could also be built with
[(i,[j,k]) for i,j,k in zip(id,x,y)]
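Both tuple constructions can be checked side by side. A self-contained sketch; the sample values are made up, and `id_` replaces `id` only to avoid shadowing the builtin:

```python
import numpy as np

id_ = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])
dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# (i, array([x_i, y_i])) tuples taken from the rows of a column stack
t1 = [(i, j) for i, j in zip(id_, np.column_stack((x, y)))]
# (i, [x_i, y_i]) tuples built directly from the three 1-D arrays
t2 = [(i, [j, k]) for i, j, k in zip(id_, x, y)]

arr1 = np.array(t1, dt)
arr2 = np.array(t2, dt)

# compare field by field
print((arr1['ID'] == arr2['ID']).all() and (arr1['Shape'] == arr2['Shape']).all())  # True
```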
dt1 = np.dtype([('ID','<i4'),('Shape',('<i4',(2,)))])
is a view-compatible dtype (still 3 four-byte integers):
In [42]: a.view(dtype=dt1)
Out[42]:
array([[(0, [8, 8])],
[(1, [8, 0])],
[(2, [6, 2])],
[(3, [8, 8])],
[(4, [3, 2])],
[(5, [6, 1])],
[(6, [5, 6])],
[(7, [7, 7])],
[(8, [6, 1])],
[(9, [6, 6])]],
dtype=[('ID', '<i4'), ('Shape', '<i4', (2,))])
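The view trick only works when the bytes line up exactly. The sketch below (an addition to the answer) casts to 4-byte integers explicitly, so it does not depend on the platform's default integer size, and uses `np.ascontiguousarray` because the question's `vstack(...).T` is Fortran-ordered, and a view that changes the item size needs a C-contiguous last axis. It then drops the trailing length-1 axis the view leaves behind, which is why the output above is 2-D:

```python
import numpy as np

N = 5
id_ = np.arange(N)
x = np.array([8, 8, 6, 8, 3])
y = np.array([8, 0, 2, 8, 2])

# C order plus 4-byte ints: each row is exactly 3 * 4 = 12 bytes
a = np.ascontiguousarray(np.vstack((id_, x, y)).T, dtype=np.int32)

dt1 = np.dtype([('ID', '<i4'), ('Shape', ('<i4', (2,)))])  # 12 bytes per record
v = a.view(dtype=dt1)  # shape (N, 1): each 12-byte row becomes one record
v = v.reshape(-1)      # 1-D structured array, still a view

print(v.shape)                 # (5,)
print(np.shares_memory(v, a))  # True: no data was copied
```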