numpy dtype error - (structured array creation)

Problem description

I am having some trouble understanding why the following does not work:

np.dtype(dict(names="10", formats=np.float64))

I have been struggling with this because I would like to get the recfunctions function in numpy to work, but due to issues with the numpy.dtype, I haven't been successful. This is the error I am receiving at the moment:

dtype = np.dtype(dict(names=names, formats=formats))
ValueError: all items in the dictionary must have the same length.

I want to get a data structure that will contain a type of record array with multiple columns of data within each assigned field - similar to a dictionary where each value is a 2d array or several columns of data. Typically the data may end up being ~6 columns, ~2000 rows for each key or record, with ~200 records.

Here is what I have tried in a complete script (it still gives the same error):

import numpy as np
from numpy.lib import recfunctions


# Just function to make random data
def make_data(i, j):
    # some arbitrary function to show that the number of columns may change, but rows stay the same length
    if i%3==0:
        data = np.array([[i for i in range(0,1150)]*t for t in range(0,3)])
    else:
        data = np.array([[i for i in range(0,1150)]*t for t in range(0,6)])
    return data

def data_struct(low_ij, high_ij):

    """
    Data Structure to contain several columns of data for different combined values between "low ij" and "high ij"

    Key: "(i, j)"
    Value: numpy ndarray (multidimensional)
    """

    for i in range(0,low_ij+1):
        for j in range(0,high_ij+1):
            # Get rid of some of the combinations
            # (unimportant)
            if(i<low_ij and j<low_ij):
                break
            elif(i<j):
                break

            # Combinations of interest to create structure
            else:
                names = str(i)+str(j)
                formats = np.float64
                data = np.array(make_data(i, j))
                try:
                    data_struct = recfunctions.append_fields(base=data_struct, names=names, data=data, dtypes=formats)
                # First loop will assign data_struct using this exception,
                # then proceed to use the try statement to add on the rest of the data
                except UnboundLocalError:
                    dtype = np.dtype(dict(names=names, formats=formats))
                    data_struct = np.array(data, dtype=dtype)

    return data_struct

Recommended answer

Looks like you are trying to construct a structured array something like:

In [152]: names=['1','2','3','4']
In [153]: formats=[(float,2),(float,3),(float,2),(float,3)]
In [154]: dt=np.dtype({'names':names, 'formats':formats})
In [156]: ds=np.zeros(5, dtype=dt)

In [157]: ds
Out[157]: 
array([([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
       ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
       ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
       ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0]),
       ([0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0])], 
      dtype=[('1', '<f8', (2,)), ('2', '<f8', (3,)), ('3', '<f8', (2,)), 
           ('4', '<f8', (3,))])
In [159]: ds['1']=np.arange(10).reshape(5,2)
In [160]: ds['2']=np.arange(15).reshape(5,3)

In other words, multiple fields, each with a different number of 'columns' (shape).
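A minimal sketch of the dict form of `np.dtype` for the field name from the question: `names` and `formats` are given as parallel lists of equal length, even when there is only one field. The field names `'10'` and `'20'` and the shapes used here are illustrative assumptions, not values taken from the original script.

```python
import numpy as np

# names and formats are parallel lists, even for a single field
dt_single = np.dtype({'names': ['10'], 'formats': [np.float64]})

# a (type, shape) tuple gives one field several 'columns' per row
# (the shapes 3 and 6 are assumed for illustration)
dt_multi = np.dtype({'names': ['10', '20'],
                     'formats': [(np.float64, (3,)), (np.float64, (6,))]})

arr = np.zeros(4, dtype=dt_multi)
print(dt_single.names)                    # ('10',)
print(arr['10'].shape, arr['20'].shape)   # (4, 3) (4, 6)
```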

Here I initialize the whole array and then fill the fields individually. That seems to be the most straightforward way of creating complex structured arrays.

You are trying to build such an array incrementally, starting with one field and adding new ones with recfunctions.append_fields:

In [162]: i=1; 
   ds2 = np.array(np.arange(5),dtype=np.dtype({'names':[str(i)],'formats':[float]}))
In [164]: i+=1;
   ds2=recfunctions.append_fields(base=ds2,names=str(i),dtypes=float,
      data=np.arange(5), usemask=False,asrecarray=False)
In [165]: i+=1;
   ds2=recfunctions.append_fields(base=ds2,names=str(i),dtypes=float,
      data=np.arange(5), usemask=False,asrecarray=False)

In [166]: ds2
Out[166]: 
array(data = [(0.0, 0.0, 0.0) (1.0, 1.0, 1.0) (2.0, 2.0, 2.0) 
    (3.0, 3.0, 3.0) (4.0, 4.0, 4.0)], 
    dtype = [('1', '<f8'), ('2', '<f8'), ('3', '<f8')])

This works when the appended fields all have 1 'column'. With masking they can even have different numbers of 'rows'. But when I try to vary the internal shape, it has problems appending the field. merge_arrays isn't any more successful.
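If a field with several 'columns' really has to be added after the array exists, one possible workaround (not something recfunctions provides, just plain dtype handling) is to allocate a new array with an extended dtype and copy the old fields across. The helper name `add_field` is hypothetical, and the sketch assumes the new data has the same number of rows as the base array.

```python
import numpy as np

def add_field(base, name, data):
    """Return a copy of `base` with an extra field `name` holding `data`.

    Hypothetical helper: `data` must have as many rows as `base`; any
    trailing dimensions become the sub-array shape of the new field.
    """
    data = np.asarray(data)
    if data.shape[0] != base.shape[0]:
        raise ValueError("new field must have the same number of rows")
    # 1d data becomes a plain field, 2d+ data a sub-array field
    if data.ndim > 1:
        fmt = np.dtype((data.dtype, data.shape[1:]))
    else:
        fmt = data.dtype
    fields = [(n, base.dtype[n]) for n in base.dtype.names] + [(name, fmt)]
    out = np.empty(base.shape, dtype=np.dtype(fields))
    for n in base.dtype.names:   # copy the existing fields
        out[n] = base[n]
    out[name] = data             # fill the new field
    return out

ds = np.zeros(5, dtype=[('1', float)])
ds['1'] = np.arange(5)
ds = add_field(ds, '2', np.arange(15.0).reshape(5, 3))
print(ds.dtype.names)   # ('1', '2')
print(ds['2'].shape)    # (5, 3)
```

Each call copies the whole array, so repeating it for ~200 fields is likely to be slower than allocating once.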

Even if we can get the incremental recfunctions approach to work, it probably will be slower than the initialize-and-fill approach. Even if you don't know the shape of each of the fields at the start, you could collect them all in a dictionary, and assemble the array from that. This kind of structured array isn't any more compact or efficient than a dictionary. It just makes certain styles of data access (across fields) more convenient.
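The collect-then-assemble idea can be sketched as follows. The keys, column counts, and random data are placeholders standing in for the question's `make_data`, and the sketch assumes every field has the same number of rows.

```python
import numpy as np

n_rows = 2000  # assumed: every field shares this number of rows

# Step 1: collect the per-key 2d arrays; column counts may differ by key
collected = {}
for key, n_cols in [('10', 3), ('20', 6), ('21', 6)]:   # placeholder keys
    collected[key] = np.random.rand(n_rows, n_cols)

# Step 2: build the dtype from the shapes that were actually collected
names = list(collected)
formats = [(arr.dtype, arr.shape[1:]) for arr in collected.values()]
dt = np.dtype({'names': names, 'formats': formats})

# Step 3: allocate once, then fill field by field
ds = np.zeros(n_rows, dtype=dt)
for key, arr in collected.items():
    ds[key] = arr

print(ds.dtype.names)    # ('10', '20', '21')
print(ds['20'].shape)    # (2000, 6)
```

Indexing by name (`ds['20']`) returns all rows of one field, while indexing by position (`ds[0]`) returns one row across every field, which is the cross-field access a plain dictionary does not offer.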
