`TypeError: invalid type promotion` when appending to a heterogeneous numpy array

Question

I have created an array with:

Ticket_data = np.empty((0,7),
                       dtype='str,datetime64[m],datetime64[m],str,str,str,str')

and I am trying to append data with:

lineitem = [str(data[0][0]), OpenDT, CloseDT, str(data[0][11]),
            str(data[0][12]), str(data[0][13]), str(data[0][14])]

where OpenDT and CloseDT are created using np.datetime64(DTstring, 'm').

I get the error:

Traceback (most recent call last):
  File "Daily Report.py", line 25, in <module>
    np.append(Ticket_data, np.array([lineitem]), axis=0)
  File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 3884, in append
    return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion



print np.array([lineitem])

outputs

[['21539' '2015-06-30T10:46-0700' '2015-06-30T10:55-0700' 'Testtext'
 'Testtext2' 'Testtext3' 'Testtext5']]

print np.array([lineitem], dtype=Ticket_data.dtype)

outputs

[[('', 245672259890L, datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', datetime.datetime(2015, 6, 30, 17, 46), datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', datetime.datetime(2015, 6, 30, 17, 55), datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', 7741528753124368710L, datetime.datetime(1982, 11, 21, 6, 33), '', '', '', '')
  ('', 7959953343691844691L, datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', datetime.datetime(5205, 7, 21, 7, 42), datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', 2336635297857499728L, 2338042681633169744L, '', '', '', '')]]

How can I fix this?

Recommended answer

Firstly, fields in a structured array are not the same thing as dimensions in a regular ndarray. You want your Ticket_data array to be 1-dimensional, with each row element in that dimension containing 7 fields, e.g.:

Ticket_data = np.empty((0,),
                       dtype='str,datetime64[m],datetime64[m],str,str,str,str')
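To make the fields-versus-dimensions distinction concrete, here is a small sketch. The field names, the 4-field layout, and the string widths (`U16`, `U32`) are my own assumptions for illustration and are not from the question.

```python
import numpy as np

# Hypothetical field names and string widths, chosen only for illustration.
ticket_dtype = np.dtype([
    ("ticket_id", "U16"),
    ("opened", "datetime64[m]"),
    ("closed", "datetime64[m]"),
    ("note", "U32"),
])

# 2-D: zero rows by four columns, and every cell is itself a full 4-field record.
empty_2d = np.empty((0, 4), dtype=ticket_dtype)

# 1-D: zero records, each record carrying four named fields.
empty_1d = np.empty((0,), dtype=ticket_dtype)

print(empty_2d.shape, empty_1d.shape)
print(empty_1d.dtype.names)
```

The number of fields lives in the dtype, not in the shape, so the 1-D version is the one that matches "one record per ticket".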

Now, in order to concatenate lineitem to Ticket_data, it must first be implicitly converted from nested lists to an array. Since you don't specify separate dtypes for each field, numpy treats lineitem as a homogeneous array and looks for a common dtype that every element can be safely promoted to.

For example:

import numpy as np

lineitem = ['foo', np.datetime64('1979-03-22T19:00', 'm'),
            np.datetime64('1979-03-22T19:00', 'm'), 'bar', 'baz', 'a', 'b']

np.array(lineitem)
# With the lineitem values from the question (Python 2, older numpy) this gives:
# array(['21539', '2015-06-30T10:46-0700', '2015-06-30T10:55-0700',
#        'Testtext', 'Testtext2', 'Testtext3', 'Testtext5'],
#       dtype='|S21')

In this example, every element is cast to a string of length 21. The dtype of this array does not match that of Ticket_data, and since there is no safe way to cast '|S21' to 'datetime64[m]', you get an invalid type promotion error.

You could avoid the error by explicitly converting lineitem to an array and specifying the correct dtype for each field:

np.array([tuple(lineitem)], dtype=Ticket_data.dtype)

Note that I'm converting lineitem to a tuple. This is necessary for the elements in lineitem to be interpreted as separate fields rather than separate elements. The result is an array of shape (1,) (not (1, 7)):

np.array([tuple(lineitem)], dtype=Ticket_data.dtype).shape
# (1,)
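Putting the pieces together, a minimal end-to-end sketch might look like the following. The field names, the reduced 4-field layout, and the string widths are assumptions of mine. I use explicit widths because a bare 'str' field appears to get zero width, which would likely explain the empty strings in the question's output. Also note that np.append and np.concatenate return a new array rather than modifying in place, so the result has to be assigned.

```python
import numpy as np

# Hypothetical field names and string widths, chosen only for illustration.
ticket_dtype = np.dtype([
    ("ticket_id", "U16"),
    ("opened", "datetime64[m]"),
    ("closed", "datetime64[m]"),
    ("note", "U32"),
])

ticket_data = np.empty((0,), dtype=ticket_dtype)

lineitem = ["21539",
            np.datetime64("2015-06-30T10:46", "m"),
            np.datetime64("2015-06-30T10:55", "m"),
            "Testtext"]

# The tuple marks the values as the fields of ONE record.
row = np.array([tuple(lineitem)], dtype=ticket_dtype)

# concatenate returns a new array, so rebind the name.
ticket_data = np.concatenate((ticket_data, row))

print(ticket_data.shape)
print(ticket_data["closed"] - ticket_data["opened"])
```

Growing an array one row at a time copies the whole array on every step, so for many rows it is usually cheaper to collect the tuples in a Python list and build the array once at the end.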

If I don't convert lineitem to a tuple, then I get a (1, 7) array in which each individual element of lineitem is interpreted as a complete 'str,datetime64[m],datetime64[m],str,str,str,str' record, resulting in the nonsense you showed in your edit.
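The list-versus-tuple difference is easier to see with a simpler dtype. The two-field numeric dtype below is an assumption for illustration only. With numeric fields the scalar-to-record conversion succeeds, whereas with the question's dtype it produces garbage.

```python
import numpy as np

pair_dtype = np.dtype("i4,f8")  # two fields, auto-named f0 and f1

# A tuple is read as one record: its items fill the fields.
as_tuple = np.array([(1, 2)], dtype=pair_dtype)

# A list is read as a sequence of elements: each scalar becomes its own
# record, with the same value copied into every field.
as_list = np.array([[1, 2]], dtype=pair_dtype)

print(as_tuple.shape, as_tuple)
print(as_list.shape, as_list)
```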

The result can then be concatenated to Ticket_data.

As an aside, I strongly recommend using pandas instead of structured arrays for dealing with heterogeneous data such as this.
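For comparison, a rough pandas version of the same idea could look like this. The column names are hypothetical, and the approach of collecting rows in a list before building the frame is just one reasonable pattern.

```python
import pandas as pd

# Accumulate plain Python records, then build the DataFrame once.
rows = []
rows.append({
    "ticket_id": "21539",
    "opened": pd.Timestamp("2015-06-30T10:46"),
    "closed": pd.Timestamp("2015-06-30T10:55"),
    "note": "Testtext",
})

tickets = pd.DataFrame(rows)

# Each column keeps its own dtype, so datetime arithmetic works directly.
tickets["duration"] = tickets["closed"] - tickets["opened"]
print(tickets.dtypes)
print(tickets)
```

Each column has its own dtype and strings are not limited to a fixed width, which avoids both the promotion error and the truncation issue.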
