How to insert a column of a different type into a NumPy array?


Question

I would like to append two NumPy arrays, one of type np.datetime64 and one of type int, to each other.

This leads to an error. What do I have to do to correct this?

It works without error if I append either vector to itself (i.e. np.append(c,c,axis=1) or np.append(a,a,axis=1)).

NumPy version: 1.14.3

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
c
Out[2]: 
array([[0],
       [1],
       [2],
       [3],
       [4]])
d = np.append(c,a,axis=1)
Traceback (most recent call last):
  File "/home/claudia/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-10548a83d1a2>", line 1, in <module>
    d = np.append(c,a,axis=1)
  File "/home/claudia/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 5166, in append
    return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion

Recommended answer

Probably easiest - work with a Pandas DataFrame instead of an array

Truthfully, while NumPy arrays can be made to work with heterogeneous columns, they may not be what most users actually need in this case. For many use cases, you may be better off using a Pandas DataFrame. Here's how to convert your two columns to a DataFrame called df:

import numpy as np
import pandas as pd

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)


df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))
print(df)
# output:
#                      date  val
#     0 2018-04-01 15:30:00    0
#     1 2018-04-01 15:31:00    1
#     2 2018-04-01 15:32:00    2
#     3 2018-04-01 15:33:00    3
#     4 2018-04-01 15:34:00    4

You can then work with each of your columns like so:

print(df['date'])
# output:
#     0   2018-04-01 15:30:00
#     1   2018-04-01 15:31:00
#     2   2018-04-01 15:32:00
#     3   2018-04-01 15:33:00
#     4   2018-04-01 15:34:00
#     Name: date, dtype: datetime64[ns]
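As a rough illustration of why keeping a separate dtype per column is convenient, here is a minimal sketch of a few column-wise operations. It uses a shortened three-row version of the data, and the variable names minutes, doubled and late are made up for this example.

```python
import numpy as np
import pandas as pd

# Shortened, 1-D versions of the question's arrays (assumed for brevity).
a = np.array(['2018-04-01T15:30:00',
              '2018-04-01T15:31:00',
              '2018-04-01T15:32:00'], dtype='datetime64[s]')
c = np.array([0, 1, 2])

df = pd.DataFrame({'date': a, 'val': c})

# Datetime-specific accessor on the date column.
minutes = df['date'].dt.minute

# Ordinary integer arithmetic on the value column.
doubled = df['val'] * 2

# Row selection with a boolean mask built from the date column.
late = df[df['date'] >= np.datetime64('2018-04-01T15:31:00')]
```

Each column behaves according to its own type, which is what the single-dtype array in the question could not offer.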

DataFrame objects provide a large number of methods that make it fairly easy to analyze this kind of data. See the Pandas docs (or other Q&As on this site) for more information about DataFrame objects.

Generally, you should avoid arrays of dtype=object if you can. They cause performance issues with many of the basic NumPy operations (such as arithmetic, e.g. arr0 + arr1), and they may behave in ways you don't expect.
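For context, this is a sketch of what the object-dtype workaround looks like, assuming you cast both inputs with astype(object) before appending. It is shown only to make the advice above concrete, not as a recommendation, and it uses a two-row version of the data.

```python
import numpy as np

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00']], dtype='datetime64[s]')
c = np.array([0, 1]).reshape(-1, 1)

# Casting both inputs to object avoids the type-promotion error,
# because every element is then stored as a generic Python object.
d = np.append(c.astype(object), a.astype(object), axis=1)

print(d.dtype)   # object
print(d.shape)   # (2, 2)

# The price: the datetime64 values are no longer NumPy datetimes.
# They have become plain Python datetime.datetime objects.
print(type(d[0, 1]))
```

The append succeeds, but the result has lost the native datetime64 and integer dtypes, so vectorized operations on it fall back to per-element Python calls.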

A better NumPy-only solution is a structured array. These arrays have a compound dtype, with one part per field (for the sake of this discussion, "field" is equivalent to "column", though you can do more interesting things with fields). Given your a and c arrays, here's how you can create a structured array:

# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))

# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)

# populate the structured array with the data from your column arrays
struct['date'], struct['val'] = a.T, c.T

print(struct)
# output:
#     array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
#            ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
#            ('2018-04-01T15:34:00', 4)],
#           dtype=[('date', '<M8[s]'), ('val', '<i8')])

You can then access a specific column by indexing with its name (just like you could with the DataFrame):

print(struct['date'])
# output:
#     ['2018-04-01T15:30:00' '2018-04-01T15:31:00' '2018-04-01T15:32:00'
#      '2018-04-01T15:33:00' '2018-04-01T15:34:00']
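Besides whole columns, a structured array can also be indexed by row or filtered with a boolean mask built from one of its fields. The following is a small sketch on a three-row array; the names row and recent are chosen just for this example, and the dtype is written out explicitly rather than derived from a and c.

```python
import numpy as np

# Explicit compound dtype with the same two fields as above.
dtype = np.dtype([('date', 'datetime64[s]'), ('val', 'int64')])

struct = np.empty(3, dtype=dtype)
struct['date'] = np.array(['2018-04-01T15:30:00',
                           '2018-04-01T15:31:00',
                           '2018-04-01T15:32:00'], dtype='datetime64[s]')
struct['val'] = [0, 1, 2]

# A single row is a record holding one value per field.
row = struct[1]
print(row['date'], row['val'])

# Boolean mask built from the date field, applied to whole rows.
recent = struct[struct['date'] > np.datetime64('2018-04-01T15:30:00')]
print(recent['val'])
```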

Structured array pitfalls

You can't, for example, add two structured arrays:

# doesn't work
struct0 + struct1

But you can add the fields of two structured arrays:

# works great
struct0['val'] + struct1['val']

In general, the fields behave just like standard NumPy arrays.
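The snippets above leave struct0 and struct1 undefined, so here is a self-contained sketch of the same pitfall. The two arrays are filled with made-up values, and the flag whole_array_add_failed exists only to record what happened.

```python
import numpy as np

dtype = np.dtype([('date', 'datetime64[s]'), ('val', 'int64')])

# Two small structured arrays with illustrative values.
struct0 = np.zeros(3, dtype=dtype)
struct1 = np.zeros(3, dtype=dtype)
struct0['val'] = [0, 1, 2]
struct1['val'] = [10, 20, 30]

# Adding the structured arrays themselves is not supported:
# NumPy has no addition defined for the compound dtype.
try:
    struct0 + struct1
    whole_array_add_failed = False
except TypeError:
    whole_array_add_failed = True

# Adding a single numeric field works like any ordinary integer array.
field_sum = struct0['val'] + struct1['val']

print(whole_array_add_failed)
print(field_sum)
```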
