Convert a numpy array of lists to a numpy array


Question

I have some data stored as a numpy array with dtype=object, and I would like to extract one column of lists and convert it to a numpy array. It seems like a simple problem, but the only way I've found to solve it is to recast the entire thing as a list of lists and then recast that as a numpy array. Is there a more Pythonic approach?

import numpy as np

arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
arr = arr[:, 1]

print(arr)
# [['a', 'b', 'c'] ['a', 'b', 'c']]

type(arr)
# numpy.ndarray
type(arr[0])
# list

arr.shape
# (2,)

Recasting the array with dtype=str raises a ValueError, because NumPy tries to convert each list to a string.

arr.astype(str)
# ValueError: setting an array element with a sequence

It is possible to rebuild the entire array as a list of lists and then cast that to a numpy array, but this seems like a roundabout way.

arr_2 = np.array(list(arr))

type(arr_2)
# numpy.ndarray
type(arr_2[0])
# numpy.ndarray

arr_2.shape
# (2, 3)

Is there a better way?

Recommended answer

Going by way of lists is faster than going by way of vstack (here arr is the original 2-D object array, before the column was extracted):

In [1617]: timeit np.array(arr[:,1].tolist())
...
100000 loops, best of 3: 11.5 µs per loop
In [1618]: timeit np.vstack(arr[:,1])
...
10000 loops, best of 3: 54.1 µs per loop

vstack is effectively doing:

np.concatenate([np.atleast_2d(a) for a in arr[:,1]],axis=0)
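
As a quick check of that equivalence, here is a minimal sketch using the toy data from the question. The variable names col, stacked and manual are only for illustration; the sketch simply compares the vstack result with the explicit concatenate call.

```python
import numpy as np

# Toy data from the question: a 2-D object array whose second column holds lists.
arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]  # 1-D object array; each element is a Python list

# vstack promotes each list to a (1, 3) array and joins them along axis 0.
stacked = np.vstack(col)

# The same thing written out by hand.
manual = np.concatenate([np.atleast_2d(a) for a in col], axis=0)

print(stacked.shape, manual.shape)
print(np.array_equal(stacked, manual))
```

Both paths should give the same (2, 3) array of strings. The extra per-element atleast_2d call is one plausible reason vstack is slower on many small lists.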

Some alternatives:

In [1627]: timeit np.array([a for a in arr[:,1]])
100000 loops, best of 3: 18.6 µs per loop
In [1629]: timeit np.stack(arr[:,1],axis=0)
10000 loops, best of 3: 60.2 µs per loop
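
To reproduce this comparison outside IPython, a rough sketch with the standard-library timeit module might look like the following. The candidates dictionary and the loop count are arbitrary choices for illustration, and absolute timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

# Toy data from the question.
arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]

# The four conversions discussed above, wrapped as zero-argument callables.
candidates = {
    "np.array(col.tolist())": lambda: np.array(col.tolist()),
    "np.array([a for a in col])": lambda: np.array([a for a in col]),
    "np.vstack(col)": lambda: np.vstack(col),
    "np.stack(col, axis=0)": lambda: np.stack(col, axis=0),
}

# Run each once so the results can be compared for equality.
results = {name: fn() for name, fn in candidates.items()}

n_loops = 2000  # arbitrary; raise it for steadier numbers
for name, fn in candidates.items():
    total = timeit.timeit(fn, number=n_loops)
    print(f"{name:28s} {total / n_loops * 1e6:8.2f} µs per call")
```

All four should produce the same (2, 3) array, so the choice between them mostly comes down to speed and readability.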

Keep in mind that the object array only contains pointers to the lists, which live elsewhere in memory. While the 2-D nature of arr makes it easy to select the second column, arr[:,1] is effectively a list of lists, and most operations on it treat it as such. Operations like reshape don't cross that object boundary.
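
A small sketch of what that boundary looks like in practice, again with the question's toy data (the names col, reshaped and converted are illustrative only):

```python
import numpy as np

arr = np.array([[1, ['a', 'b', 'c']], [2, ['a', 'b', 'c']]], dtype=object)
col = arr[:, 1]

# The column is a 1-D array of two Python objects, not a 2 x 3 array.
print(col.dtype, col.shape)   # object (2,)

# reshape only rearranges the two object slots; the lists stay intact.
reshaped = col.reshape(2, 1)
print(reshaped.shape)         # (2, 1)
print(type(reshaped[0, 0]))   # <class 'list'>

# Each slot is a reference to the same list object held by arr.
print(col[0] is arr[0, 1])    # True

# Only rebuilding from the lists yields a real 2-D array of strings.
converted = np.array(col.tolist())
print(converted.shape)        # (2, 3)
```

Because the array stores references, reshaping or slicing never looks inside the lists. A new array has to be built from their contents.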
