Converting a numpy array of string fields to numerical format
Question
I have an array of strings grouped into three fields:
x = np.array([(-1, 0, 1),
              (-1, 1, 0),
              (0, 1, -1),
              (0, -1, 1)],
             dtype=[('a', 'S2'),
                    ('b', 'S2'),
                    ('c', 'S2')])
I would like to convert it to a plain numerical array of shape 4x3 (preferably of type np.int8, though that is not required) instead of an array with fields.
My general approach is to transform it into a 4x3 array of type 'S2', then use astype to make it numerical. The problem is that the only way I can think of involves both view and np.lib.stride_tricks.as_strided, which doesn't seem like a very robust solution:
y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                    shape=(4, 3), strides=(6, 2))
z = y.astype(np.int8)
This works for the toy case shown here, but I feel there must be a simpler way to unpack an array whose fields all have the same dtype. What is a more robust alternative?
Recommended answer
numpy 1.16 added structured_to_unstructured, which serves this purpose:
from numpy.lib.recfunctions import structured_to_unstructured
y = structured_to_unstructured(x) # 2d array of 'S2'
z = y.astype(np.int8)
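For reference, the snippet above can be run on its own as follows. This is a minimal sketch, assuming numpy 1.16 or later; the field values are written as byte strings here (an assumption, the question passes integers) so that the stored 'S2' contents are explicit.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Same layout as in the question: three fields, each a 2-byte string.
# Values are given as byte strings to make the stored contents explicit.
x = np.array([(b'-1', b'0', b'1'),
              (b'-1', b'1', b'0'),
              (b'0', b'1', b'-1'),
              (b'0', b'-1', b'1')],
             dtype=[('a', 'S2'), ('b', 'S2'), ('c', 'S2')])

y = structured_to_unstructured(x)  # plain 2-D array, one column per field
z = y.astype(np.int8)              # parse each byte string as an integer

print(y.dtype, y.shape)
print(z)
```

No shape or stride values appear in the call, so the same two lines should work for any number of rows or fields, as long as all fields share a dtype.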
In earlier versions of numpy, you can combine x.data and np.frombuffer to create another array from the same data in memory without having to use strides. It brings no performance gain, though, because the running time is dominated by the cast from S2 to int8.
import numpy as np

n = 1000

def f1(x):
    # Reinterpret the 6-byte records as an (n, 3) grid of 2-byte strings.
    y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                        shape=(n, 3),
                                        strides=(6, 2))
    return y.astype(np.int8)

def f2(x):
    # Read the same memory through the buffer interface, then reshape.
    y = np.frombuffer(x.data, dtype='S2').reshape((n, 3))
    return y.astype(np.int8)

x = np.array([(i % 3 - 1, (i + 1) % 3 - 1, (i + 2) % 3 - 1)
              for i in range(n)],
             dtype='S2,S2,S2')

z1 = f1(x)
z2 = f2(x)
assert (z1 == z2).all()
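Another option that avoids hard-coded shapes and strides is to stack the fields column by column. This is a minimal sketch rather than part of the original answer: the function name fields_to_int8 is hypothetical, and it assumes every field holds a string that parses as a small integer. Unlike the view-based approaches, np.stack copies the data before the cast.

```python
import numpy as np

def fields_to_int8(x):
    """Stack the fields of a structured array into a 2-D int8 array.

    Hypothetical helper; assumes all fields share one string dtype
    and contain values that fit in int8.
    """
    # x[name] selects one field as a 1-D array; stacking along axis 1
    # produces one column per field, in declaration order.
    y = np.stack([x[name] for name in x.dtype.names], axis=1)
    return y.astype(np.int8)

# Small example with the layout used in the question.
x_small = np.array([(b'-1', b'0', b'1'),
                    (b'0', b'1', b'-1')],
                   dtype=[('a', 'S2'), ('b', 'S2'), ('c', 'S2')])
z_small = fields_to_int8(x_small)
print(z_small)
```

The number of rows and fields is taken from the array itself, so the helper does not need the global n or the record size used in f1 and f2.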