Converting a NumPy array of string fields to numerical format


Question

I have an array of strings grouped into three fields:

x = np.array([(-1, 0, 1),
              (-1, 1, 0),
              (0, 1, -1),
              (0, -1, 1)],
             dtype=[('a', 'S2'),
                    ('b', 'S2'),
                    ('c', 'S2')])

I would like to convert it to a plain numerical array of shape 4x3 (preferably of type np.int8, though that is not required) instead of an array with fields.

My general approach is to transform it into a 4x3 array of type 'S2', then use astype to make it numerical. The problem is that the only way I can think of to do this involves both view and np.lib.stride_tricks.as_strided, which doesn't seem like a very robust solution:

y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                    shape=(4, 3), strides=(6, 2))
z = y.astype(np.int8)

This works for the toy case shown here, but I feel there must be a simpler way to unpack an array whose fields all have the same dtype. What is a more robust alternative?

Recommended answer

NumPy 1.16 added structured_to_unstructured (in numpy.lib.recfunctions), which serves exactly this purpose:

from numpy.lib.recfunctions import structured_to_unstructured
y = structured_to_unstructured(x)  # 2d array of 'S2'
z = y.astype(np.int8)
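
For reference, here is a self-contained sketch of that approach on the toy array from the question. It assumes NumPy 1.16 or later, and the field values are written as byte strings here so the example does not rely on implicit int-to-string conversion:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Toy array from the question: three fields, each a 2-byte string.
x = np.array([(b'-1', b'0', b'1'),
              (b'-1', b'1', b'0'),
              (b'0', b'1', b'-1'),
              (b'0', b'-1', b'1')],
             dtype=[('a', 'S2'), ('b', 'S2'), ('c', 'S2')])

y = structured_to_unstructured(x)  # plain 2-D array: shape (4, 3), dtype 'S2'
z = y.astype(np.int8)              # parse each byte string as an integer
print(z)
```

Because the function reads the field layout from the dtype, no strides or item sizes have to be hard-coded.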

In earlier versions of NumPy, you can combine x.data and np.frombuffer to create another array from the same data in memory without having to specify strides. It doesn't bring a performance gain, though, because the running time is dominated by the cast from S2 to int8.

import numpy as np

n = 1000

def f1(x):
    y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                        shape=(n, 3),
                                        strides=(6, 2))
    return y.astype(np.int8)

def f2(x):
    y = np.frombuffer(x.data, dtype='S2').reshape((n, 3))
    return y.astype(np.int8)


# range replaces Python 2's xrange, which does not exist in Python 3
x = np.array([(i%3-1, (i+1)%3-1, (i+2)%3-1)
              for i in range(n)],
             dtype='S2,S2,S2')

z1 = f1(x)
z2 = f2(x)
assert (z1==z2).all()
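
To check the performance claim on your own machine, a rough sketch like the following times the reinterpretation step and the cast separately. The helper name make_view and the repetition count are illustrative choices, not part of the original answer, and no particular timings are implied:

```python
import timeit
import numpy as np

n = 1000
vals = [b'-1', b'0', b'1']
# Same cyclic pattern as above, written directly as byte strings.
x = np.array([(vals[i % 3], vals[(i + 1) % 3], vals[(i + 2) % 3])
              for i in range(n)],
             dtype='S2,S2,S2')

def make_view(x):
    # Reinterpret the packed 6-byte records as n*3 two-byte strings (no copy).
    return np.frombuffer(x.data, dtype='S2').reshape((len(x), 3))

y = make_view(x)

# Time the two steps separately: building the view vs. casting to int8.
t_view = timeit.timeit(lambda: make_view(x), number=200)
t_cast = timeit.timeit(lambda: y.astype(np.int8), number=200)
print(f"view: {t_view:.4f}s  cast: {t_cast:.4f}s")
```

If the cast accounts for most of the total, swapping as_strided for np.frombuffer should make little difference to the overall running time.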
