Understanding NumPy's interpretation of string data types

Question

Let's say I have a bytes object that represents some data, and I want to convert it to a numpy array via np.genfromtxt. I am having trouble understanding how I should handle strings in this case. Let's start with the following:

from io import BytesIO
import numpy as np

text = b'test, 5, 1.2'
types = ['str', 'i4', 'f4']
np.genfromtxt(BytesIO(text), delimiter = ',', dtype = types)

This doesn't work. It raises

TypeError: data type not understood

If I change types so that types = ['c', 'i4', 'f4']

then the numpy call returns

array((b't', 5, 1.2000000476837158), 
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<f4')])

So it works, but I am only getting the first letter of the string, obviously.
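
A quick standalone check of why this happens: 'c' is NumPy's character code for a one-byte string, so the field is S1 and has room for exactly one character. This minimal sketch uses only np.dtype and np.array, with the sample value b'test' taken from the question.

```python
import numpy as np

# 'c' is the character code for a single-byte string, i.e. S1
char_dtype = np.dtype('c')
print(char_dtype.kind, char_dtype.itemsize)  # S 1

# Storing a longer byte string in an S1 slot keeps only the first byte
truncated = np.array([b'test'], dtype='S1')
print(truncated[0])  # b't'
```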

If I use c8 or c16 for the dtype of test, then I get

array(((nan+0j), 5, 1.2000000476837158), 
      dtype=[('f0', '<c8'), ('f1', '<i4'), ('f2', '<f4')])

which is garbage. I've also tried using a and U, with no success. How in the world do I get genfromtxt to recognize and save elements as strings?
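
The "garbage" is expected once you look at what those codes mean: 'c8' and 'c16' are complex number types, not character types, so 'test' cannot be parsed as a value of that field, and the output above shows it came back as nan+0j. A small sketch confirming the mapping:

```python
import numpy as np

# 'c8' and 'c16' are complex types built from two float32s / two float64s
print(np.dtype('c8') == np.complex64)    # True
print(np.dtype('c16') == np.complex128)  # True
# Their kind is 'c' (complex), not 'S' or 'U' (string)
print(np.dtype('c8').kind)               # c
```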

I assume part of the issue is that this is a bytes object. However, if I instead use a normal string as text, and use StringIO rather than BytesIO, then genfromtxt raises an error:

TypeError: Can't convert 'bytes' object to str implicitly
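
For readers on a newer NumPy: genfromtxt gained an `encoding` argument in NumPy 1.14, and passing it explicitly lets a plain str go through StringIO without this error. The sketch below assumes NumPy 1.14 or later; with `dtype=None` the string column then comes back as unicode ('U') rather than bytes.

```python
from io import StringIO

import numpy as np

text = 'test, 5, 1.2'  # a normal str this time, not bytes
# Assumes NumPy >= 1.14: an explicit encoding makes string columns come back as str
result = np.genfromtxt(StringIO(text), delimiter=',', dtype=None, encoding='utf-8')
print(result['f0'])             # test
print(result.dtype['f0'].kind)  # U
```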

Answer

In my Python3 session:

In [568]: text = b'test, 5, 1.2'
# I don't need BytesIO since genfromtxt works with a list of
# byte strings, as from text.splitlines()

In [570]: np.genfromtxt([text], delimiter=',', dtype=None)
Out[570]: 
array((b'test', 5, 1.2), 
      dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])

If left to its own devices, genfromtxt deduces that the first field should be S4, a bytestring of 4 characters.
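
The inferred width is taken from the longest value in the column. A sketch with a hypothetical second row makes that visible; it also assumes NumPy 1.14 or later and passes `encoding='utf-8'`, which decodes the byte lines so the column is 'U' instead of 'S'.

```python
import numpy as np

# Hypothetical sample: two rows whose first fields differ in length
lines = [b'test, 5, 1.2', b'longer, 6, 3.4']
# With dtype=None each string column is sized to its longest value;
# encoding='utf-8' decodes the byte lines, so the column is unicode ('U')
data = np.genfromtxt(lines, delimiter=',', dtype=None, encoding='utf-8')
print(data.dtype['f0'])  # <U6
print(data['f0'])        # ['test' 'longer']
```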

I could also be explicit with the types:

In [571]: types=['S4', 'i4', 'f4']
In [572]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[572]: 
array((b'test', 5, 1.2000000476837158), 
      dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f4')])
In [573]: types=['S10', 'i', 'f']
In [574]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[574]: 
array((b'test', 5, 1.2000000476837158), 
      dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<f4')])

In [575]: types=['U10', 'int', 'float']
In [576]: np.genfromtxt([text],delimiter=',',dtype=types)
Out[576]: 
array(('test', 5, 1.2), 
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f8')])
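
The session mixes one-letter codes ('i', 'f') with Python type names ('int', 'float'), which is why the float field is <f4 in one run and <f8 in another. A quick check of those mappings; note that the width of 'int' depends on platform and NumPy version (64-bit Linux builds, for example, use int64), so the <i4 above reflects the answerer's setup.

```python
import numpy as np

print(np.dtype('i'))      # int32 (the C int type)
print(np.dtype('f'))      # float32
print(np.dtype('float'))  # float64
# 'int' is NumPy's default integer; its width is platform/version dependent
print(np.dtype('int'))
```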

I can specify either S (bytes) or U (unicode), but I also have to specify the length. I don't think there's a way with genfromtxt to let it deduce the length, except with dtype=None. I'd have to dig into the code to see how it deduces the string length.
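
One possible workaround, not from the answer and offered only as a sketch: let genfromtxt infer the string width with dtype=None, then cast the result to the numeric types you actually want. It assumes NumPy 1.14 or later for `encoding`, and the two sample lines are hypothetical.

```python
import numpy as np

lines = [b'test, 5, 1.2', b'longer, 6, 3.4']  # hypothetical sample data

# Pass 1: let genfromtxt work out the string width (here U6)
inferred = np.genfromtxt(lines, delimiter=',', dtype=None, encoding='utf-8')

# Pass 2: keep the inferred string field, narrow the numeric fields
target = np.dtype([('f0', inferred.dtype['f0']), ('f1', 'i4'), ('f2', 'f4')])
data = inferred.astype(target)
print(data.dtype)
```

The cost is a second copy of the data, which is usually fine for small text inputs.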

I could also create this array with np.array, by making it a tuple of substrings and giving a correct dtype:

In [599]: np.array(tuple(text.split(b',')), dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
Out[599]: 
array((b'test', 5, 1.2), 
      dtype=[('f0', 'S4'), ('f1', '<i4'), ('f2', '<f8')])
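
The same np.array idea extends to several lines. In this sketch, the two-line sample and the choice to strip fields and convert the numbers with int() and float() before building the array are assumptions, not part of the answer; the string width is measured from the data so nothing gets truncated.

```python
import numpy as np

text = b'test, 5, 1.2\nlonger, 6, 3.4'  # hypothetical two-line input

# Split each line into fields and convert the numeric ones in Python,
# so NumPy only has to pack already-typed values into each record
rows = []
for line in text.splitlines():
    name, count, value = (field.strip() for field in line.split(b','))
    rows.append((name, int(count), float(value)))

# Size the bytes field to the longest name so no value is cut off
width = max(len(row[0]) for row in rows)
dtype = [('f0', f'S{width}'), ('f1', '<i4'), ('f2', '<f8')]
arr = np.array(rows, dtype=dtype)
print(arr['f0'])  # [b'test' b'longer']
```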
