'b' character added when using numpy loadtxt
Problem description
I tried to create an array from a text file.
I saw earlier that numpy had a method loadtxt, so I tried it, but it adds some junk characters before each row...
# my txt file
.--``--.
.--` `--.
| |
| |
`--. .--`
`--..--`
# my python v3.4 program
import numpy as np
f = open('tile', 'r')
a = np.loadtxt(f, dtype=str, delimiter='\n')
print(a)
# my print output
["b' .--``--. '"
"b'.--` `--.'"
"b'| |'"
"b'| |'"
"b'`--. .--`'"
"b' `--..--` '"]
What are these 'b' characters and double quotes, and where do they come from? I tried some solutions I found on the internet, such as opening the file with codecs and changing the dtype to 'S20' or 'S11', plus a lot of other things that didn't work... What I expect is an array of Unicode strings that looks like this:
[[' .--``--. ']
['.--` `--.']
['| |']
['| |']
['`--. .--`']
[' `--..--` ']]
Info: I'm using python 3.4 and numpy from the debian stable repository
Recommended answer
np.loadtxt and np.genfromtxt operate in byte mode, which is the default string type in Python 2. But Python 3 uses Unicode, and marks bytestrings with this b prefix.
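A minimal plain-Python sketch of where the quoted "b'...'" strings most likely come from, assuming (as the output in the question suggests) that the conversion to str was applied to each raw bytes line: calling str() on a bytes object returns its repr, b prefix and quotes included, while .decode() returns the text itself. The sample line is taken from the question's output; the 'ascii' encoding is an assumption that fits this file.

```python
# Plain-Python illustration; no NumPy involved.
raw = b' .--``--.'              # one line as read in byte mode

as_repr = str(raw)              # str() on bytes returns its repr: b prefix + quotes
as_text = raw.decode('ascii')   # decoding returns the characters themselves

print(as_repr)  # b' .--``--.'
print(as_text)
```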
I tried some variations in a Python 3 IPython session:
In [508]: np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[508]: b' .--``--.'
In [509]: np.loadtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[509]: "b' .--``--.'"
...
In [511]: np.genfromtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[511]: '.--``--.'
In [512]: np.genfromtxt('stack33655641.txt',dtype=None,delimiter='\n')[0]
Out[512]: b'.--``--.'
In [513]: np.genfromtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[513]: b'.--``--.'
genfromtxt with dtype=str gives the cleanest display, except that it strips blanks. I may have to use a converter to turn that off. These functions are meant to read csv data where (white)spaces are separators, not part of the data.
loadtxt and genfromtxt are overkill for simple text like this. A plain file read works nicely:
In [527]: with open('stack33655641.txt') as f:a=f.read()
In [528]: print(a)
.--``--.
.--` `--.
| |
| |
`--. .--`
`--..--`
In [530]: a=a.splitlines()
In [531]: a
Out[531]:
[' .--``--.',
'.--` `--.',
'| |',
'| |',
'`--. .--`',
' `--..--`']
(my text editor is set to strip trailing blanks, hence the ragged lines).
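If a NumPy array is still wanted, the lines from a plain read can be handed to np.array, which picks a Unicode dtype sized to the longest line. This is only a sketch: the temporary file and its three sample lines (with guessed interior spacing) are hypothetical stand-ins for the question's 'tile' file, and encoding='utf-8' is an assumption about that file.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file standing in for the question's 'tile' file.
art = " .--``--.\n.--`    `--.\n|          |\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(art)
    path = tmp.name

try:
    # Text mode yields str lines: no b prefix, and blanks are kept.
    with open(path, encoding='utf-8') as f:
        lines = f.read().splitlines()
    a = np.array(lines)        # Unicode dtype sized to the longest line
    col = a[:, np.newaxis]     # shape (n, 1), like the nested lists in the question
finally:
    os.remove(path)

print(a.dtype)
print(a.tolist())
```

Reading in text mode is what avoids the bytes/str mismatch; NumPy only sees ordinary Python strings.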
@DSM's suggestion:
In [556]: a=np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n').astype(str)
In [557]: a
Out[557]:
array([' .--``--.', '.--` `--.', '| |',
'| |', '`--. .--`', ' `--..--`'],
dtype='<U16')
In [558]: a.tolist()
Out[558]:
[' .--``--.',
'.--` `--.',
'| |',
'| |',
'`--. .--`',
' `--..--`']
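astype(str) is enough for ASCII art like this. For files that may contain non-ASCII characters, an explicit decode seems the safer choice; np.char.decode applies bytes.decode element-wise with the encoding you pass. In this sketch the bytes array is built by hand as a stand-in for what loadtxt(..., dtype=bytes) returns, and the UTF-8 encoding and the 'é' sample line are assumptions.

```python
import numpy as np

# Stand-in for a bytes ('S' dtype) array such as loadtxt(..., dtype=bytes) returns.
raw = np.array([b' .--``--.', '.--` é `--.'.encode('utf-8')])

# DSM's approach on the ASCII-only line.
ascii_only = raw[:1].astype(str)

# Element-wise bytes.decode with an explicit encoding handles the non-ASCII line too.
text = np.char.decode(raw, 'utf-8')

print(ascii_only.tolist())
print(text.tolist())
```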