'b' character added when using numpy loadtxt

Question

I tried to create an array from a text file. I saw earlier that numpy has a loadtxt method, so I tried it, but it adds some junk characters before each row...

# my txt file

    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

# my python v3.4 program

import numpy as np
f = open('tile', 'r')
a = np.loadtxt(f, dtype=str, delimiter='\n')
print(a)

# my print output

["b'    .--``--.    '"
 "b'.--`        `--.'"
 "b'|              |'"
 "b'|              |'"
 "b'`--.        .--`'"
 "b'    `--..--`    '"]

What are these 'b' characters and double quotes, and where do they come from? I tried some solutions found on the internet, such as opening the file with codecs and changing the dtype to 'S20' or 'S11', plus a lot of other things that don't work. What I expect is an array of unicode strings that looks like this:

[['    .--``--.    ']
 ['.--`        `--.']
 ['|              |']
 ['|              |']
 ['`--.        .--`']
 ['    `--..--`    ']]

Info: I'm using python 3.4 and numpy from the debian stable repository

Recommended answer

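In the NumPy version used here, np.loadtxt and np.genfromtxt operate in byte mode, because bytes are the default string type in Python 2. Python 3 strings are unicode, and Python 3 marks bytestrings with a b prefix. The output in the question suggests that with dtype=str each bytestring is converted with str(), which keeps the b'...' representation as literal text; the outer double quotes are just the quoting of that resulting string.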
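The effect can be reproduced without numpy at all. The following is a minimal sketch in plain Python 3; the variable names are made up for illustration. It shows that calling str() on a bytes object keeps the b'...' wrapper, while decode() gives the text itself:

```python
# One row of the tile, as bytes (the form the loader works with internally).
raw = b'    .--``--.'

# str() on bytes (without an encoding) returns the repr, b prefix and quotes included.
as_text_wrong = str(raw)

# decode() interprets the bytes as text and returns a normal unicode string.
as_text_right = raw.decode('ascii')

print(as_text_wrong)
print(as_text_right)
```

This is presumably what happens to each row when dtype=str is requested, which would explain the junk characters in the question's output.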

I tried some variations in a Python 3 IPython session:

In [508]: np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[508]: b'    .--``--.'
In [509]: np.loadtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[509]: "b'    .--``--.'"
...
In [511]: np.genfromtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[511]: '.--``--.'
In [512]: np.genfromtxt('stack33655641.txt',dtype=None,delimiter='\n')[0]
Out[512]: b'.--``--.'
In [513]: np.genfromtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[513]: b'.--``--.'

genfromtxt with dtype=str gives the cleanest display - except it strips blanks. I may have to use a converter to turn that off. These functions are meant to read csv data where (white)spaces are separators, not part of the data.

loadtxt and genfromtxt are overkill for simple text like this. A plain file read does nicely:

In [527]: with open('stack33655641.txt') as f:a=f.read()
In [528]: print(a)
    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

In [530]: a=a.splitlines()
In [531]: a
Out[531]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']

(My text editor is set to strip trailing blanks, hence the ragged lines.)
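If a numpy array is still wanted, the list from the plain file read can be passed to np.array directly. The following is a sketch under some assumptions: the helper name read_lines_as_array and the temporary sample file are hypothetical stand-ins for the 'tile' file in the question, and the reshape to one column is only there to match the shape the question asked for.

```python
import os
import tempfile

import numpy as np


def read_lines_as_array(path):
    """Read a text file into a column of unicode strings, keeping blanks."""
    with open(path, encoding='utf-8') as f:
        lines = f.read().splitlines()
    # np.array picks a fixed-width unicode dtype wide enough for the longest line.
    return np.array(lines).reshape(-1, 1)


# Hypothetical sample data standing in for the 'tile' file from the question.
sample = "    .--``--.\n.--`        `--.\n|              |\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)

tile = read_lines_as_array(tmp.name)
os.remove(tmp.name)
print(tile)
```

Because the file is opened in text mode, every element is already a unicode string, so no b prefix can appear and leading blanks are left alone.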

@DSM's suggestion:

In [556]: a=np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n').astype(str)
In [557]: a
Out[557]: 
array(['    .--``--.', '.--`        `--.', '|              |',
       '|              |', '`--.        .--`', '    `--..--`'], 
      dtype='<U16')
In [558]: a.tolist()
Out[558]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']
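The conversion step in this suggestion can be tried on its own, without reading a file. The sketch below builds a small bytes array by hand (the two rows are copied from the tile; the variable names are made up) and compares .astype(str) with np.char.decode, which lets the encoding be stated explicitly. Which of the two is preferable for non-ASCII data is not something the original answer covers.

```python
import numpy as np

# A bytes array like the one loadtxt returns with dtype=bytes.
byte_rows = np.array([b'    .--``--.', b'.--`        `--.'])

# Cast to unicode; this works for the ASCII content used here.
via_astype = byte_rows.astype(str)

# Element-wise decode with an explicit encoding.
via_decode = np.char.decode(byte_rows, 'utf-8')

print(via_astype.dtype, via_astype.tolist())
print(via_decode.dtype, via_decode.tolist())
```

Both results are unicode arrays with the leading blanks intact and no b prefix.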
