Trying to strip b'' from my NumPy array
Question
So I have what I feel is a very dumb problem.
I create an array from a file:
A1=np.loadtxt(file, dtype='a100')
After processing, I want to write that array to another file:
np.savetxt("Test.txt", A1, fmt='%s', delimiter=',')
Why is it writing out b'string'? I think I understand that it's writing the values out as bytes, but for the life of me I can't figure out how to write them out without the b''.
I know this is probably something incredibly easy I'm overlooking!
Recommended answer
A1 is loaded as an array of bytestrings. Python 3 uses unicode strings by default, so it usually displays bytestrings with the 'b' prefix. That's normal with print. I'm a little surprised that it also does so during the file write.
In any case, this seems to do the trick:
A2=np.array([x.decode() for x in A1])
np.savetxt("Test.txt", A2, fmt='%s', delimiter=',')
A2 will have a dtype like dtype='<U100'.
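As a self-contained sketch of the decode-then-save approach above: the sample values and the temporary output path are stand-ins chosen for illustration, not the asker's actual data.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data standing in for the array loaded from the file.
A1 = np.array([b'one.com', b'two.url', b'three.four'], dtype='S100')

# Decode each bytestring to a Python str; NumPy then picks a unicode dtype.
A2 = np.array([x.decode() for x in A1])

# Write to a temporary location (an assumption, to keep the example tidy).
out_path = os.path.join(tempfile.mkdtemp(), 'Test.txt')
np.savetxt(out_path, A2, fmt='%s', delimiter=',')

with open(out_path) as f:
    written = f.read()

print(written)
```

The file should then contain the plain strings, one per line, with no b'' wrapper.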
My test array is:
array([b'one.com', b'two.url', b'three.four'], dtype='|S10')
It was loaded from a simple text file:
one.com
two.url
three.four
.decode is a bytestring method. [x.decode() for x in A1] works for a simple 1d array of bytestrings. If A1 is 2d, the iteration has to be done over all elements, not just the rows. And if A1 is a structured array, it has to be applied to the strings within the elements.
Another possibility is to use a converter during load, so you get an array of (unicode) strings:
In [508]: A1=np.loadtxt('urls.txt', dtype='U',
converters={0:lambda x:x.decode()})
In [509]: A1
Out[509]:
array(['one.com', 'two.url', 'three.four'], dtype='<U10')
In [510]: np.savetxt('test0.txt',A1,fmt='%s')
In [511]: cat test0.txt
one.com
two.url
three.four
The module that contains loadtxt has a few converter functions (in the NumPy version used here): asbytes, asbytes_nested, and asstr. So converters could also be: converters={0:np.lib.npyio.asstr}.
genfromtxt handles this without converters:
A1=np.genfromtxt('urls.txt', dtype='U')
# array(['one.com', 'two.url', 'three.four'], dtype='<U10')
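A minimal sketch of the genfromtxt route that can be run end to end. It writes its own small urls.txt to a temporary directory (an assumption for the example) and passes encoding='utf-8' explicitly, since the default encoding has differed between NumPy versions.

```python
import os
import tempfile

import numpy as np

# Create a small stand-in for urls.txt in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'urls.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('one.com\ntwo.url\nthree.four\n')

# dtype=str asks for unicode strings; the explicit encoding is a choice
# made for this sketch, not something the original answer used.
A1 = np.genfromtxt(path, dtype=str, encoding='utf-8')

print(A1.dtype.kind)  # 'U' indicates a unicode string array
print(A1)
```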
To understand why savetxt saves unicode strings as we want, but adds the b'' wrapper for bytestrings, we have to look at its code.
np.savetxt (running on py3) is essentially:
fh = open(fname, 'wb')
X = np.atleast_2d(X).T
# make a 'fmt' that matches the columns of X (with delimiters)
for row in X:
    fh.write(asbytes(format % tuple(row) + newline))
Looking at two sample strings (str and bytestr):
In [617]: asbytes('%s'%tuple(['one.two']))
Out[617]: b'one.two'
In [618]: asbytes('%s'%tuple([b'one.two']))
Out[618]: b"b'one.two'"
Writing to a 'wb' file removes that outer layer of b'', leaving the inner one for the bytestring. It also explains why strings ('plain' py3 unicode) are written as 'latin1' strings to the file.
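The same effect can be reproduced without NumPy. This sketch uses plain %-formatting followed by a latin1 encode, as a stand-in for the asbytes helper described above; the variable names are illustrative only.

```python
bytestring = b'one.two'
text = 'one.two'

# '%s' calls str() on its argument; str() of a bytes object keeps the b'' wrapper.
formatted_bytes = '%s' % bytestring
formatted_text = '%s' % text

# Encoding to latin1 mimics the final conversion to bytes before writing.
encoded_bytes = formatted_bytes.encode('latin1')
encoded_text = formatted_text.encode('latin1')

print(encoded_text)
print(encoded_bytes)
```

The wrapper comes from string formatting of a bytes object, not from the file write itself.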
You could write a bytestrings array directly, without savetxt. For example:
A0 = np.array([b'one.com', b'two.url', b'three.four'], dtype='|S10')
with open('test0.txt','wb') as f:
    for x in A0:
        f.write(x+b'\n')
cat test0.txt
one.com
two.url
three.four
Unicode strings can also be written directly, producing the same file:
A1 = np.array(['one.com', 'two.url', 'three.four'], dtype='<U10')
with open('test1.txt','w') as f:
    for x in A1:
        f.write(x+'\n')
The default encoding for such a file is encoding='UTF-8' here (open() uses the platform's preferred encoding when none is given), the same as used with 'one.com'.encode(). For this ASCII data the effect is the same as what savetxt does:
with open('test1.txt','wb') as f:
    for x in A1:
        f.write(x.encode()+b'\n')
np.char has encode and decode functions, which appear to operate iteratively on the elements of an array.
Thus:
np.char.decode(A1) # convert |S10 to <U10, like [x.decode() for x in A1]
np.char.encode(A1) # convert <U10 to |S10
This works with a multidimensional array:
np.savetxt('testm.txt',np.char.decode(A_bytes[:,None][:,[0,0]]),
fmt='%s',delimiter=', ')
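Since A_bytes is not defined in the snippet above, here is a runnable sketch of the multidimensional case. The sample values and the temporary file path are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical 1d bytestring array, expanded to two identical columns.
A_bytes = np.array([b'one.com', b'two.url', b'three.four'], dtype='S10')
A_2d = A_bytes[:, None][:, [0, 0]]  # shape (3, 2)

# np.char.decode works elementwise, whatever the number of dimensions.
decoded = np.char.decode(A_2d)

out_path = os.path.join(tempfile.mkdtemp(), 'testm.txt')
np.savetxt(out_path, decoded, fmt='%s', delimiter=', ')

with open(out_path) as f:
    lines = f.read().splitlines()

print(lines)
```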
With a structured array, np.char.decode has to be applied individually to each of the char fields.
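One way this could look for a structured array is sketched below. The field names id and url and the sample records are hypothetical; the idea is to build a new array whose bytes field is replaced by a unicode field and decode that field on its own.

```python
import numpy as np

# Hypothetical structured array with one integer field and one bytes field.
rec = np.array([(1, b'one.com'), (2, b'two.url')],
               dtype=[('id', 'i4'), ('url', 'S10')])

# Same layout, but with the bytes field declared as unicode.
out = np.empty(rec.shape, dtype=[('id', 'i4'), ('url', 'U10')])

# Copy the numeric field as is and decode the char field separately.
out['id'] = rec['id']
out['url'] = np.char.decode(rec['url'])

print(out)
```

With several char fields, the decode line would be repeated (or looped) for each one.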