Write numpy unicode array to a text file
Problem description
I'm trying to export a numpy array that contains unicode elements to a text file.
So far I got the following to work, but it doesn't contain any unicode characters:
import numpy as np
array_unicode=np.array([u'maca',u'banana',u'morango'])
with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')
If I change the 'c' in 'maca' to 'ç', I get an error:
import numpy as np
array_unicode=np.array([u'maça',u'banana',u'morango'])
with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')
Traceback (most recent call last):
  File "<ipython-input-48-24ff7992bd4c>", line 8, in <module>
    np.savetxt(f,array_unicode,fmt='%s')
  File "C:\Anaconda2\lib\site-packages\numpy\lib\npyio.py", line 1158, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)
How can I set savetxt from numpy to write unicode characters?
Recommended answer
In Python3 (ipython-qt terminal) I can do:
In [12]: b=[u'maça', u'banana',u'morango']
In [13]: np.savetxt('test.txt',b,fmt='%s')
In [14]: cat test.txt
ma�a
banana
morango
In [15]: with open('test1.txt','w') as f:
    ...:     for l in b:
    ...:         f.write('%s\n'%l)
    ...:
In [16]: cat test1.txt
maça
banana
morango
savetxt in both Py2 and 3 insists on saving in 'wb', byte mode. Your error line has that asbytes function.
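As a side note, newer NumPy releases (1.14 and later, to the best of my knowledge) let savetxt take an encoding argument, which avoids the byte-mode problem entirely. A minimal sketch, assuming such a version is installed; the temporary output path is only for illustration:

```python
import os
import tempfile

import numpy as np

fruits = np.array(['maça', 'banana', 'morango'])

# Hypothetical output location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), 'array_unicode.txt')

# Assumption: the installed NumPy supports the `encoding` keyword (1.14+).
np.savetxt(path, fruits, fmt='%s', encoding='utf-8')

# Read the file back with the same encoding to check the round trip.
with open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)
```

On older versions, such as the one in the question, this keyword is not available and the manual write shown in this answer is still needed.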
In my example b is a list, but that doesn't matter.
In [17]: c=np.array(['maça', 'banana','morango'])
In [18]: c
Out[18]:
array(['maça', 'banana', 'morango'],
dtype='<U7')
This array writes the same output as the list. In py3 the default string type is unicode, so the u prefix isn't needed, but it is ok to keep.
In Python2 I get your error with a plain write:
>>> b=[u'maça',u'banana',u'morango']
>>> with open('test.txt','w') as f:
...     for l in b:
...         f.write('%s\n'%l)
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)
Adding the encode gives a nice output:
>>> b=[u'maça', u'banana',u'morango']
>>> with open('test.txt','w') as f:
...     for l in b:
...         f.write('%s\n'%l.encode('utf-8'))
0729:~/mypy$ cat test.txt
maça
banana
morango
encode is a string method, so it has to be applied to the individual elements of an array (or list).
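Another way to avoid encoding each element by hand is to let the file object do the encoding. The following is a sketch written for Python 3, assuming io.open (which also exists in Python 2.6+); the helper name write_lines and the temporary path are made up for illustration:

```python
import io
import os
import tempfile

import numpy as np


def write_lines(path, items, encoding='utf-8'):
    """Write one item per line; the file object encodes the text."""
    with io.open(path, 'w', encoding=encoding) as f:
        for item in items:
            f.write(u'%s\n' % item)


fruits = np.array([u'maça', u'banana', u'morango'])

# Hypothetical output location, chosen only for this example.
out_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
write_lines(out_path, fruits)

# Read the raw bytes back to see what was actually stored on disk.
with io.open(out_path, 'rb') as f:
    raw = f.read()
print(repr(raw))
```

The file is opened in text mode here, so the strings stay unicode until the moment they are written.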
Back on the py3 side, if I use the encode I get:
In [26]: c1=np.array([l.encode('utf-8') for l in b])
In [27]: c1
Out[27]:
array([b'ma\xc3\xa7a', b'banana', b'morango'],
dtype='|S7')
In [28]: np.savetxt('test.txt',c1,fmt='%s')
In [29]: cat test.txt
b'ma\xc3\xa7a'
b'banana'
b'morango'
but with the correct format, the plain write works:
In [33]: with open('test1.txt','wb') as f:
    ...:     for l in c1:
    ...:         f.write(b'%s\n'%l)
    ...:
In [34]: cat test1.txt
maça
banana
morango
Such are the joys of mixing unicode and the 2 Python generations.
In case it helps, here's the code for the np.lib.npyio.asbytes function that np.savetxt uses (along with the wb file mode):
def asbytes(s): # py3?
    if isinstance(s, bytes):
        return s
    return str(s).encode('latin1')
(note the encoding is fixed as 'latin1').
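To see what the fixed 'latin1' encoding means in practice, here is a small sketch (Python 3, plain str methods only; the variable names are mine) comparing the bytes produced by the two encodings. It is likely the reason the earlier cat output showed ma�a: a lone \xe7 byte is not valid UTF-8.

```python
text = u'maça'

latin1_bytes = text.encode('latin1')  # 'ç' becomes the single byte \xe7
utf8_bytes = text.encode('utf-8')     # 'ç' becomes the two bytes \xc3\xa7
print(repr(latin1_bytes))
print(repr(utf8_bytes))

# The latin1 bytes cannot be read back as UTF-8.
try:
    latin1_bytes.decode('utf-8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Characters outside latin1, such as the euro sign, cannot be encoded at all.
try:
    u'€'.encode('latin1')
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print(decodes_as_utf8, euro_fits_latin1)
```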
The np.char library applies a variety of string methods to the elements of a numpy array, so the np.array([x.encode...]) can be expressed as:
In [50]: np.char.encode(b,'utf-8')
Out[50]:
array([b'ma\xc3\xa7a', b'banana', b'morango'],
dtype='|S7')
This can be convenient, though past testing indicates that it is not a time saver. It still has to apply the Python method to each element.
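For completeness, a short sketch of the round trip with np.char.encode and its counterpart np.char.decode (Python 3; the variable names are only for illustration):

```python
import numpy as np

fruits = np.array(['maça', 'banana', 'morango'])

# Encode every element to UTF-8 bytes; the result is a bytes (S) array.
encoded = np.char.encode(fruits, 'utf-8')

# Decode the bytes back to a unicode (U) array.
decoded = np.char.decode(encoded, 'utf-8')

print(encoded.dtype, decoded.dtype)
print(decoded)
```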