Write numpy unicode array to a text file

Problem description

I'm trying to export a numpy array that contains unicode elements to a text file.

So far I have the following working, but it doesn't contain any unicode characters:

import numpy as np

array_unicode=np.array([u'maca', u'banana', u'morango'])

with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')

If I change the 'c' in 'maca' to 'ç', I get an error:

import numpy as np

array_unicode=np.array([u'maça', u'banana', u'morango'])

with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')

Traceback:

Traceback (most recent call last):
  File "<ipython-input-48-24ff7992bd4c>", line 8, in <module>
    np.savetxt(f,array_unicode,fmt='%s')
  File "C:\Anaconda2\lib\site-packages\numpy\lib\npyio.py", line 1158, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

How can I get numpy's savetxt to write unicode characters?

Recommended answer

In Python 3 (IPython Qt console) I can do:

In [12]: b=[u'maça', u'banana',u'morango']

In [13]: np.savetxt('test.txt',b,fmt='%s')

In [14]: cat test.txt
ma�a
banana
morango

In [15]: with open('test1.txt','w') as f:
    ...:     for l in b:
    ...:         f.write('%s\n'%l)
    ...:         

In [16]: cat test1.txt
maça
banana
morango

In both Py2 and Py3, savetxt insists on saving in 'wb' (byte) mode. The line that raises your error calls that asbytes function.
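
Newer NumPy releases relax this: since NumPy 1.14, np.savetxt accepts an encoding argument (its default is 'latin1'). A minimal sketch, assuming Python 3 and NumPy 1.14 or later; the file name test_utf8.txt is arbitrary:

```python
import numpy as np

c = np.array(['maça', 'banana', 'morango'])

# NumPy >= 1.14: ask savetxt to encode the output as UTF-8 instead of latin1
np.savetxt('test_utf8.txt', c, fmt='%s', encoding='utf-8')

# Read the file back as UTF-8 text to check the result
with open('test_utf8.txt', encoding='utf-8') as f:
    lines = f.read().splitlines()

print(lines)  # ['maça', 'banana', 'morango']
```

On NumPy versions older than 1.14 the encoding keyword does not exist, so the manual approaches that follow are still needed there.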

In my example b is a list, but that doesn't matter.

In [17]: c=np.array(['maça', 'banana','morango'])

In [18]: c
Out[18]: 
array(['maça', 'banana', 'morango'], 
      dtype='<U7') 

Saving c writes the same output. In Py3 the default string type is unicode, so the u prefix isn't needed, but it does no harm.

In Python 2 I get your error with a plain write:

>>> b=[u'maça', u'banana', u'morango']
>>> with open('test.txt','w') as f:
...    for l in b:
...        f.write('%s\n'%l)
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

Adding encode gives correct output:

>>> b=[u'maça', u'banana',u'morango']
>>> with open('test.txt','w') as f:
...    for l in b:
...        f.write('%s\n'%l.encode('utf-8'))
0729:~/mypy$ cat test.txt
maça
banana
morango

encode is a string method, so it has to be applied to each element of the array (or list).
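
Under Python 3 the per-element encoding can also be left to the file object: opening the file in text mode with an explicit encoding encodes each str as it is written. A minimal sketch (the file name test2.txt is arbitrary):

```python
words = ['maça', 'banana', 'morango']

# Text mode with an explicit encoding: each str is encoded to UTF-8 on write,
# so no manual .encode() call is needed
with open('test2.txt', 'w', encoding='utf-8') as f:
    for w in words:
        f.write('%s\n' % w)

# Inspect the raw bytes: 'ç' is stored as the two-byte UTF-8 sequence \xc3\xa7
with open('test2.txt', 'rb') as f:
    raw = f.read()

print(raw.splitlines()[0])  # b'ma\xc3\xa7a'
```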

Back on the Py3 side, if I use encode I get:

In [26]: c1=np.array([l.encode('utf-8') for l in b])

In [27]: c1
Out[27]: 
array([b'ma\xc3\xa7a', b'banana', b'morango'], 
      dtype='|S7')

In [28]: np.savetxt('test.txt',c1,fmt='%s')

In [29]: cat test.txt
b'ma\xc3\xa7a'
b'banana'
b'morango'
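
Those b'...' lines are most likely the result of applying a str format to bytes: in Python 3, '%s' calls str() on a bytes object, which gives its b'...' form rather than the decoded text. A quick check in plain Python:

```python
encoded = 'maça'.encode('utf-8')

# A str format calls str() on the bytes object, producing the b'...' form
as_text = '%s' % encoded
print(as_text)  # b'ma\xc3\xa7a'

# Decoding first gives the actual text back
print('%s' % encoded.decode('utf-8'))  # maça
```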

But with a bytes format string, a plain write in 'wb' mode works:

In [33]: with open('test1.txt','wb') as f:
    ...:     for l in c1:
    ...:         f.write(b'%s\n'%l)
    ...:         

In [34]: cat test1.txt
maça
banana
morango
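
The b'%s\n' % l formatting above needs Python 3.5 or later. A sketch of the same bytes-mode loop using concatenation instead, plus a read-back that decodes the file as UTF-8 to confirm the round trip (test1.txt is reused as the file name):

```python
import numpy as np

c1 = np.array([s.encode('utf-8') for s in ['maça', 'banana', 'morango']])

# Binary mode: the elements are already UTF-8 bytes, so write them unchanged.
# tolist() turns the |S7 array into a list of plain Python bytes objects.
with open('test1.txt', 'wb') as f:
    for item in c1.tolist():
        f.write(item + b'\n')

# Reading back in binary mode and decoding recovers the original strings
with open('test1.txt', 'rb') as f:
    decoded = [line.decode('utf-8') for line in f.read().splitlines()]

print(decoded)  # ['maça', 'banana', 'morango']
```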

Such are the joys of mixing unicode and the two Python generations.

In case it helps, here is the code for the np.lib.npyio.asbytes function that np.savetxt uses (along with the 'wb' file mode):

def asbytes(s):    # py3?
    if isinstance(s, bytes):
        return s
    return str(s).encode('latin1')

(Note that the encoding is fixed as 'latin1'.)
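
To see what the fixed 'latin1' means in practice, here is a small sketch that reuses the asbytes code quoted above (copied locally rather than imported from NumPy). It reproduces the 'ma�a' from the first Py3 example: 'ç' becomes the single byte 0xE7, which is not valid UTF-8 on its own.

```python
def asbytes(s):
    # Local copy of the helper quoted above
    if isinstance(s, bytes):
        return s
    return str(s).encode('latin1')

raw = asbytes('maça\n')
print(raw)  # b'ma\xe7a\n'

# A UTF-8 reader rejects the lone 0xE7 byte; errors='replace' turns it into
# U+FFFD, the replacement character that cat displayed as 'ma�a'
print(raw.decode('utf-8', errors='replace'))

# Characters outside latin-1 cannot be encoded at all
try:
    asbytes('ć')
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc.reason)
```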

The np.char module applies a variety of string methods to the elements of a numpy array, so the np.array([x.encode...]) can be expressed as:

In [50]: np.char.encode(b,'utf-8')
Out[50]: 
array([b'ma\xc3\xa7a', b'banana', b'morango'], 
      dtype='|S7')

This can be convenient, though past testing indicates that it is not a time saver: it still has to apply the Python method to each element.
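
A quick check, assuming Python 3, that np.char.encode produces the same array as the list comprehension:

```python
import numpy as np

c = np.array(['maça', 'banana', 'morango'])

# Element-wise encode through np.char (per the answer, not faster than a loop)
via_char = np.char.encode(c, 'utf-8')

# The same result built with a list comprehension
via_list = np.array([s.encode('utf-8') for s in c])

print(via_char)
print((via_char == via_list).all())  # True
```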
