How to fix an encoding error when migrating Python subprocess code to unicode_literals?


Problem description

We're preparing to move to Python 3.4 and have added unicode_literals. Our code relies extensively on piping to and from external utilities using the subprocess module. The following code snippet works fine on Python 2.7 to pipe UTF-8 strings to a subprocess:

import subprocess

kw = {}
kw[u'stdin'] = subprocess.PIPE
kw[u'stdout'] = subprocess.PIPE
kw[u'stderr'] = subprocess.PIPE
kw[u'executable'] = u'/path/to/binary/utility'
args = [u'', u'-l', u'nl']

line = u'¡Basta Ya!'

popen = subprocess.Popen(args, **kw)
# Without unicode_literals, '%s\n' is a byte string, so this stays bytes.
popen.stdin.write('%s\n' % line.encode(u'utf-8'))
...blah blah...

Adding the unicode_literals import makes the same code raise this error:

from __future__ import unicode_literals

import subprocess

kw = {}
kw[u'stdin'] = subprocess.PIPE
kw[u'stdout'] = subprocess.PIPE
kw[u'stderr'] = subprocess.PIPE
kw[u'executable'] = u'/path/to/binary/utility'
args = [u'', u'-l', u'nl']

line = u'¡Basta Ya!'

popen = subprocess.Popen(args, **kw)
popen.stdin.write('%s\n' % line.encode(u'utf-8'))

Traceback (most recent call last):
  File "test.py", line 138, in <module>
    exitcode = main()
  File "test.py", line 57, in main
    popen.stdin.write('%s\n' % line.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

Any suggestions on how to pass UTF-8 through the pipe?

Solution

'%s\n' is a unicode string when you use unicode_literals:

>>> line = u'¡Basta Ya!'
>>> '%s\n' % line.encode(u'utf-8')
'\xc2\xa1Basta Ya!\n'
>>> u'%s\n' % line.encode(u'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

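What happens is that Python 2 implicitly decodes your encoded line value with the default ASCII codec so that it can be interpolated into the unicode '%s\n' string. The value starts with the non-ASCII byte 0xc2, so that decode fails.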
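
The failing step can be reproduced in isolation. The sketch below is not the code Python 2 runs internally; it simply performs the same ASCII decode explicitly, which also makes it runnable on Python 3. The helper name `implicit_python2_decode` is made up for illustration.

```python
line = u'¡Basta Ya!'
encoded = line.encode('utf-8')  # UTF-8 bytes, starting with 0xc2 0xa1


def implicit_python2_decode(data):
    """Hypothetical helper: do explicitly what Python 2 does implicitly
    when a byte string is interpolated into a unicode format string."""
    return data.decode('ascii')


try:
    implicit_python2_decode(encoded)
    error = None
except UnicodeDecodeError as exc:
    # The first byte, 0xc2, is outside the ASCII range.
    error = exc

print(error)
```

On Python 3 the same interpolation no longer decodes anything implicitly, so mixing text and bytes has to be resolved by hand in either direction.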

You'll have to use a byte string for the format instead; prefix the string literal with b:

>>> from __future__ import unicode_literals
>>> line = u'¡Basta Ya!'
>>> b'%s\n' % line.encode(u'utf-8')
'\xc2\xa1Basta Ya!\n'

Or interpolate first and encode the result:

>>> line = u'¡Basta Ya!'
>>> ('%s\n' % line).encode(u'utf-8')
'\xc2\xa1Basta Ya!\n'

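In Python 3, you'll have to write bytestrings to pipes anyway, unless the pipe is opened in text mode. Note that %-formatting on bytes (b'%s\n' % ...) only exists from Python 3.5 onward (PEP 461), so on Python 3.4 encoding after interpolation is the form that works on both Python 2.7 and Python 3.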
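
Putting the second option together, here is a minimal sketch for Python 3 of sending one UTF-8 line through a pipe. The function name `pipe_utf8_line` and the echoing child script are assumptions made for illustration; the child merely stands in for the real utility at /path/to/binary/utility, whose behavior is not known here.

```python
import subprocess
import sys

# Hypothetical stand-in for the real utility: a child Python process that
# copies its stdin to its stdout unchanged, byte for byte.
ECHO_SCRIPT = (
    "import sys\n"
    "sys.stdout.buffer.write(sys.stdin.buffer.read())\n"
)


def pipe_utf8_line(args, line):
    """Send one text line, UTF-8 encoded, to a child; return its raw stdout."""
    popen = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Interpolate while everything is still text, then encode exactly once.
    payload = (u'%s\n' % line).encode('utf-8')
    # communicate() writes the payload, closes stdin, and waits for the child.
    out, _err = popen.communicate(payload)
    return out


result = pipe_utf8_line([sys.executable, '-c', ECHO_SCRIPT], u'¡Basta Ya!')
print(repr(result))
```

Using communicate() rather than writing to popen.stdin directly avoids a deadlock when the child fills its stdout pipe while the parent is still writing; whether that matters depends on how much data the real utility produces.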
