Python 3: How to specify stdin encoding

Question

While porting code from Python 2 to Python 3, I run into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:

for line in sys.stdin:
    ...

But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For a regular file, I would specify the encoding when opening the file:

with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...

But how can I specify the encoding for standard input? Other SO posts have suggested using

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...

However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

Solution

Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.
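
To see which encoding Python guessed in a particular environment, a quick check along these lines may help. This is only a sketch: report_stdin_encoding is a hypothetical helper name, and it assumes that on POSIX systems the default normally follows the locale (as reported by locale.getpreferredencoding(False)) unless something such as PYTHONIOENCODING overrides it.

```python
import locale
import sys


def report_stdin_encoding():
    """Return the encoding Python chose for stdin next to the locale's preference.

    Hypothetical helper, for illustration only.
    """
    return {
        # Encoding attached to the sys.stdin text wrapper; None if stdin is
        # missing or was replaced by an object without this attribute.
        "stdin": getattr(sys.stdin, "encoding", None),
        # Encoding that the current locale settings suggest for text data.
        "locale": locale.getpreferredencoding(False),
    }


print(report_stdin_encoding())
```

If the "stdin" entry reports an ASCII codec, the process was probably started with a C/POSIX locale (for example from cron, or over ssh without LANG set), even when the interactive shell uses en_US.UTF-8.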

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
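
The layering described above can be inspected directly. The sketch below uses a temporary file opened in text mode rather than sys.stdin itself, so that it runs without piped input; describe_layers is a hypothetical helper name, and the same attribute chain should apply to sys.stdin when it is a normal text stream.

```python
import os
import tempfile


def describe_layers(text_stream):
    """Return the type names of the text, buffered, and raw layers."""
    buffered = text_stream.buffer  # binary, buffered layer
    raw = buffered.raw             # unbuffered layer closest to the OS
    return (
        type(text_stream).__name__,
        type(buffered).__name__,
        type(raw).__name__,
    )


# Create a small UTF-8 file to open in text mode.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("héllo\n".encode("utf-8"))
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as f:
        layers = describe_layers(f)
finally:
    os.remove(path)

print(layers)
```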

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

import io
import sys

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
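
A slightly fuller sketch of the same idea, reading lines from the wrapped stream. read_utf8_lines is a hypothetical helper, and an io.BytesIO object stands in for sys.stdin.buffer so the example runs on its own; in a real script you would pass sys.stdin.buffer instead.

```python
import io


def read_utf8_lines(binary_stream, errors="strict"):
    """Yield UTF-8 decoded lines from a binary stream such as sys.stdin.buffer."""
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8", errors=errors)
    try:
        for line in text_stream:
            yield line.rstrip("\n")
    finally:
        # Detach so the wrapper does not close the underlying stream
        # when it is garbage collected.
        text_stream.detach()


# Stand-in for sys.stdin.buffer (assumption made for this demo only).
fake_stdin = io.BytesIO("naïve café\n日本語\n".encode("utf-8"))
lines = list(read_utf8_lines(fake_stdin))
print(lines)
```

On Python 3.7 and later, calling sys.stdin.reconfigure(encoding='utf-8') before anything has been read from the stream is another option that avoids creating a second wrapper.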

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
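
One way to check the effect of the variable is to start a child interpreter with it set and ask the child which encoding its stdin uses. This is a sketch under the assumption that sys.executable points at a working interpreter; child_stdin_encoding is a hypothetical helper name.

```python
import os
import subprocess
import sys


def child_stdin_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdin encoding."""
    # Copy the current environment and add the override for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        env=env,
        input=b"",               # give the child an empty stdin
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


print(child_stdin_encoding("utf-8"))
```

From a shell, the equivalent would be something like PYTHONIOENCODING=utf-8 python3 script.py < input.txt, where script.py and input.txt are placeholder names.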
