Python 3: How to specify stdin encoding

Question

While porting code from Python 2 to Python 3, I ran into a problem when reading UTF-8 text from standard input. In Python 2, this works fine:

import sys

for line in sys.stdin:
    ...

But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For a regular file, I would specify the encoding when opening the file:

with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...

But how can I specify the encoding for standard input? Other SO posts (e.g. How to change the stdin encoding on python) have suggested using

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...

However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

Recommended answer

Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
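
The layering described above can be inspected directly. The following is a minimal sketch, not part of the original answer: it builds the same three layers over an in-memory io.BytesIO that stands in for the real standard input (an assumption, so the example runs anywhere), and the helper name describe_layers is hypothetical.

```python
import io


def describe_layers(text_stream):
    """Return the class names of the text, buffered, and raw layers."""
    buffered = text_stream.buffer   # binary, buffered layer
    raw = buffered.raw              # unbuffered layer underneath
    return (
        type(text_stream).__name__,
        type(buffered).__name__,
        type(raw).__name__,
    )


# Stand-in for sys.stdin: raw bytes -> buffered reader -> text wrapper.
raw_bytes = io.BytesIO("café\n".encode("utf-8"))
fake_stdin = io.TextIOWrapper(io.BufferedReader(raw_bytes), encoding="utf-8")

layers = describe_layers(fake_stdin)
```

Calling describe_layers(sys.stdin) on the real stream shows the same kind of stack, although the concrete classes can differ when stdin is redirected or replaced by the host environment. The encoding Python picked is available as sys.stdin.encoding.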

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

import io
import sys

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
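
To show the wrapper in use, here is a small sketch that reads lines through it. The function name read_utf8_lines and the in-memory sample are assumptions made for illustration; in a real script the argument would be sys.stdin.buffer.

```python
import io


def read_utf8_lines(binary_stream):
    """Decode a binary stream as UTF-8 and return its lines without newlines."""
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
    try:
        return [line.rstrip("\n") for line in text_stream]
    finally:
        # Detach so that discarding the wrapper does not close the
        # underlying binary stream (which matters for sys.stdin.buffer).
        text_stream.detach()


# In a real script you would pass sys.stdin.buffer here.
sample = io.BytesIO("naïve\ncafé\n".encode("utf-8"))
lines = read_utf8_lines(sample)
```

The detach() call is a design choice of this sketch: a TextIOWrapper closes its buffer when it is garbage collected, so a short-lived wrapper would otherwise close the stream it was given.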

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running Python.
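
The effect of the environment variable can be checked by launching a child interpreter with it set. This is only a sketch under stated assumptions: the helper run_child_with_io_encoding and the one-line child program are made up for the demonstration, and the child prints an ASCII-only repr so the result does not depend on the terminal.

```python
import os
import subprocess
import sys

# Child program: read all of stdin as text and print an ASCII-only repr of it.
CHILD_CODE = "import sys; print(ascii(sys.stdin.read()))"


def run_child_with_io_encoding(data, encoding):
    """Run a child Python with PYTHONIOENCODING set and return its stdout text."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=data,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout.decode("ascii").strip()


# Feed UTF-8 bytes to a child whose stdin is forced to UTF-8.
result = run_child_with_io_encoding("café".encode("utf-8"), "utf-8")
```

From a POSIX shell the equivalent would be along the lines of PYTHONIOENCODING=utf-8 python script.py, where script.py is a placeholder for your own program.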

From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):

# Python 3.7 and newer
import sys

sys.stdin.reconfigure(encoding='utf-8')
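
The same reconfigure() method exists on any io.TextIOWrapper, so its behavior can be tried without touching the real standard input. The sketch below assumes an in-memory stream that starts out with the wrong (ASCII) encoding; the variable names are illustrative only.

```python
import io

# Stand-in for sys.stdin that starts out with the wrong (ASCII) encoding.
payload = io.BytesIO("café\n".encode("utf-8"))
stream = io.TextIOWrapper(payload, encoding="ascii")

# Switch the encoding before any data has been read (Python 3.7+).
stream.reconfigure(encoding="utf-8")

text = stream.read()
```

Reading from the stream before the reconfigure() call would have tried to decode the UTF-8 bytes as ASCII, which is the failure described in the question.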
