Python 3: How to specify stdin encoding
Question
While porting code from Python 2 to Python 3, I run into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:
for line in sys.stdin:
    ...
But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)
For a regular file, I would specify the encoding when opening the file:
with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...
But how can I specify the encoding for standard input? Other SO posts (e.g. How to change the stdin encoding on python) have suggested using
import codecs
import sys

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...
However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.
Recommended answer
Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.
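To see which codec Python actually picked on a given machine, you can inspect the stream itself. The following is a minimal sketch; the helper name describe_stdin_encoding is made up for illustration, and the values it reports depend on your platform, locale, and environment variables.

```python
import locale
import sys


def describe_stdin_encoding():
    """Collect the encoding details Python chose for standard input.

    Hypothetical helper for illustration; getattr() guards against
    environments where sys.stdin is missing or has been replaced.
    """
    return {
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
        "stdin_errors": getattr(sys.stdin, "errors", None),
        # The locale's preferred encoding is usually what the guess is based on.
        "locale_preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_stdin_encoding().items():
        print(f"{key}: {value}")
```

If stdin_encoding comes back as an ASCII variant even though your shell locale looks like UTF-8, the process probably did not inherit that locale (for example when started from cron or over a bare SSH command).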
Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
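The layering can be illustrated with an in-memory stand-in for stdin. This is only a sketch: describe_layers is a hypothetical helper, and the io.BytesIO object plays the role that a raw file object plays for the real sys.stdin.

```python
import io


def describe_layers(text_stream):
    """Return the type names of the text, buffered, and raw layers."""
    buffered = text_stream.buffer   # binary, buffered layer
    raw = buffered.raw              # unbuffered layer underneath
    return [type(text_stream).__name__, type(buffered).__name__, type(raw).__name__]


# In-memory stand-in for stdin: raw bytes -> buffered reader -> text wrapper.
demo = io.TextIOWrapper(
    io.BufferedReader(io.BytesIO(b"caf\xc3\xa9\n")),
    encoding="utf-8",
)
layers = describe_layers(demo)
line = demo.readline()  # decoding happens only in the text layer
print(layers, repr(line))
```

Calling describe_layers(sys.stdin) in a regular terminal session should show the same three-level structure, with a file-backed raw layer at the bottom.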
Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:
import io
import sys
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
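As a rough illustration of the effect, the sketch below applies the same wrapping to an in-memory byte stream, so it can be run without piping anything in. The read_lines helper and the sample payload are assumptions made for the example; with real input you would pass sys.stdin.buffer instead of the io.BytesIO object.

```python
import io


def read_lines(binary_stream, encoding="utf-8"):
    """Decode a binary stream line by line using an explicit encoding."""
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding)
    return [line.rstrip("\n") for line in wrapper]


# UTF-8 bytes containing non-ASCII characters, standing in for piped input.
payload = "größe\n日本語\n".encode("utf-8")

utf8_lines = read_lines(io.BytesIO(payload))

# The same bytes decoded as ASCII reproduce the error from the question.
try:
    read_lines(io.BytesIO(payload), encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(utf8_lines, ascii_failed)
```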
Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
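One way to check that the variable takes effect is to start a child interpreter with it set and ask the child what it sees. This is a sketch under the assumption that spawning a subprocess is acceptable in your setup; child_stdin_encoding is a hypothetical helper, and the codec name is normalized with codecs.lookup() because the exact spelling reported can differ between Python versions.

```python
import codecs
import os
import subprocess
import sys

# Code run in the child interpreter: report the encoding chosen for stdin.
CHILD_CODE = "import sys; print(sys.stdin.encoding)"


def child_stdin_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and return the
    normalized name of the codec it uses for sys.stdin."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        input=b"",
        stdout=subprocess.PIPE,
        check=True,
    )
    reported = result.stdout.decode("ascii").strip()
    return codecs.lookup(reported).name


if __name__ == "__main__":
    print(child_stdin_encoding("utf-8"))
```

The variable has to be set in the environment before the interpreter starts; assigning to os.environ inside an already running script does not change its own sys.stdin.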
From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):
# Python 3.7 and newer
import sys

sys.stdin.reconfigure(encoding='utf-8')
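If the same script has to run on interpreters both older and newer than 3.7, the two approaches can be combined. The sketch below is one possible way to do that, not an established recipe: ensure_utf8 is a hypothetical helper, and the demo stream is an in-memory stand-in for sys.stdin that was deliberately opened with the wrong codec.

```python
import io


def ensure_utf8(text_stream):
    """Return a text stream that decodes as UTF-8.

    Uses reconfigure() where available (Python 3.7+), otherwise wraps the
    underlying binary buffer again. Must be called before any data is read.
    """
    if hasattr(text_stream, "reconfigure"):
        text_stream.reconfigure(encoding="utf-8")
        return text_stream
    return io.TextIOWrapper(text_stream.buffer, encoding="utf-8")


# Stand-in for a stdin that was opened as ASCII by mistake.
demo = io.TextIOWrapper(io.BytesIO("søk\n".encode("utf-8")), encoding="ascii")
stream = ensure_utf8(demo)
first_line = stream.readline()
print(stream.encoding, repr(first_line))
```

In a real script you would call ensure_utf8(sys.stdin) once at startup and read from the returned stream.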