Setting the correct encoding when piping stdout in Python


Question

When piping the output of a Python program, the Python interpreter gets confused about the encoding and sets it to None. This means a program like this:

# -*- coding: utf-8 -*-
print u"åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen so far are to modify your site.py directly, or to hardcode the default encoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

Is there a better way to make piping work?

Recommended answer

Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself.

A rule of thumb is: always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')
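
If you would rather not hardcode UTF-8 everywhere, one possible approach is to use the stream's own encoding when it reports one and fall back to a fixed encoding otherwise. The sketch below is only an illustration: `encode_for_stream`, `fallback`, and `PipedStream` are hypothetical names, and the stand-in class simply mimics a piped stdout that reports no encoding.

```python
# -*- coding: utf-8 -*-


def encode_for_stream(text, stream, fallback="utf-8"):
    """Encode text with the stream's encoding, or `fallback` if it has none.

    Hypothetical helper for this sketch, not part of the standard library.
    """
    # A piped stdout under Python 2 reports encoding = None,
    # so the `or` picks the fallback in that case.
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding)


class PipedStream(object):
    """Stand-in for a piped stdout that reports no encoding (assumption)."""
    encoding = None


# With no encoding available, the text is encoded as UTF-8.
data = encode_for_stream(u"åäö", PipedStream())
```

In a real script you would pass sys.stdout instead of the stand-in and write the resulting bytes out yourself.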

Another didactic example is a Python program that converts from ISO-8859-1 to UTF-8, making everything uppercase in between.

import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)
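
The same decode, process, encode steps can also be tried without a pipe by running them over in-memory byte streams. This is a minimal sketch: `convert_line` and `convert_stream` are hypothetical names, and `io.BytesIO` stands in for the real stdin and stdout.

```python
# -*- coding: utf-8 -*-
import io


def convert_line(raw_line):
    """Decode ISO-8859-1 bytes, uppercase the text, re-encode as UTF-8."""
    text = raw_line.decode("iso8859-1")  # decode what you receive
    text = text.upper()                  # work with Unicode internally
    return text.encode("utf-8")          # encode what you send


def convert_stream(source, sink):
    """Apply convert_line to every line of a binary stream."""
    for raw_line in source:
        sink.write(convert_line(raw_line))


# In-memory stand-ins for piped input and output (assumption for the demo).
source = io.BytesIO(u"åäö\nabc\n".encode("iso8859-1"))
sink = io.BytesIO()
convert_stream(source, sink)
result = sink.getvalue()
```

Keeping the conversion in a function that takes and returns bytes makes the encoding boundary explicit and easy to check in isolation.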

Setting the system default encoding is a bad idea, because some modules and libraries you use may rely on the fact that it is ASCII. Don't do it.
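
If encoding every string by hand becomes tedious, one alternative that leaves the default encoding alone is to wrap the output byte stream in a codec writer from the standard `codecs` module. This goes beyond what the answer above states, so treat it as a sketch under that assumption. Here `io.BytesIO` stands in for the piped stdout.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# Stand-in for the byte stream behind a piped stdout (assumption for the demo).
raw_output = io.BytesIO()

# codecs.getwriter returns a StreamWriter class for the named codec;
# calling it wraps the byte stream so Unicode text is encoded on write.
utf8_output = codecs.getwriter("utf-8")(raw_output)

utf8_output.write(u"åäö\n")
written = raw_output.getvalue()
```

In a Python 2 script the wrapped stream would typically be sys.stdout itself, and in Python 3 it would be sys.stdout.buffer.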
