Setting the correct encoding when piping stdout in Python


Problem description

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- coding: utf-8 -*-
print u"åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen thus far are to modify your site.py directly, or to hardcode the default encoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

Is there a better way to make piping work?

Solution

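Your code works when run directly in a terminal because Python encodes the output using whatever encoding your terminal application uses. When the output is piped there is no terminal to ask, so Python 2 leaves sys.stdout.encoding as None and falls back to ASCII. If you are piping, you must encode the output yourself.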
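Whether the output is going to a terminal or a pipe can be checked from inside the script. The following is a minimal sketch written for Python 3 (the code in the question is Python 2); `describe_stream` is a hypothetical helper name used only for illustration.

```python
import io
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) for a text stream such as sys.stdout."""
    # isatty() is True when the stream is attached to a terminal and
    # False when it is a pipe or a regular file.
    is_tty = stream.isatty()
    # getattr guards against stream objects that have no encoding attribute.
    # In Python 2 this value is None for a piped sys.stdout.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding
```

Calling `describe_stream(sys.stdout)` once in a terminal and once with the output piped to another command shows the difference. Printing the result to sys.stderr keeps it visible in the piped case.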

A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')
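The same "encode what you send" step can be wrapped in a small function. This is a sketch for Python 3, where text and bytes are separate types. `write_encoded` is a hypothetical name, and the function expects a binary stream.

```python
import io
import sys


def write_encoded(text, byte_stream, encoding="utf-8"):
    """Encode text explicitly and write the resulting bytes to byte_stream."""
    data = text.encode(encoding)
    byte_stream.write(data)
    # Return the number of bytes written, which can differ from len(text).
    return len(data)
```

In Python 3 the binary stream underneath standard output is `sys.stdout.buffer`, so a script could call `write_encoded(u"åäö\n", sys.stdout.buffer)`. The result does not depend on whether the output is a terminal or a pipe.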

Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.

import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)
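For comparison, here is a sketch of the same decode, process, encode loop for Python 3. The function name `convert` and its parameters are assumptions made for illustration. It takes binary streams so that the decoding and encoding steps stay explicit.

```python
import io


def convert(in_stream, out_stream, src="iso8859-1", dst="utf-8"):
    """Copy lines from in_stream to out_stream, uppercased and re-encoded."""
    for raw_line in in_stream:
        # Decode what you receive:
        text = raw_line.decode(src)

        # Work with Unicode internally:
        text = text.upper()

        # Encode what you send:
        out_stream.write(text.encode(dst))
```

In a Python 3 script this could be invoked as `convert(sys.stdin.buffer, sys.stdout.buffer)` after importing sys.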

Setting the system default encoding is a bad idea, because some modules and libraries you use may rely on it being ASCII. Don't do it.
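A narrower alternative to changing the process-wide default is to wrap only the output stream in an encoding writer. The sketch below uses the standard library's `codecs.getwriter`. `utf8_writer` is a hypothetical helper name, and the choice of UTF-8 is an assumption about what the consumer of the pipe expects.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # codecs.getwriter returns a StreamWriter class for the named codec;
    # instantiating it with a stream gives an object whose write() encodes.
    return codecs.getwriter("utf-8")(byte_stream)
```

Only the wrapped stream is affected, so libraries that rely on the default encoding keep working unchanged.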
