UTF-8 In Python logging, how?


Question

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:

import logging

def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string)

if __name__ == "__main__":
    logging_test()

This explodes with UnicodeDecodeError on the logging.info() call.

At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:

file_handler.write(unicode_string.encode("UTF-8"))

when it should be doing this:

file_handler.write(unicode_string)

Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.
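For readers following along on Python 3, here is a minimal sketch of the failure mode the question describes. It is an illustration under stated assumptions, not the code path inside the logging package: it uses the built-in open() as a stand-in for the codecs-opened stream, and it performs explicitly the ASCII decode that Python 2 applies implicitly when a byte string is handed to an encoding writer. The variable names are hypothetical.

```python
import os
import tempfile

text = "\u00f4"                 # o with a circumflex, as a text (unicode) string
encoded = text.encode("utf-8")  # the two-byte UTF-8 form, b'\xc3\xb4'

# Python 2 implicitly decodes a byte string as ASCII before re-encoding it.
# Doing that step explicitly shows where a UnicodeDecodeError can come from.
try:
    encoded.decode("ascii")
    implicit_decode_error = None
except UnicodeDecodeError as exc:
    implicit_decode_error = exc

# Handing the text object to a stream opened with an encoding lets the
# stream do the encoding exactly once.
path = os.path.join(tempfile.mkdtemp(), "logfile.txt")
with open(path, "w", encoding="utf-8") as stream:
    stream.write(text)

# Read the file back in binary mode to inspect the bytes actually written.
with open(path, "rb") as raw:
    written_bytes = raw.read()
```

The point of the sketch is that the encoding step belongs to the stream: the caller passes text, and bytes only appear on disk.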

Recommended answer

Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):


vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py 
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt 
ô
vinay@eta-jaunty:~/projects/scratch$ 

On a Windows box:


C:\temp>python --version
Python 2.6.2

C:\temp>python utest.py
printed unicode object: ô

Contents of the file: [not reproduced in the source]

This might also explain why Lennart Regebro couldn't reproduce it either.
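As a way to check the behaviour on a current interpreter, the toy example can be sketched for Python 3, where every str is already a Unicode object. This is an adaptation, not the original script: the temporary log path, the logger name "utf8_demo", and the path parameter are assumptions made so the sketch is self-contained and does not touch the root logger.

```python
import logging
import os
import tempfile


def logging_test(path):
    # FileHandler accepts an encoding and encodes each record on write.
    handler = logging.FileHandler(path, "w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))

    logger = logging.getLogger("utf8_demo")  # hypothetical logger name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the record out of any root handlers

    # This is an o with a hat on it, passed as text rather than bytes.
    logger.info("\u00f4")

    # Detach and close the handler so the file is flushed to disk.
    logger.removeHandler(handler)
    handler.close()


log_path = os.path.join(tempfile.mkdtemp(), "logfile.txt")
logging_test(log_path)

# Read the log back in binary mode to see the bytes that were written.
with open(log_path, "rb") as raw:
    log_bytes = raw.read()
```

If the handler encodes correctly, log_bytes starts with the UTF-8 bytes for the character followed by a line terminator.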
