Printing a UTF-8 encoded string

Question

I'm using BeautifulSoup to extract some text from an HTML document, but I can't figure out how to print it properly to the screen (or to a file, for that matter).

Here's what my class containing the text looks like:

class Thread(object):
    def __init__(self, title, author, date, content = u""):
        self.title = title
        self.author = author
        self.date = date
        self.content = content
        self.replies = []

    def __unicode__(self):
        s = u""

        for k, v in self.__dict__.items():
            s += u"%s = %s " % (k, v)

        return s

    def __repr__(self):
        return repr(unicode(self))

    __str__ = __repr__

When I try to print an instance of Thread, here's what I see on the console:

~/python-tests $ python test.py
u'date = 21:01 03/02/11 content =  author = \u05d3"\u05e8 \u05d9\u05d5\u05e0\u05d9 \u05e1\u05d8\u05d0\u05e0\u05e6\'\u05e1\u05e7\u05d5 replies = [] title = \u05de\u05d1\u05e0\u05d4 \u05d4\u05de\u05d1\u05d7\u05df '

Whatever I try, I cannot get the output I'd like (the above text should be Hebrew). My end goal is to serialize Thread to a file (using json or pickle) and be able to read it back.

I'm running this with Python 2.6.6 on Ubuntu 10.10.

Recommended answer

To output a Unicode string to a file (or the console), you need to choose a text encoding. In Python 2 the default text encoding is ASCII, so to support Hebrew characters you need to use a different encoding, such as UTF-8:

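# Encode the Unicode text as UTF-8 bytes, then write them to f,
# a file object opened for writing (for example open(path, 'wb')).
s = unicode(your_object).encode('utf8')
f.write(s)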
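
As a rough illustration of the same idea, here is a minimal sketch that encodes the Hebrew title from the console output above as UTF-8, writes the bytes to a file, and reads them back. The names `title`, `path`, `encoded`, `restored`, `dumped` and `reloaded` are assumptions made for this example and do not appear in the question. The last two lines also show, under the same assumptions, that the json module round-trips the text with its default settings.

```python
# -*- coding: utf-8 -*-
import json
import os
import tempfile

# Sample text copied from the console output in the question (the title).
title = u"\u05de\u05d1\u05e0\u05d4 \u05d4\u05de\u05d1\u05d7\u05df"

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "thread.txt")

# Encode explicitly, then write the resulting bytes in binary mode.
encoded = title.encode("utf8")
with open(path, "wb") as f:
    f.write(encoded)

# Read the bytes back and decode them with the same encoding.
with open(path, "rb") as f:
    restored = f.read().decode("utf8")

# By default json escapes non-ASCII characters, so the dumped string is
# plain ASCII; loads() turns the escapes back into the original text.
dumped = json.dumps({"title": title})
reloaded = json.loads(dumped)["title"]
```

Writing in binary mode keeps the encoding step explicit, which mirrors the `encode('utf8')` call in the answer. The `\u05d3...` escapes seen in the question's output are only how `repr()` displays the string, and the underlying text is intact.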
