Python to show special characters

Question

I know there are tons of threads regarding this issue but I have not managed to find one which solves my problem.

I am trying to print a string but when printed it doesn't show special characters (e.g. æ, ø, å, ö and ü). When I print the string using repr() this is what I get:

u'Von D\xc3\xbc' u'\xc3\x96berg'

Does anyone know how I can convert this to Von Dü and Öberg? It's important to me that these characters are kept rather than dropped, as they would be with myStr.encode("ascii", "ignore").

Edit

This is the code I use. I use BeautifulSoup to scrape a website. The contents of a cell (<td>) in a table (<table>) are put into the variable name. This is the variable that contains the special characters I cannot print.

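import urllib2  # Python 2 standard library
from bs4 import BeautifulSoup

# url and remove_whitespace() are defined elsewhere (not shown)
web = urllib2.urlopen(url)
soup = BeautifulSoup(web)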
tables = soup.find_all("table")
scene_tables = [2, 3, 6, 7, 10]
scene_index = 0
# Iterate over the <table>s we want to work with
for scene_table in scene_tables:
    i = 0
    # Iterate over the <td>s to find time and name
    for td in tables[scene_table].find_all("td"):
        if i % 2 == 0:  # td contains the time
            time = remove_whitespace(td.get_text())
        else:           # td contains the name
            name = remove_whitespace(td.get_text()) # This is the variable containing "nonsense"
            print "%s: %s" % (time, name,)
        i += 1
    scene_index += 1


Recommended answer

Prevention is better than cure. What you need is to find out how that rubbish is being created. Please edit your question to show the code that creates it, and then we can help you fix it. It looks like somebody has done:

your_unicode_string = original_utf8_encoded_bytestring.decode('latin1')
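
To make the suspected mistake concrete, here is a minimal sketch of how such a string could come about. It is written in Python 3 syntax (the question itself uses Python 2), the sample name is taken from the repr() output in the question, and the variable names are illustrative only.

```python
# Sketch: UTF-8 bytes decoded with the wrong codec produce the "rubbish".
original_text = "Von Dü"

# A site that really serves UTF-8 sends these bytes over the wire.
utf8_bytes = original_text.encode("utf-8")

# latin1 maps every byte to exactly one character, so the two-byte
# UTF-8 sequence for "ü" turns into two separate characters.
garbled = utf8_bytes.decode("latin-1")

# ascii() escapes non-ASCII characters, much like Python 2's repr().
print(ascii(garbled))
```

The escaped form printed here has the same \xc3\xbc pattern as the repr() output in the question, which is what points to a UTF-8/latin1 mix-up.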

The cure is simply to reverse that process (encode back to latin1) and then decode as UTF-8.

correct_unicode_string = your_unicode_string.encode('latin1').decode('utf8')
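
As a hedged illustration of that round trip, the sketch below wraps it in a small helper. It uses Python 3 syntax, and repair_mojibake is a hypothetical name, not something from the question or from any library. The helper returns the input unchanged when the round trip fails, on the assumption that such text was never garbled in this particular way.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-latin1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters outside latin1, or the
        # resulting bytes are not valid UTF-8, so leave it alone.
        return text


# The two strings from the question's repr() output.
first_name = repair_mojibake("Von D\xc3\xbc")  # "Von Dü"
last_name = repair_mojibake("\xc3\x96berg")    # "Öberg"
```

This only treats the symptom; fixing the decoding at the point where the page is read is the more robust route.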

Update: Based on the code that you supplied, the probable cause is that the website declares that it is encoded in ISO-8859-1 (aka latin1) when in reality it is encoded in UTF-8. Please update your question to show us the URL.

If you can't show it, read the BeautifulSoup docs; it looks like you'll need to use:

BeautifulSoup(web, from_encoding='utf8')
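
The diagnosis above (a page declared as ISO-8859-1 but actually served as UTF-8) can also be handled before parsing, using only the standard library. The following is a sketch in Python 3 syntax, not part of the original answer; decode_page and declared_encoding are hypothetical names, and the sample markup is made up for illustration.

```python
def decode_page(raw_bytes, declared_encoding="iso-8859-1"):
    """Decode page bytes, preferring UTF-8 over the declared encoding.

    Assumption: text in a single-byte encoding that contains non-ASCII
    characters rarely happens to be valid UTF-8, so a successful strict
    UTF-8 decode is trusted over the declaration.
    """
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return raw_bytes.decode(declared_encoding)


# Simulated response body: UTF-8 bytes from a page that claims ISO-8859-1.
raw = "<td>Öberg</td>".encode("utf-8")
page_text = decode_page(raw)  # "<td>Öberg</td>"
```

UTF-8 is tried first because a latin1 decode never fails and so cannot signal a wrong guess. The decoded string can then be handed to the parser in place of the raw response.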
