Unicode characters not showing properly

Problem description

I crawled a set of sites and extracted various strings containing Unicode characters, such as 'Best places to eat in D\xfcsseldorf'. I have them stored, as shown, in a PostgreSQL database. When I retrieve the strings mentioned above from the database and do:

name = string_retrieved_from_database
print name

the output is the unicode string u'Best places to eat in D\xfcsseldorf'. I want to display the string as it should appear: 'Best places to eat in Düsseldorf'. How can I do that?
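Before choosing a fix, it can help to check what the retrieved value actually is, because the two cases discussed in the answer need different treatment. The following is a minimal Python 3 sketch; `describe_value` is a hypothetical helper, not part of any library, and its check for literal escapes is only a heuristic that looks for backslash-x and backslash-u sequences.

```python
def describe_value(value):
    """Hypothetical diagnostic helper: classify a value fetched from the database.

    Heuristic only: it looks for literal backslash-x / backslash-u sequences.
    """
    if isinstance(value, bytes):
        return "bytes: decode with the encoding the data was stored in"
    if "\\x" in value or "\\u" in value:
        return "str with literal escapes: text was escaped before it was stored"
    if any(ord(ch) > 127 for ch in value):
        return "str with real non-ASCII characters: only the display needs fixing"
    return "plain ASCII str"


# Three possible shapes of the same crawled text:
print(describe_value(b"Best places to eat in D\xfcsseldorf"))   # raw Latin-1 bytes
print(describe_value("Best places to eat in D\\xfcsseldorf"))   # real backslash in the text
print(describe_value("Best places to eat in D\u00fcsseldorf"))  # real "ü" character
```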

Solution

Are you sure you get that output when you print the variable, rather than when you just display it in the interactive interpreter? You should never see the u'...' form when using print:

>>> x = b"Best places to eat in D\xfcsseldorf"
>>> x.decode('latin-1')
u'Best places to eat in D\xfcsseldorf'
>>> print x.decode('latin-1')
Best places to eat in Düsseldorf
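The same steps as a minimal Python 3 sketch. In Python 3, `repr()` no longer escapes printable non-ASCII characters, so `ascii()` is the closest match to the Python 2 `u'...\xfc...'` display. The `row` tuple is a hypothetical stand-in for a value fetched from a database cursor: printing a container (rather than the string inside it) uses `repr()` of each element, which in Python 2 is one common way to see the escaped form even when using print.

```python
# Python 3 sketch of the session above (the original session is Python 2).
raw = b"Best places to eat in D\xfcsseldorf"  # 0xFC is "ü" in Latin-1

text = raw.decode("latin-1")  # bytes -> str, one character per byte

print(text)         # readable form, what Python 2's print showed
print(ascii(text))  # escaped form, like Python 2's repr(): 'Best places to eat in D\xfcsseldorf'

# Hypothetical one-column database row. Printing a tuple uses repr() of each
# element, so quotes appear (and in Python 2, also the u prefix and \xfc escape).
row = (text,)
print(row)
print(row[0])       # take the value out of the row before printing it
```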

If the actual string contains the backslash and the characters that follow it, then it's possible something went wrong at the encoding stage (for example, literal backslashes were written into the text). In that case you may want to look at the "unicode-escape" codec:

>>> x = b"Best places to eat in D\\xfcsseldorf"
>>> print x
Best places to eat in D\xfcsseldorf
>>> print x.decode('unicode-escape')
Best places to eat in Düsseldorf
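A Python 3 sketch of the same repair for a value that is already a `str` (assuming, for illustration, that the database driver returns text rather than bytes). `undo_literal_escapes` is a hypothetical helper. It first rewrites any real non-ASCII characters as escapes with the `backslashreplace` error handler, because the `unicode_escape` codec treats non-escape bytes as Latin-1 and would otherwise mangle text that is already correct.

```python
def undo_literal_escapes(s):
    """Hypothetical helper: turn literal escapes such as backslash-xfc into characters.

    Any real non-ASCII characters are first rewritten as escapes (backslashreplace),
    because unicode_escape treats non-escape bytes as Latin-1 and would otherwise
    mangle text that is already correct.
    """
    return s.encode("ascii", "backslashreplace").decode("unicode_escape")


stored = "Best places to eat in D\\xfcsseldorf"  # contains a real backslash
fixed = undo_literal_escapes(stored)
print(len(stored), len(fixed))  # the four characters \xfc collapse into one
print(fixed)
```

Because every backslash sequence is interpreted, a genuine backslash in the text (for example in a Windows path such as `C:\new`) would also be changed, so this is only safe for text known to have been escaped before it was stored.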
