Unicode characters not showing properly
Problem description
I crawled a set of sites and extracted various strings containing Unicode characters, such as 'Best places to eat in D\xfcsseldorf'. I have them stored as shown in a PostgreSQL database. When I retrieve one of these strings from the database and do:
name = string_retrieved_from_database
print name
the output is the unicode representation u'Best places to eat in D\xfcsseldorf'. I want to display the string as it should be: 'Best places to eat in Düsseldorf'. How can I do that?
Are you sure you get that output when you print the variable, rather than just displaying it interactively? You should never get the u'...' display when using print:
>>> x = b"Best places to eat in D\xfcsseldorf"
>>> x.decode('latin-1')
u'Best places to eat in D\xfcsseldorf'
>>> print x.decode('latin-1')
Best places to eat in Düsseldorf
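The transcript above is Python 2. For readers on Python 3, the same distinction between the escaped representation and the printed text can be sketched roughly as follows. The variable names are illustrative, and Latin-1 is assumed as the source encoding only because the example above uses it; your data may be stored differently.

```python
# Python 3 sketch: bytes as they might arrive from a Latin-1 encoded source.
raw = b"Best places to eat in D\xfcsseldorf"

# Decoding turns the bytes into text; byte 0xFC is "ü" in Latin-1 (assumed encoding).
text = raw.decode("latin-1")

# ascii() gives an escaped, debugging-oriented view, comparable to what the
# Python 2 interactive prompt shows for a unicode object.
escaped_view = ascii(text)
print(escaped_view)

# print() writes the text itself, so the real character is displayed.
print(text)
```

The string object is the same in both cases; only the way it is rendered differs.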
If you're getting the backslash and so forth in the actual string, then it's possible something went wrong at the encoding stage (e.g., literal backslashes were written into the text). In that case you may want to look at the "unicode-escape" codec:
>>> x = b"Best places to eat in D\\xfcsseldorf"
>>> print x
Best places to eat in D\xfcsseldorf
>>> print x.decode('unicode-escape')
Best places to eat in Düsseldorf
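In Python 3, str has no decode method, so the unicode-escape approach needs an extra encode step. The following is a minimal sketch, assuming the stored value contains a literal backslash sequence and otherwise only characters that fit in Latin-1; the names stored and fixed are hypothetical.

```python
# Python 3 sketch: the stored text contains a literal backslash followed by "xfc",
# not the character "ü" itself.
stored = "Best places to eat in D\\xfcsseldorf"

# Go back to bytes first (assumes every character fits in Latin-1), then let the
# unicode_escape codec interpret the backslash sequences.
fixed = stored.encode("latin-1").decode("unicode_escape")

print(stored)  # still shows the backslash sequence
print(fixed)   # shows the decoded character
```

Note that the encode step raises UnicodeEncodeError if the string already contains characters outside Latin-1, so this only suits data that was escaped consistently. Fixing the step that wrote the literal backslashes is usually the more robust solution.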