Handle wrongly encoded character in Python unicode string


Question

I am dealing with unicode strings returned by the python-lastfm library.

I assume that somewhere along the way the library gets the encoding wrong and returns a unicode string that may contain invalid characters.

For example, the original string I am expecting in the variable a is "Glück":


>>> a
u'Gl\xfcck'
>>> print a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)

\xfc is the escaped value 252, which corresponds to the latin1 encoding of "ü". Somehow this gets embedded in the unicode string in a way Python can't handle on its own.

How do I convert this back to a normal or unicode string that contains the original "Glück"? I tried playing around with the decode/encode methods, but either got a UnicodeEncodeError or a string containing the sequence \xfc.

Recommended answer

Your unicode string is fine:

>>> import unicodedata
>>> unicodedata.name(u"\xfc")
'LATIN SMALL LETTER U WITH DIAERESIS'
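
As a quick check of that claim, here is a minimal sketch. It assumes Python 3, where every str is a unicode string, and the variable names are illustrative only. It suggests that the \xfc in the repr is simply the escape for the code point U+00FC, not a stray Latin-1 byte. Bytes only come into play once the string is encoded, and which bytes you get depends on the codec.

```python
import unicodedata

text = "Gl\xfcck"  # same value as u'Gl\xfcck' in Python 2

# The third character is a single code point, U+00FC (decimal 252).
code_point = ord(text[2])
char_name = unicodedata.name(text[2])

# Bytes exist only after encoding, and they depend on the codec chosen.
latin1_bytes = text.encode("latin-1")  # the umlaut becomes one byte
utf8_bytes = text.encode("utf-8")      # the umlaut becomes two bytes

print(code_point, char_name)
print(latin1_bytes, utf8_bytes)
```

The code point 252 coinciding with the Latin-1 byte value is expected: the first 256 Unicode code points match Latin-1, which is probably why the string looked "wrongly encoded".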

The problem you see at the interactive prompt is that the interpreter doesn't know which encoding to use when writing the string to your terminal, so it falls back to the "ascii" codec, which can only handle ASCII characters. It works fine on my machine because sys.stdout.encoding is "UTF-8" for me, most likely because my environment variable settings differ from yours:

>>> print u'Gl\xfcck'
Glück
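
The fallback can be reproduced without depending on any particular terminal. The following is a minimal sketch, assuming Python 3 syntax; `describe_output_encoding` and `encode_for_output` are hypothetical helper names made up for illustration, not part of any library. It reports the encoding attached to stdout and encodes the string explicitly instead of relying on an implicit default.

```python
import sys

text = "Gl\xfcck"  # same value as u'Gl\xfcck' in Python 2


def describe_output_encoding(stream=sys.stdout):
    """Return the encoding attached to the stream, or None if it has none."""
    return getattr(stream, "encoding", None)


def encode_for_output(value, encoding="utf-8", errors="strict"):
    """Encode explicitly rather than relying on an implicit default codec."""
    return value.encode(encoding, errors)


# Encoding with the ASCII codec fails in the same way as the traceback above.
try:
    encode_for_output(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit codec that can represent the umlaut works.
utf8_bytes = encode_for_output(text, "utf-8")

# A lossy alternative: unencodable characters are replaced with '?'.
ascii_lossy = encode_for_output(text, "ascii", errors="replace")

print(describe_output_encoding())
print(ascii_failed, utf8_bytes, ascii_lossy)
```

If the reported encoding is ASCII or None, a common remedy is to set the PYTHONIOENCODING environment variable (for example to utf-8) before starting the interpreter, or to correct the locale settings of the shell. Which of these applies depends on your environment.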
