How do I convert a unicode to a string at the Python level?


Problem description


The following unicode and string can exist on their own if defined explicitly:

>>> value_str='Andr\xc3\xa9'
>>> value_uni=u'Andr\xc3\xa9'

If I only have u'Andr\xc3\xa9' assigned to a variable like above, how do I convert it to 'Andr\xc3\xa9' in Python 2.5 or 2.6?

EDIT:

I did the following:

>>> value_uni.encode('latin-1')
'Andr\xc3\xa9'

which fixes my issue. Can someone explain to me what exactly is happening?

Solution

You seem to have gotten your encodings muddled up. What you most likely want is u'Andr\xe9', which is the unicode string for 'André'.

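But what you have seems to be UTF-8 encoded data that was decoded with the wrong codec, so each byte became a character of its own. You can fix it by converting the unicode string back to an ordinary byte string, one character per byte. The encode('latin-1') call from your edit does exactly that, because Latin-1 maps code points 0 to 255 onto the same byte values. Spelled out by hand, it looks like this: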

>>> ''.join(chr(ord(c)) for c in u'Andr\xc3\xa9')
'Andr\xc3\xa9'

Then decode it correctly:

>>> ''.join(chr(ord(c)) for c in u'Andr\xc3\xa9').decode('utf8')
u'Andr\xe9'

Now it is in the correct format.
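
For readers on Python 3, the same two steps can be sketched as below. This is an illustration under stated assumptions, not part of the original answer: `str` plays the role of Python 2's `unicode`, `bytes` plays the role of the ordinary string, and the helper name `repair_mojibake` is hypothetical. It also checks that `encode('latin-1')` yields the same bytes as the per-character conversion.

```python
def repair_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded one byte per character.

    Hypothetical helper for illustration only. It assumes every character
    in `text` has a code point below 256; otherwise encode() raises
    UnicodeEncodeError.
    """
    raw = text.encode('latin-1')   # code point N -> byte N
    return raw.decode('utf-8')     # reinterpret those bytes as UTF-8


broken = 'Andr\xc3\xa9'            # same content as value_uni in the question

# Per-character conversion, the Python 3 counterpart of the chr(ord(c)) join.
by_hand = bytes(ord(c) for c in broken)

# Both routes produce the same byte string.
assert by_hand == broken.encode('latin-1') == b'Andr\xc3\xa9'

print(ascii(repair_mojibake(broken)))  # 'Andr\xe9'
```

The repair only works when the text really is UTF-8 that was decoded byte by byte; applied to text that was decoded correctly, the final decode step will usually fail with UnicodeDecodeError.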

However, instead of doing this, if possible you should work out why the data ended up with the wrong encoding in the first place, and fix the problem there.
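
One common way this kind of data arises, shown here as a Python 3 sketch, is decoding UTF-8 bytes with a single-byte codec such as Latin-1. Whether that is what happened in the asker's program is an assumption; the source of the bad value is not given in the question.

```python
raw = b'Andr\xc3\xa9'            # UTF-8 bytes for "André"

wrong = raw.decode('latin-1')    # each byte becomes its own character
right = raw.decode('utf-8')      # the bytes C3 A9 together become one "é"

print(ascii(wrong))              # 'Andr\xc3\xa9'
print(ascii(right))              # 'Andr\xe9'
print(len(wrong), len(right))    # 6 5
```

If the wrong codec is named at the point where the bytes first enter the program (a file, socket, or database read), correcting it there removes the need for any later repair.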
