Python 2.7: How to convert Unicode escapes in a string into actual UTF-8 characters
Question
I use Python 2.7 and I'm receiving a string from a server (not unicode!). Inside that string I find text with Unicode escape sequences, for example:
<a href = "http://www.mypage.com/\u0441andmoretext">\u00b2<\a>
How do I convert those \uxxxx sequences back to UTF-8? The answers I found either dealt with &# entities or required eval(), which is too slow for my purposes. I need a universal solution for any text containing such sequences.
The <\a> is a typo, but I want the solution to tolerate such typos as well. Only \u sequences should be converted.
Written in proper Python syntax, the example text looks like this:
"<a href = \"http://www.mypage.com/\\u0441andmoretext\">\\u00b2<\\a>"
The desired output, in proper Python syntax, is:
"<a href = \"http://www.mypage.com/\xd1\x81andmoretext\">\xc2\xb2<\\a>"
Recommended answer
Try:
>>> s = "<a href = \"http://www.mypage.com/\\u0441andmoretext\">\\u00b2<\\a>"
>>> s.decode("raw_unicode_escape")
u'<a href = "http://www.mypage.com/\u0441andmoretext">\xb2<\\a>'
You can then encode the resulting unicode object to UTF-8 as usual.
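Putting both steps together, a minimal sketch might look like the following. The helper name unescape_to_utf8 is hypothetical and not part of the original answer. One assumption to keep in mind: raw_unicode_escape treats any non-ASCII bytes in the input as Latin-1, so this is only safe if the incoming string is plain ASCII apart from the \uXXXX sequences.

```python
def unescape_to_utf8(raw):
    # Hypothetical helper: decode literal \uXXXX sequences, then encode as UTF-8.
    # raw_unicode_escape only interprets \uXXXX and \UXXXXXXXX; other
    # backslashes (such as the stray one in <\a>) are left untouched.
    text = raw.decode("raw_unicode_escape")
    return text.encode("utf-8")


# The example string from the question, written as a byte-string literal
# (b"..." is an ordinary str in Python 2.7).
s = b"<a href = \"http://www.mypage.com/\\u0441andmoretext\">\\u00b2<\\a>"
result = unescape_to_utf8(s)
```

For the example from the question, result should be the byte string given there as the desired output, with the <\a> typo left as it was.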