Python 2.7: How to convert Unicode escapes in a string into actual UTF-8 characters


Problem description

I use Python 2.7 and I'm receiving a string from a server (a byte string, not unicode!). Inside that string I find text with Unicode escape sequences. For example:

<a href = "http://www.mypage.com/\u0441andmoretext">\u00b2<\a>

How do I convert those \uxxxx sequences back to UTF-8? The answers I found either dealt with &# entities or required eval(), which is too slow for my purposes. I need a universal solution for any text containing such sequences.

<\a> is a typo, but I want to tolerate such typos as well. Only \u sequences should be converted.

In proper Python syntax, the example text looks like this:

"<a href = \"http://www.mypage.com/\\u0441andmoretext\">\\u00b2<\\a>"

The desired output, in proper Python syntax, is:

"<a href = \"http://www.mypage.com/\xd1\x81andmoretext\">\xc2\xb2<\\a>"

Recommended answer

Try:

>>> s = "<a href = \"http://www.mypage.com/\\u0441andmoretext\">\\u00b2<\\a>"
>>> s.decode("raw_unicode_escape")
u'<a href = "http://www.mypage.com/\u0441andmoretext">\xb2<\\a>'

And then you can encode to utf8 as usual.
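For completeness, here is a minimal sketch of the full round trip (decode, then encode), written so that it should behave the same on Python 2.7 and Python 3. The helper name unescape_to_utf8 is made up for illustration and is not part of the original answer.

```python
def unescape_to_utf8(raw):
    """Turn literal \\uXXXX sequences in a byte string into UTF-8 bytes."""
    # raw_unicode_escape only interprets \uXXXX and \UXXXXXXXX;
    # other backslash sequences such as \a are left untouched.
    text = raw.decode("raw_unicode_escape")
    # Encode the resulting unicode object to UTF-8 bytes.
    return text.encode("utf-8")


# The b prefix is a no-op on Python 2.7 and makes this a bytes object on Python 3.
s = b"<a href = \"http://www.mypage.com/\\u0441andmoretext\">\\u00b2<\\a>"
result = unescape_to_utf8(s)
print(repr(result))
```

One caveat: raw_unicode_escape treats every byte that is not part of a \u escape as Latin-1, so this sketch assumes the incoming data is pure ASCII apart from the escapes. If the server can also send raw UTF-8 bytes, those would come out garbled and a different approach would be needed.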
