Evaluate UTF-8 literal escape sequences in a string in Python 3


Question

I have a string of the form:

s = '\\xe2\\x99\\xac'

I would like to convert this to the character ♬ by evaluating the escape sequences. However, everything I've tried either raises an error or prints garbage. How can I force Python to convert the escape sequences into the literal Unicode character?

What I've read elsewhere suggests that the following line of code should do what I want, but it results in a UnicodeEncodeError.

print(bytes(s, 'utf-8').decode('unicode-escape'))

I also tried the following, which has the same result:

import codecs
print(codecs.getdecoder('unicode_escape')(s)[0])

Both of these approaches produce the string 'â\x99¬', which print is subsequently unable to handle.

In case it makes any difference, the string is read from a UTF-8 encoded file and will ultimately be written to a different UTF-8 encoded file after processing.

Recommended answer

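...decode('unicode-escape') will give you the string '\xe2\x99\xac', that is, a str made of the three code points U+00E2, U+0099 and U+00AC rather than the single character you want.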

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape')
'â\x99¬'
>>> _ == '\xe2\x99\xac'
True
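To see why the intermediate result looks like garbage, it can help to compare its code points with the UTF-8 bytes of the target character. The following is only an illustrative sketch; the variable names (intermediate, code_points, utf8_bytes) are made up for this example and do not come from the original answer.

```python
# The escaped text from the question: literal backslashes, not real bytes.
s = '\\xe2\\x99\\xac'

# unicode-escape turns each \xNN into the code point U+00NN.
intermediate = s.encode().decode('unicode-escape')

# Code points of the intermediate string, as integers.
code_points = [ord(c) for c in intermediate]

# The UTF-8 encoding of the character we actually want.
utf8_bytes = list('♬'.encode('utf-8'))

print([hex(cp) for cp in code_points])   # the three code points in hex
print(code_points == utf8_bytes)         # same numeric values as the UTF-8 bytes
```

The code points of the intermediate string have the same numeric values as the UTF-8 bytes of ♬, so what remains is to turn those code points back into bytes and decode them as UTF-8.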

You need to decode it as UTF-8. But to decode it, first encode it with latin1 (or iso-8859-1) to preserve the bytes: latin1 maps each code point from U+0000 to U+00FF to the single byte with the same value.

>>> s = '\\xe2\\x99\\xac'
>>> s.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
'♬'
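Since the question mentions reading from and writing to UTF-8 files, the chain above could be wrapped in a small helper and the result written with an explicit encoding. This is a sketch under assumptions: decode_utf8_escapes is a hypothetical name, the temporary file stands in for the real output file, and the helper assumes every escape in the input is a \xNN byte escape (a \u escape above U+00FF would make the latin1 step fail).

```python
import os
import tempfile


def decode_utf8_escapes(text):
    """Hypothetical helper: evaluate \\xNN escapes that spell out UTF-8 bytes."""
    return (
        text.encode('utf-8')          # str -> bytes (backslashes stay literal)
        .decode('unicode-escape')     # \xNN -> code point U+00NN
        .encode('latin1')             # code points 0-255 -> identical byte values
        .decode('utf-8')              # interpret those bytes as UTF-8
    )


result = decode_utf8_escapes('\\xe2\\x99\\xac')

# Write and re-read with an explicit encoding, so the outcome does not
# depend on the platform's default encoding.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')  # stand-in for the real output file
    with open(path, 'w', encoding='utf-8') as f:
        f.write(result)
    with open(path, encoding='utf-8') as f:
        round_tripped = f.read()
```

Passing encoding='utf-8' to open() is a design choice here rather than part of the original answer; a UnicodeEncodeError from print often points to a console or default encoding that cannot represent the character, which an explicit file encoding sidesteps.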
