Convert utf-8 unicode sequence to utf-8 chars in Python 3

Views: 182 Posted: 2020/10/1 1:02:19 Tags: python python-3.x unicode utf-8 character-encoding

Problem description

I'm reading data from an AWS S3 bucket that happens to have Unicode characters escaped with double backslashes.

The double backslashes cause each Unicode escape sequence to be parsed as a series of literal characters (a backslash, a `u`, and four hex digits) instead of the single character that the escape represents.

The following example illustrates the situation.

>>> s1="1+1\\u003d2"
>>> print(s1)
1+1\u003d2

In this case, the actual Unicode sequence would be an equals sign.

>>> s2="1+1\u003d2"
>>> print(s2)
1+1=2
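To make the difference between the two strings concrete, here is a small sketch that compares them. The names `s1` and `s2` come from the example above; everything else is only illustrative.

```python
# s1 holds a literal backslash followed by "u003d" (six separate characters).
s1 = "1+1\\u003d2"
# s2 holds the single character U+003D, the equals sign.
s2 = "1+1\u003d2"

print(len(s1))   # 10
print(len(s2))   # 5
print(s1 == s2)  # False
print(s1[3:9])   # the six-character escape text: \u003d
```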

Is there a way to convert the character sequence in the first example so that the escape sequence in s1 is interpreted as the actual character it represents?

Recommended answer

I believe that the codecs module provides this utility:

>>> import codecs
>>> codecs.decode("1+1\\u003d2", encoding='unicode_escape')
'1+1=2'
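One caveat worth checking before relying on `unicode_escape`: as far as I know, it treats the input as Latin-1, so a string that already contains real non-ASCII characters can come out garbled. A minimal alternative sketch, assuming only `\uXXXX` escapes need to be handled, uses a regular expression instead. The helper name `unescape_unicode` is hypothetical and not part of any library, and this version does not combine surrogate pairs such as `\ud83d\ude00` into one character.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it names.

    Hypothetical helper for illustration; other escapes (\\n, \\x41, ...)
    and surrogate pairs are left untouched or uncombined.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(unescape_unicode("1+1\\u003d2"))         # 1+1=2
print(unescape_unicode("caf\u00e9 \\u003d 1"))  # café = 1
```

The second call shows that characters that are already decoded pass through unchanged, which is the main reason to prefer this over `unicode_escape` for mixed input.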

This probably points to a larger problem, though. How do these strings come to be in the first place?

Note that if this text was extracted from a valid JSON string (in which case it would be missing its surrounding quotes), you could simply use:

>>> import json
>>> json.loads('"1+1\\u003d2"')
'1+1=2'
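If the data really is a JSON string body with its quotes stripped, the same idea can be wrapped in a small function. This is only a sketch under that assumption: the name `decode_json_fragment` is hypothetical, and the call raises `json.JSONDecodeError` if the fragment contains characters that are not valid inside a JSON string, such as an unescaped double quote.

```python
import json


def decode_json_fragment(fragment: str) -> str:
    """Decode text assumed to be the body of a JSON string without its quotes.

    Hypothetical helper for illustration: it restores the quotes and lets
    the JSON parser interpret the escapes.
    """
    return json.loads('"' + fragment + '"')


print(decode_json_fragment("1+1\\u003d2"))  # 1+1=2
```

Unlike a plain `\uXXXX` substitution, the JSON parser also handles the other JSON escapes (for example `\n` and `\"`) and joins surrogate pairs into a single character.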
