How to convert \xXY encoded characters to UTF-8 in Python?

Published: 2020/7/13 4:38:32 · Tags: python unicode utf-8 character-encoding non-ascii-characters


Question

I have a text which contains characters such as "\xaf", "\xbe", which, as I understand it from this question, are ASCII encoded characters.
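For reference: ASCII only defines byte values 0x00 to 0x7F, so bytes such as \xaf and \xbe fall outside it. They typically come from a multi-byte encoding such as UTF-8 or from a single-byte encoding such as Latin-1. A minimal sketch of listing the non-ASCII bytes in a byte string; the helper name non_ascii_bytes and the sample bytes are made up for illustration:

```python
def non_ascii_bytes(data: bytes) -> list[int]:
    """Return the distinct byte values in data that lie outside 7-bit ASCII (0x00-0x7F)."""
    return sorted({b for b in data if b > 0x7F})

# Hypothetical sample: "¯" (U+00AF) and "¾" (U+00BE) encoded as UTF-8, mixed with ASCII
sample = b"abc \xc2\xaf \xc2\xbe"
print([hex(b) for b in non_ascii_bytes(sample)])  # ['0xaf', '0xbe', '0xc2']
print(sample.isascii())                            # False (bytes.isascii needs Python 3.7+)
```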

I want to convert them in Python to their UTF-8 equivalents. The usual string.encode("utf-8") throws UnicodeDecodeError. Is there some better way, e.g., with the codecs standard library?
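The error in the question is consistent with Python 2 behavior: calling .encode("utf-8") on a byte string first decodes it implicitly with the ASCII codec, which fails on any byte above 0x7F. In Python 3 the direction is explicit: bytes are decoded into text, and text is encoded into bytes. A minimal sketch, assuming the raw data is UTF-8 (the sample bytes are hypothetical). Note that Python also prints code points U+0080 to U+00FF as \xNN, so a "\xaf" in printed output may be a character rather than a byte:

```python
# Hypothetical raw bytes: "¯" (U+00AF) and "¾" (U+00BE) encoded as UTF-8
raw = b"\xc2\xaf\xc2\xbe"

# bytes -> str: decode with the encoding the bytes are actually in
text = raw.decode("utf-8")
print(ascii(text))  # '\xaf\xbe'  (escaped code points, not bytes)

# str -> bytes: encode only when UTF-8 bytes are needed again, e.g. for writing
assert text.encode("utf-8") == raw

# A lone 0xAF is not a valid UTF-8 start byte, so strict decoding fails ...
try:
    b"\xaf".decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)  # not UTF-8: invalid start byte

# ... and if the bytes are really Latin-1, decode them as Latin-1 instead
print(ascii(b"\xaf\xbe".decode("latin-1")))  # '\xaf\xbe'
```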

A 200-character sample is here.

Answer

Your file is already a UTF-8 encoded file.
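One way to check a claim like this is to read the file as raw bytes and try a strict UTF-8 decode. A minimal sketch with a hypothetical helper is_valid_utf8; passing only shows that the bytes form valid UTF-8 (plain ASCII passes too), not that the author intended UTF-8:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 with strict error handling."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Usage against the sample path from the answer (adjust to your file):
# with open("/tmp/encoding-sample", "rb") as f:
#     print(is_valid_utf8(f.read()))
```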

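# saved encoding-sample to /tmp/encoding-sample
import sys
import unicodedata as ud

# Python 3's built-in open() decodes the file as UTF-8 text
with open("/tmp/encoding-sample", "r", encoding="utf-8") as fp:
    data = fp.read()

# Print every distinct character with its code point and Unicode name
for char in sorted(set(data)):
    try:
        charname = ud.name(char)
    except ValueError:
        # Control characters such as LINE FEED have no name in the Unicode database
        charname = "<unknown>"
    sys.stdout.write("char U%04x %s\n" % (ord(char), charname))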

Filling in by hand the names the script reports as <unknown> (control characters, which have no name in the Unicode database) gives:
char U000a LINE FEED
char U001e INFORMATION SEPARATOR TWO
char U001f INFORMATION SEPARATOR ONE
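The hand-filled names above match Unicode name aliases for these control characters, and since Python 3.3 unicodedata.lookup() accepts such aliases, so they can be checked programmatically. A minimal sketch; the helper char_label is hypothetical and uses the default argument of unicodedata.name() in place of the try/except:

```python
import unicodedata as ud

def char_label(char: str) -> str:
    """Return the Unicode name of char, or '<unknown>' if it has none."""
    return ud.name(char, "<unknown>")

# Confirm that the hand-filled names resolve back to the expected characters
for alias, expected in [("LINE FEED", "\n"),
                        ("INFORMATION SEPARATOR TWO", "\x1e"),
                        ("INFORMATION SEPARATOR ONE", "\x1f")]:
    assert ud.lookup(alias) == expected
    print("char U%04x %s (name() gives %s)" % (ord(expected), alias, char_label(expected)))
```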
