How to fix broken utf-8 encoding in Python?

Views: 140 | Posted: 2020/7/13 3:51:19 | Tags: python unicode utf-8 character-encoding

Question

My string is Niá»‡m Bá»" TÃ¡t (Thiá»n sÆ° Nháº¥t Háº¡nh) and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh). I saw that this site can do it: http://www.enderminh.com/minh/utf8-to-unicode-converter.aspx

So I started trying it in Python:

mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
mystr.decode('utf-8')

But this is not actually correct: the original string is UTF-8, yet the string shown is not the result I expect.

Note: these are Vietnamese characters.

How can I resolve this? Is it Windows Unicode or something else? How can I detect the encoding here?

Recommended answer

I'm not sure what you can do with this kind of data, but for the example in your original post, this works:

>>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
>>> s = mystr.decode('utf8').encode('latin1').decode('utf8')
>>> s
u'09. B\xe1t Nh\xe3 T\xe2m Kinh'
>>> print(s)
09. Bát Nhã Tâm Kinh
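
The session above is Python 2 (note the u'' prefix and the call to str.decode). In Python 3, str has no decode method, so the same idea might look like the sketch below: re-encode the text with the single-byte codec that was wrongly used to read it, then decode the resulting bytes as UTF-8. The helper name fix_mojibake and the candidate list of latin-1 and cp1252 are assumptions for illustration, not part of the original answer. cp1252 is included because characters such as ‡ in the Vietnamese sample exist in Windows-1252 but not in Latin-1.

```python
def fix_mojibake(text, wrong_codecs=("latin-1", "cp1252")):
    """Undo UTF-8 bytes that were mis-decoded with a single-byte codec.

    Hypothetical helper: tries each candidate codec in turn and returns
    the input unchanged if none of them round-trips cleanly.
    """
    for codec in wrong_codecs:
        try:
            # Recover the original bytes, then read them as UTF-8.
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not representable in this codec, or not valid UTF-8.
            continue
    return text


print(fix_mojibake("09. BÃ¡t NhÃ£ TÃ¢m Kinh"))  # 09. Bát Nhã Tâm Kinh
```

This only works while the damaged text still contains every original byte. If some characters were dropped along the way (for example bytes that have no mapping in Windows-1252), the round trip fails and the helper returns the text as it was.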
