How to fix broken UTF-8 encoding in Python?
Problem description
My string is Niệm Bá»" Tát (Thiá»n sÆ° Nhất Hạnh) and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh). I found a site that can do this: http://www.enderminh.com/minh/utf8-to-unicode-converter.aspx
So I tried to do the same in Python:
mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
mystr.decode('utf-8')
But this does not work: the original string is UTF-8, yet the string shown is not the result I expect.
Note: the text contains Vietnamese characters.
How can I resolve this? Is it Windows Unicode or something else? How can I detect the encoding here?
Recommended answer
I'm not sure what you can do with this kind of data, but for the example in your original post, this works (Python 2):
>>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
>>> s = mystr.decode('utf8').encode('latin1').decode('utf8')
>>> s
u'09. B\xe1t Nh\xe3 T\xe2m Kinh'
>>> print(s)
09. Bát Nhã Tâm Kinh
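The session above relies on Python 2 byte strings. For Python 3, where str is already Unicode, the same round trip can be sketched roughly as follows. This is only a minimal sketch: fix_mojibake and its wrong_encoding parameter are hypothetical names introduced here, and it assumes the whole string was produced by decoding UTF-8 bytes with a single-byte codec such as latin-1. The damaged sample is built in code so that it does not depend on how this page itself is encoded.

```python
def fix_mojibake(text, wrong_encoding="latin-1"):
    """Undo UTF-8 bytes that were mistakenly decoded as `wrong_encoding`.

    Hypothetical helper: assumes the entire string was damaged the same way.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of this kind (or only partly damaged): return unchanged.
        return text


good = "09. Bát Nhã Tâm Kinh"
# Simulate the damage: UTF-8 bytes read as latin-1.
broken = good.encode("utf-8").decode("latin-1")

print(fix_mojibake(broken))  # prints: 09. Bát Nhã Tâm Kinh
```

If the text went through Windows-1252 instead, passing wrong_encoding="cp1252" may work better. A string that is only partly damaged, like the first example in the question, will probably be returned unchanged by this sketch, because its already-correct characters cannot be encoded back to the single-byte codec.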