How to fix broken UTF-8 encoding in Python?


Problem description

My string is Niệm Bá»" Tát (Thiá»n sÆ° Nhất Hạnh) and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh). I have seen that this site can do it: http://www.enderminh.com/minh/utf8-to-unicode-converter.aspx

So I started trying it in Python:

mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
mystr.decode('utf-8')

But that is not right: the original string is UTF-8, yet the string that is displayed is not the result I expect.

Note: these are Vietnamese characters.

How can I resolve this? Is it Windows Unicode or something else? How can I detect the encoding here?

Recommended answer

I'm not sure what you can do with this kind of data in general, but for the example in your original post, this works:

>>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
>>> s = mystr.decode('utf8').encode('latin1').decode('utf8')
>>> s
u'09. B\xe1t Nh\xe3 T\xe2m Kinh'
>>> print(s)
09. Bát Nhã Tâm Kinh
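
The transcript above is Python 2, where a plain string is a byte string and has a decode method. In Python 3 strings are already Unicode, so the same idea is roughly "encode back with the codec that was wrongly used, then decode as UTF-8". Below is a minimal sketch of that round trip. The helper name fix_mojibake and its wrong_encoding parameter are made up for illustration, and it assumes the whole string was decoded wrongly exactly once.

```python
def fix_mojibake(text, wrong_encoding="latin-1"):
    """Undo one round of 'UTF-8 bytes decoded with the wrong codec'.

    Returns the input unchanged when the round trip is not possible,
    which usually means the text was not damaged in this particular way.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


good = "09. Bát Nhã Tâm Kinh"

# Simulate the damage: UTF-8 bytes misread as Latin-1.
broken = good.encode("utf-8").decode("latin-1")

print(broken)                # the garbled form
print(fix_mojibake(broken))  # the repaired form
```

If the damaged text contains characters such as curly quotes, the wrong codec was more likely Windows-1252, in which case passing wrong_encoding="cp1252" may work instead. A string that is only partly damaged, like the Niệm Bá»" Tát example in the question, cannot be encoded back as a whole, so this sketch returns it unchanged; it would need to be repaired piece by piece.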
