Decoding unknown encoded Traditional Chinese character strings using Python

Views: 150 Published: 2020/7/11 0:13:12 Tags: python text-manipulation


Question

Hi, I have a website in Traditional Chinese, and when I check the site statistics they tell me that the search term for the website is å%8f°å%8d%97 è¦ªå%90é¤%90å»³, which obviously makes no sense to me. What is this encoding called, and is there a way to decode this string using Python? Thank you.

Recommended answer

It is called a mutt encoding; the underlying bytes have been mangled beyond their original meaning, so the result is no longer a real encoding.

The text was once URL-quoted UTF-8, but it has since been interpreted as Latin-1 without unquoting those URL escapes. I was able to un-mangle it by reversing those steps: encode the string as Latin-1 to recover the bytes, unquote the remaining URL escapes, then decode the result as UTF-8:

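>>> # Python 3
>>> from urllib.parse import unquote_to_bytes
>>> # '\xad' is a soft hyphen (byte 0xAD); it is invisible and easily lost when copying
>>> bytesquoted = 'å%8f°å%8d%97 è¦ªå\xad%90é¤%90å»³'.encode('latin1')
>>> unquoted = unquote_to_bytes(bytesquoted)
>>> print(unquoted.decode('utf8'))
台南 親子餐廳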
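
The decoding steps above can be wrapped in a small reusable function. The following is a minimal sketch for Python 3, not part of the original answer. The names `unmangle` and `simulate_mangling` are hypothetical. `simulate_mangling` rests on an assumption about the statistics page: that bytes in the range 0x80 to 0x9F, which have no printable Latin-1 character, were left as URL escapes while every other byte was displayed as its Latin-1 character. That assumption reproduces the garbled search term from the question, but the actual tool may behave differently.

```python
from urllib.parse import unquote_to_bytes


def unmangle(text: str) -> str:
    """Reverse the mangling: Latin-1 text with leftover %XX escapes -> original text."""
    # Step 1: each visible character stands for exactly one original byte,
    # so encoding as Latin-1 recovers those bytes unchanged.
    raw = text.encode("latin-1")
    # Step 2: turn the remaining %XX escapes into the bytes they represent.
    raw = unquote_to_bytes(raw)
    # Step 3: the result is the original UTF-8 byte sequence.
    return raw.decode("utf-8")


def simulate_mangling(text: str) -> str:
    """Hypothetical model of how the statistics page garbled the search term."""
    pieces = []
    for byte in text.encode("utf-8"):
        if 0x80 <= byte <= 0x9F:
            # Assumption: bytes without a printable Latin-1 character stay escaped.
            pieces.append("%%%02x" % byte)
        else:
            # Every other byte is shown as the Latin-1 character with that code.
            pieces.append(chr(byte))
    return "".join(pieces)


original = "台南 親子餐廳"
mangled = simulate_mangling(original)
print(unmangle(mangled))  # prints: 台南 親子餐廳
```

Note that `unmangle` raises `UnicodeEncodeError` if the input contains characters outside Latin-1, and `UnicodeDecodeError` if the recovered bytes are not valid UTF-8 (for example, when the invisible soft hyphen was lost in copying), so either exception is a sign that the text was damaged in some other way.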
