Unbaking mojibake

Posted: 2020/10/1 0:20:33. Tags: python unicode character-encoding decoding mojibake

Question

When you have incorrectly decoded characters, how can you identify likely candidates for the original string?

Ä×èÈÄÄî▒è¤ô_üiâAâjâüâpâXüj_10òb.png

I know for a fact that this image filename should have been some Japanese characters. But with various guesses at urllib quoting/unquoting, encode and decode iso8859-1, utf8, I haven't been able to unmunge and get the original filename.

Is the corruption reversible?

Recommended answer

You could use chardet (install with pip):

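# -*- coding: cp850 -*-
# Python 2: a plain "..." literal is a byte string, so with this file saved
# as cp850, your_str holds the original (Shift-JIS) bytes for chardet.
import chardet

your_str = "Ä×èÈÄÄî▒è¤ô_üiâAâjâüâpâXüj_10òb"
# chardet returns its best guess, or None if it cannot decide
detected_encoding = chardet.detect(your_str)["encoding"]

try:
    correct_str = your_str.decode(detected_encoding)
except (UnicodeDecodeError, TypeError):
    print("Could not estimate encoding")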

Result: 時間試験観点（アニメパス）_10秒 (no idea if this could be correct or not)

For Python 3 (source file encoded as utf8):

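import chardet

# The name as it was (wrongly) displayed: Shift-JIS bytes decoded as cp850.
falsely_decoded_str = "Ä×èÈÄÄî▒è¤ô_üiâAâjâüâpâXüj_10òb"

correct_str = None
try:
    # Undo the wrong decode to recover the original raw bytes.
    encoded_str = falsely_decoded_str.encode("cp850")
except UnicodeEncodeError:
    print("could not encode falsely decoded string")
    encoded_str = None

if encoded_str:
    # Guess the real encoding of the raw bytes (may be None).
    detected_encoding = chardet.detect(encoded_str)["encoding"]

    try:
        correct_str = encoded_str.decode(detected_encoding)
    except (UnicodeDecodeError, LookupError, TypeError):
        print("could not decode encoded_str as %s" % detected_encoding)

if correct_str is not None:
    # utf-8-sig writes a BOM so editors reliably detect UTF-8.
    with open("output.txt", "w", encoding="utf-8-sig") as out:
        out.write(correct_str)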
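The Python 3 code above hard-codes cp850 as the codec that caused the damage. When that is unknown too, one option is to brute-force pairs: encode the mojibake with each candidate "wrong" codec, decode the bytes with each candidate "right" codec, and keep results that decode cleanly and contain Japanese characters. The sketch below is only an illustration; the helper names `unbake_candidates` and `has_japanese`, the two codec lists, and the "contains kana or kanji" filter are all assumptions chosen for this example.

```python
# Hypothetical helper: brute-force (wrong, right) codec pairs that might turn
# a mojibake string back into text. The candidate lists are assumptions.
WRONG_CODECS = ["cp850", "cp437", "latin-1", "cp1252"]
RIGHT_CODECS = ["shift_jis", "cp932", "euc_jp", "utf-8"]


def has_japanese(text):
    """Assumed heuristic: True if text contains kana or CJK ideographs."""
    return any(
        "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
        for ch in text
    )


def unbake_candidates(mojibake):
    """Yield (wrong, right, fixed) for every pair that decodes cleanly."""
    for wrong in WRONG_CODECS:
        try:
            # Reverse the wrong decode to recover the raw bytes.
            raw = mojibake.encode(wrong)
        except UnicodeEncodeError:
            continue  # this codec cannot represent every character
        for right in RIGHT_CODECS:
            try:
                fixed = raw.decode(right)
            except UnicodeDecodeError:
                continue  # the bytes are not valid in this encoding
            if has_japanese(fixed):
                yield wrong, right, fixed


if __name__ == "__main__":
    s = "Ä×èÈÄÄî▒è¤ô_üiâAâjâüâpâXüj_10òb.png"
    for wrong, right, fixed in unbake_candidates(s):
        print(f"{wrong} -> {right}: {fixed}")
```

More than one pair can survive (cp932 is Microsoft's extension of Shift-JIS, for example), so the output is a short list of candidates to inspect by eye, not a single answer.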

In short:

>>> s = 'Ä×èÈÄÄî▒è¤ô_üiâAâjâüâpâXüj_10òb.png'
>>> s.encode('cp850').decode('shift-jis')
'時間試験観点（アニメパス）_10秒.png'
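To see why the one-liner above works, you can run the corruption forward and then undo it. This sketch assumes, as that one-liner implies, that the name was stored as Shift-JIS bytes and later displayed through the cp850 (DOS Western European) code page. The answer to "is it reversible?" hinges on whether the wrong codec dropped any bytes: cp850 assigns a character to every one of the 256 byte values, so nothing is lost. Not every code page is like that; cp1252, for instance, leaves a few bytes such as 0x81 undefined, and Python raises UnicodeDecodeError on them.

```python
# Reproduce the corruption forward, then reverse it. Assumes the original
# name was Shift-JIS bytes shown through a cp850 console.
original = "時間試験観点（アニメパス）_10秒.png"

raw = original.encode("shift_jis")   # bytes as stored on disk
mojibake = raw.decode("cp850")       # what a cp850 display shows

# Reversing is just the same two steps in the opposite order.
restored = mojibake.encode("cp850").decode("shift_jis")

# Every byte value 0x00-0xFF survives a cp850 round trip, which is why the
# wrong decode above is lossless.
all_bytes = bytes(range(256))
assert all_bytes.decode("cp850").encode("cp850") == all_bytes

print(mojibake)
print(restored == original)
```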
