Python - how to convert a Unicode filename to CP437?

Question

I have a file with a Unicode name, say 'קובץ.txt'. I want to pack it, and I'm using Python's zipfile module.

I can zip the files and open them later without a problem, except that the file names are garbled when I view the archive in the Windows 7 File Explorer (7-Zip works great).

According to the docs, this is a known issue, and there are instructions on how to deal with it:

From the ZipFile.write() documentation:

Note

There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin.
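That note comes from the Python 2 documentation, where write() expected byte-string names. As far as I know, Python 3's zipfile works differently: it accepts str names and picks the encoding itself, storing a pure-ASCII name as-is and any other name as UTF-8 with general-purpose flag bit 11 (0x800) set. A minimal sketch, assuming Python 3.7 or later, that writes two entries into an in-memory archive and reads the flag back (stored_utf8_flags and UTF8_FLAG are illustrative names, not part of zipfile):

```python
import io
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose flag bit 11: "file name is UTF-8"


def stored_utf8_flags(names):
    """Hypothetical helper: write each name into an in-memory ZIP and report
    whether zipfile marked it as UTF-8 in the archive's central directory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"data")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:  # re-open so flag_bits come from the file
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


print(stored_utf8_flags(["file.txt", "קובץ.txt"]))
# {'file.txt': False, 'קובץ.txt': True}
```

The flag is only a hint; whether a given archive tool honors it is up to that tool.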

Sorry, but I can't figure out what exactly I'm supposed to do with the filename. I've tried .encode('CP437') and .decode('CP437')...

Solution

You'd have to encode your Unicode string to CP437. However, you can't encode your specific example because the CP437 codec does not support Hebrew:

>>> u'קובץ.txt'.encode('cp437')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>

The above error tells you that the first 4 characters (קובץ) cannot be encoded because there are no such characters in the target character set. CP437 only supports the Western alphabet (A-Z, plus accented characters like ç and é), IBM line-drawing characters (such as ╚ and ┤) and a few Greek symbols, mainly for math equations (such as Σ and φ).
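In Python 3 terms, encode() turns a str into bytes and decode() goes the other way, so encode("cp437") is the direction the documentation note asks for. A minimal sketch, assuming Python 3 (cp437_problem is a hypothetical helper name), that catches the error instead of crashing and reports the offending run of characters through the exception's start and end attributes:

```python
from typing import Optional


def cp437_problem(name: str) -> Optional[str]:
    """Hypothetical helper: return the first run of characters that CP437
    cannot encode, or None if the whole name encodes cleanly."""
    try:
        name.encode("cp437")  # str -> bytes
    except UnicodeEncodeError as exc:
        # exc.start/exc.end delimit the run reported as "position 0-3" above
        return exc.object[exc.start:exc.end]
    return None


print(cp437_problem("çé╚┤Σφ.txt"))         # None
print("café.txt".encode("cp437"))          # b'caf\x82.txt'
print(cp437_problem("קובץ.txt") == "קובץ")  # True
```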

You'll either have to generate a different filename that only uses characters supported by the CP437 codec, or live with the fact that WinZip will never be able to show Hebrew filenames properly and simply stick with the character set that already worked for you with 7-Zip.
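For the first option, here is a minimal sketch assuming Python 3 (safe_arcname is a hypothetical helper, and the underscore placeholder is an arbitrary choice): replace every character the target encoding cannot represent and pass the result as the archive name. Python 3's zipfile stores non-ASCII str names as UTF-8 rather than CP437, so the zip part of the example restricts the name to ASCII, which CP437 shares byte for byte; the stored name should then read the same under either interpretation.

```python
import io
import zipfile


def safe_arcname(name: str, encoding: str = "cp437", placeholder: str = "_") -> str:
    """Hypothetical helper: replace each character that `encoding` cannot
    represent with `placeholder`, keeping everything else unchanged."""
    chars = []
    for ch in name:
        try:
            ch.encode(encoding)
            chars.append(ch)
        except UnicodeEncodeError:
            chars.append(placeholder)
    return "".join(chars)


print(safe_arcname("קובץ.txt"))           # ____.txt
print(safe_arcname("café.txt"))           # café.txt  (é exists in CP437)
print(safe_arcname("café.txt", "ascii"))  # caf_.txt

# An all-ASCII name is stored as plain bytes, without the UTF-8 flag.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(safe_arcname("קובץ.txt", "ascii"), b"hello")
    print(zf.namelist())  # ['____.txt']
```

With a real file on disk the call would be zf.write("קובץ.txt", arcname=safe_arcname("קובץ.txt", "ascii")). Different Hebrew names of the same length collapse to the same placeholder name, so a real implementation would need a transliteration table or a uniqueness suffix.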
