Python's os.path choking on Hebrew filenames


Problem description


I'm writing a script that has to move some files around, but unfortunately os.path doesn't seem to handle internationalization very well. When I have files named in Hebrew, there are problems. Here's a screenshot of the contents of a directory:


[Screenshot of the directory contents not shown] (source: thegreenplace.net)

Now consider this code that goes over the files in this directory:

import os

# Byte-string path in, byte-string names out (Python 2)
files = os.listdir('test_source')

for f in files:
    pf = os.path.join('test_source', f)
    print pf, os.path.exists(pf)

The output is:

test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt False

Notice how os.path.exists thinks that the Hebrew-named file doesn't even exist? How can I fix this?

ActivePython 2.5.2 on Windows XP Home SP2

Solution

Hmm, after some digging it appears that this works when os.listdir is given a unicode string:

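import os

# Unicode path in, unicode names out (Python 2)
files = os.listdir(u'test_source')

for f in files:
    pf = os.path.join(u'test_source', f)
    # Replace non-ASCII characters with '?' so print cannot fail
    print pf.encode('ascii', 'replace'), os.path.exists(pf)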

===>

test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt True
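
For anyone who wants to check this behaviour without a Hebrew-named file at hand, here is a minimal sketch. It builds a throwaway directory, drops one Hebrew-named file into it, and runs the same listdir/exists loop with a unicode path. The names HEBREW_NAME, list_with_exists and demo are made up for this illustration. The code is written so that it also runs on Python 3.3+ (where every str is already Unicode and the u prefix is redundant but harmless), and it assumes the temporary directory sits on a filesystem that accepts Hebrew names.

```python
import os
import shutil
import tempfile

# Hypothetical Hebrew file name, written with escapes so the source file
# itself stays pure ASCII.
HEBREW_NAME = u'\u05e9\u05dc\u05d5\u05dd.txt'


def list_with_exists(directory):
    """Return (path, exists) pairs for every entry of a unicode directory path."""
    results = []
    for name in os.listdir(directory):  # unicode path in, unicode names out
        path = os.path.join(directory, name)
        results.append((path, os.path.exists(path)))
    return results


def demo():
    """Create a temporary directory with one Hebrew-named file and list it."""
    # A unicode prefix keeps the resulting directory path a unicode string.
    directory = tempfile.mkdtemp(prefix=u'test_source_')
    try:
        # Create an empty file with the Hebrew name.
        open(os.path.join(directory, HEBREW_NAME), 'w').close()
        return list_with_exists(directory)
    finally:
        # Always clean up the throwaway directory.
        shutil.rmtree(directory)
```

Calling demo() should give back a single pair whose second element is True, which is the unicode-path behaviour shown in the output above.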

Some important observations here:

  • Windows XP (like all NT derivatives) stores all filenames in Unicode
  • os.listdir (and similar functions, like os.walk) should be passed a unicode string in order to work correctly with Unicode paths. Here's a quote from the aforementioned link:

os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem's encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames.

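  • And lastly, print has to encode a unicode string before writing it to the console. If the console's encoding cannot represent the Hebrew characters, which seems to be the case here, print raises UnicodeEncodeError, so the path is encoded to ASCII with 'replace' first; that is why the Hebrew letters show up as ?.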
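
As a rough sketch of that last point, the encode-with-'replace' trick can be wrapped in a small helper that targets whatever encoding the output stream reports instead of hard-coding ASCII. The helper name printable and the sample path are hypothetical, not part of os.path or of the original script, and the fallback to ASCII when the stream reports no encoding is an assumption of this sketch.

```python
import sys


def printable(path, encoding=None):
    """Return `path` with characters the target encoding lacks replaced by '?'."""
    # Fall back to ASCII when the stream reports no encoding (e.g. when piped).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Round-trip through bytes: encode lossily, then decode back to text.
    return path.encode(encoding, 'replace').decode(encoding)


# Hypothetical Hebrew-named path, written with escapes to keep the source ASCII.
hebrew_path = u'test_source\\\u05e9\u05dc\u05d5\u05dd.txt'

# Safe to print: anything the output encoding cannot represent is already '?'.
print(printable(hebrew_path))
```

With 'ascii' as the encoding, each of the four Hebrew letters becomes a ?, matching the `test_source\????.txt` line in the output above; with an encoding that can represent Hebrew, such as UTF-8, the path comes back unchanged. Only the displayed text is lossy: the original unicode path should still be the one passed to os.path.exists and friends.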
