Python's os.path choking on Hebrew filenames
I'm writing a script that has to move some files around, but unfortunately os.path doesn't seem to play very well with internationalization. When I have files named in Hebrew, there are problems. Here's a screenshot of the contents of a directory:
(source: thegreenplace.net)
Now consider this code that goes over the files in this directory:
import os

# Byte-string path in, byte-string filenames out (Python 2)
files = os.listdir('test_source')
for f in files:
    pf = os.path.join('test_source', f)
    print pf, os.path.exists(pf)
The output is:
test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt False
Notice how os.path.exists thinks that the Hebrew-named file doesn't even exist?
How can I fix this?
ActivePython 2.5.2 on Windows XP Home SP2
Hmm, after some digging it appears that passing os.listdir a unicode string makes this kinda work:
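# Unicode path in, unicode filenames out (Python 2)
files = os.listdir(u'test_source')
for f in files:
    pf = os.path.join(u'test_source', f)
    # Encode only for display; os.path.exists gets the real unicode path
    print pf.encode('ascii', 'replace'), os.path.exists(pf)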
===>
test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt True
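The question marks in that last line come from the encode('ascii', 'replace') call, not from the filesystem: the unicode path handed to os.path.exists still holds the real Hebrew name. Here is a minimal sketch of just that encoding step, written for Python 3 rather than the Python 2.5 used above (an assumption, so that it runs on a current interpreter). The Hebrew name and the ascii_preview helper are made up for illustration.

```python
# Python 3 sketch. The Hebrew name is a hypothetical stand-in for the real file.
hebrew_name = "\u05e9\u05dc\u05d5\u05dd.txt"  # four Hebrew letters plus ".txt"
path = "test_source\\" + hebrew_name


def ascii_preview(text):
    """Return an ASCII-only rendering of text, with '?' for each non-ASCII character."""
    return text.encode("ascii", "replace").decode("ascii")


preview = ascii_preview(path)
print(preview)          # lossy form, suitable for display only
print(preview == path)  # the preview is not the real path any more
```

The preview is only good for showing to a user. Anything that touches the filesystem should keep using the original, unencoded path.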
Some important observations here:
- Windows XP (like all NT derivatives) stores all filenames in Unicode.
- os.listdir (and similar functions, like os.walk) should be passed a unicode string in order to work correctly with Unicode paths. Here's a quote from the aforementioned link:
  "os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem's encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames."
- And lastly, print wants an ASCII string here, not unicode, so the path has to be encoded to ASCII for display. The 'replace' error handler is why the Hebrew name still shows up as ????.
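For readers on Python 3, the same rule (the type of the path decides the type of the result) survives as str versus bytes: os.listdir returns str names for a str path and bytes names for a bytes path. The sketch below illustrates that under Python 3 and is not the code from the question. The temporary directory, the Hebrew file name, and the list_both_ways helper are all hypothetical.

```python
import os
import tempfile


def list_both_ways(directory):
    """List a directory once with a text path and once with a bytes path."""
    as_text = sorted(os.listdir(directory))                # str in, str out
    as_bytes = sorted(os.listdir(os.fsencode(directory)))  # bytes in, bytes out
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as tmp:
    name = "\u05e9\u05dc\u05d5\u05dd.txt"  # hypothetical Hebrew file name
    open(os.path.join(tmp, name), "w").close()

    as_text, as_bytes = list_both_ways(tmp)
    found = os.path.exists(os.path.join(tmp, as_text[0]))

    print(type(as_text[0]).__name__, type(as_bytes[0]).__name__)
    print(found)
```

Sticking to text paths throughout, as the unicode version of the loop above does, keeps the names intact from listing through to os.path.exists.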