Python's os.path choking on Hebrew filenames

Views: 115 | Posted: 2020/11/30 0:08:28 | Tags: python, internationalization, hebrew


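Question

I'm writing a script that has to move some files around, but unfortunately os.path doesn't seem to handle internationalization very well. When I have files named in Hebrew, there are problems. Here's a screenshot of the contents of a directory:

[Screenshot of the directory contents; source: thegreenplace.net]

Now consider this code that goes over the files in this directory: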

import os

# Byte-string path in, byte-string file names out (Python 2)
files = os.listdir('test_source')

for f in files:
    pf = os.path.join('test_source', f)
    print pf, os.path.exists(pf)

The output is:

test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt False

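Notice how os.path.exists thinks that the Hebrew-named file doesn't even exist? How can I fix this?

Environment: ActivePython 2.5.2 on Windows XP Home SP2

Solution

Hmm, after some digging it appears that passing os.listdir a unicode string works: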

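import os

# Unicode path in, unicode file names out (Python 2)
files = os.listdir(u'test_source')

for f in files:
    pf = os.path.join(u'test_source', f)
    # Encode for the console; characters ASCII cannot represent print as '?'
    print pf.encode('ascii', 'replace'), os.path.exists(pf)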

===>

test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt True
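The fix relies on os.listdir returning names of the same type as the path it was given. Here is a minimal sketch of that behaviour, written for Python 3 rather than the ActivePython 2.5.2 used above (in Python 3, str is already Unicode and bytes plays the role of the old 8-bit string). The temporary directory, the Hebrew file name, and the helper list_names are all made up for the demonstration.

```python
import os
import tempfile


def list_names(directory):
    """Return (text_names, byte_names) for the entries of `directory`."""
    # A str path yields str names; a bytes path yields bytes names.
    text_names = sorted(os.listdir(directory))
    byte_names = sorted(os.listdir(os.fsencode(directory)))
    return text_names, byte_names


# Hypothetical demo directory holding one Hebrew-named file.
demo_dir = tempfile.mkdtemp()
hebrew_name = '\u05e9\u05dc\u05d5\u05dd.txt'
with open(os.path.join(demo_dir, hebrew_name), 'w') as handle:
    handle.write('demo')

text_names, byte_names = list_names(demo_dir)

# Paths built from the text names resolve correctly.
found = all(os.path.exists(os.path.join(demo_dir, name)) for name in text_names)
print(text_names == [hebrew_name], found)
```

Working with text (Unicode) paths from start to finish, as in the fix, avoids the lossy conversion that turned the Hebrew name into question marks in the first listing.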

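Some important observations here:

Windows XP (like all NT derivatives) stores all filenames in Unicode.
os.listdir (and similar functions, like os.walk) should be passed a unicode string in order to work correctly with Unicode paths. Here's a quote from the Python Unicode HOWTO:

os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem's encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames.

And lastly, print has to encode a unicode string before writing it to the console, and the console encoding here cannot represent Hebrew characters, so the path is explicitly encoded to ASCII with the 'replace' error handler. That is why the Hebrew name still shows up as ???? in the output even though os.path.exists now finds the file.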
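To make the last point concrete, here is a small sketch of what encode('ascii', 'replace') does to a Hebrew path. The file name is an invented four-letter example, written with \u escapes so the snippet itself stays pure ASCII. It should run on both Python 2 and Python 3.3+, although Python 3 prints the result as a bytes literal.

```python
# Hypothetical Hebrew file name, spelled with escapes to keep the source ASCII.
path = u'test_source\\' + u'\u05e9\u05dc\u05d5\u05dd.txt'

# 'replace' substitutes '?' for every character ASCII cannot represent.
safe = path.encode('ascii', 'replace')
print(safe)

# Without an error handler the same call raises UnicodeEncodeError.
try:
    path.encode('ascii')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)
```

The encoded value is only suitable for display. The original unicode path is what should be passed to os.path.exists and to the functions that move the files.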

