Python not able to open file with non-English characters in path
I have a file with the following path: D:/bar/クレイジー・ヒッツ！/foo.abc
I am parsing the path from an XML file and storing it in a variable called path, in the form file://localhost/D:/bar/クレイジー・ヒッツ！/foo.abc
Then the following operations are performed:
path=path.strip()
path=path[17:] #to remove the file://localhost/ part
path=urllib.url2pathname(path)
path=urllib.unquote(path)
The error is:
IOError: [Errno 2] No such file or directory: 'D:\\bar\\\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81\\foo.abc'
Update 1: I am using Python 2.7 on Windows 7.
The path in your error is:
'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
I think this is the UTF-8-encoded version of your folder name.
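To confirm this, decode the bytes from the error message as UTF-8. The sketch below uses Python 3 (a bytes literal and the built-in ascii()) rather than the Python 2.7 of the question; the byte values are copied from the error above.

```python
# Bytes copied from the IOError message: just the folder-name part.
raw = (b'\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc'
       b'\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81')

name = raw.decode('utf-8')   # UTF-8 bytes -> Unicode text
print(len(raw), len(name))   # 30 10 (three bytes per character)
print(ascii(name))           # '\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
```

The escaped code points are the same ones os.listdir(u'.') returns in the session that follows.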
I've created a folder of the same name on Windows 7 and placed a file called 'abc.txt' in it:
>>> a = '\xe3\x82\xaf\xe3\x83\xac\xe3\x82\xa4\xe3\x82\xb8\xe3\x83\xbc\xe3\x83\xbb\xe3\x83\x92\xe3\x83\x83\xe3\x83\x84\xef\xbc\x81'
>>> os.listdir('.')
['?????\xb7???!']
>>> os.listdir(u'.') # Pass unicode to have unicode returned to you
[u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01']
>>>
>>> a.decode('utf8') # UTF8 decoding your string matches the listdir output
u'\u30af\u30ec\u30a4\u30b8\u30fc\u30fb\u30d2\u30c3\u30c4\uff01'
>>> os.listdir(a.decode('utf8'))
[u'abc.txt']
So it seems that Duncan's suggestion of path.decode('utf8') does the trick.
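In Python 3 the same fix looks different: str is already Unicode, and urllib.parse.unquote decodes percent-escapes as UTF-8 by default, so there is no separate .decode('utf8') step. The following is a sketch under that assumption, not the answerer's code; file_url_to_windows_path is a hypothetical helper name, it assumes the URL carries a drive letter as in the question, and it uses pathlib.PureWindowsPath so it can be tried on any OS.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit


def file_url_to_windows_path(url):
    """Hypothetical helper: file://localhost/D:/... URL -> Windows path."""
    parts = urlsplit(url.strip())
    if parts.scheme != 'file':
        raise ValueError('not a file URL: %r' % url)
    # parts.path looks like '/D:/bar/...'. unquote() turns any %XX escapes
    # back into text, decoding them as UTF-8 (the default, spelled out here).
    path = unquote(parts.path, encoding='utf-8')
    # Drop the slash before the drive letter; PureWindowsPath uses '\' separators.
    return PureWindowsPath(path.lstrip('/'))


p = file_url_to_windows_path('file://localhost/D:/bar/クレイジー・ヒッツ！/foo.abc')
print(p.drive, p.name)  # D: foo.abc
```

The percent-encoded form of the same URL yields the same path, since unquote() decodes the escapes. Using urlsplit instead of path[17:] also avoids depending on the exact length of the file://localhost/ prefix.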
Update
I can't test this for you, but I suggest checking whether the path contains non-ASCII characters before doing the .decode('utf8'). This is a bit hacky...
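import urllib
# 256-entry table: printable ASCII (32..126, '~' included) maps to itself, every other byte to '_'
ASCII_TRANS = '_'*32 + ''.join([chr(x) for x in range(32,127)]) + '_'*129
path=path.strip()
path=path[17:] #to remove the file://localhost/ part
path=urllib.unquote(path) # %XX escapes -> raw UTF-8 bytes
if path.translate(ASCII_TRANS) != path: # Contains non-ascii
    path = path.decode('utf8') # UTF-8 bytes -> unicode
path=urllib.url2pathname(path)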
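The same table-based check also works on Python 3 bytes, because bytes.translate accepts a 256-byte table. In this sketch the printable range runs up to and including byte 126, so '~' (which appears in 8.3 short names such as PROGRA~1) is not mistaken for non-ASCII; contains_non_ascii and decode_path are hypothetical helper names. The table also flags control characters (bytes 0 to 31 and 127), so "non-ASCII" here really means "outside printable ASCII".

```python
# Printable ASCII (bytes 32..126, '~' included) maps to itself; every other
# byte value maps to '_'. bytes.translate() needs exactly 256 entries.
ASCII_TRANS = b'_' * 32 + bytes(range(32, 127)) + b'_' * 129
assert len(ASCII_TRANS) == 256


def contains_non_ascii(raw):
    """Return True if raw (a bytes object) has any byte outside printable ASCII."""
    return raw.translate(ASCII_TRANS) != raw


def decode_path(raw):
    """Decode raw as UTF-8 only when the check fires, mirroring the answer's if."""
    if contains_non_ascii(raw):
        return raw.decode('utf-8')
    return raw.decode('ascii')


print(contains_non_ascii(b'C:/PROGRA~1/foo.abc'))                      # False
print(contains_non_ascii('D:/bar/クレイジー・ヒッツ！'.encode('utf-8')))  # True
```

Because every ASCII byte string is also valid UTF-8, calling raw.decode('utf-8') unconditionally returns the same text for ASCII-only input; in Python 2 the check only decides whether path stays a byte str or becomes unicode.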