Unicode filenames on Windows with Python & subprocess.Popen()
Problem description
Why does the following occur:
>>> u'\u0308'.encode('mbcs')  # UMLAUT (COMBINING DIAERESIS)
'\xa8'
>>> u'\u041A'.encode('mbcs')  # CYRILLIC CAPITAL LETTER KA
'?'
I have a Python application accepting filenames from the operating system. It works for some international users, but not others.
For example, this Unicode filename, u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442', will not encode with the Windows 'mbcs' encoding (the one used by the filesystem, returned by sys.getfilesystemencoding()). I get '???????', indicating the encoder fails on those characters. But this makes no sense, since the filename came from the user to begin with.
Update: Here's the background to my reasons behind this. I have a file on my system with a name in Cyrillic. I want to call subprocess.Popen() with that file as an argument. Popen won't handle unicode. Normally I can get away with encoding the argument with the codec given by sys.getfilesystemencoding(). In this case it won't work.
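A likely explanation for the '?' results is that 'mbcs' resolves to the machine's ANSI code page, and a Western code page has no Cyrillic letters. The sketch below imitates that in Python 3 with explicit codecs; it assumes the affected machine uses Windows-1252, which the question does not state, and it does not reproduce the best-fit mapping that 'mbcs' applied to u'\u0308'.

```python
# Assumption: the failing machine's ANSI code page is Windows-1252.
# 'cp1252' and 'cp1251' stand in for what 'mbcs' would resolve to on a
# Western European and a Russian Windows install respectively.
name = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442"

# A Western code page has no slot for Cyrillic, so every letter is replaced.
western = name.encode("cp1252", errors="replace")

# A Cyrillic code page can represent the same name without loss.
cyrillic = name.encode("cp1251")

print(western)                            # b'???????'
print(cyrillic.decode("cp1251") == name)  # True
```

This would also explain why the application works for some international users and not others: the result depends on each machine's code page, not on the filename alone.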
In Py3K, at least from Python 3.2, subprocess.Popen and sys.argv work consistently with (default unicode) strings on Windows. CreateProcessW and GetCommandLineW are evidently used.
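As a minimal sketch of that Python 3 behaviour, the snippet below passes the Cyrillic name from the question to a child interpreter as a plain str and reads back what the child received. The child script and the variable names are illustrative only, and capture_output requires Python 3.7 or later.

```python
import subprocess
import sys

# The filename from the question, passed as an ordinary (unicode) str.
name = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442"

# The child prints an ASCII-only repr of its first argument, so the check
# does not depend on the console or pipe encoding.
child_code = "import sys; print(ascii(sys.argv[1]))"

result = subprocess.run(
    [sys.executable, "-c", child_code, name],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()

# True when the argument arrived intact.
print(received == ascii(name))
```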
In Python 2, up to v2.7.2 at least, subprocess.Popen is buggy with Unicode arguments. It sticks to CreateProcessA (while the os.* functions are consistent with Unicode). And shlex.split creates additional nonsense.
Pywin32's win32process.CreateProcess also doesn't auto-switch to the W version, nor is there a win32process.CreateProcessW. Same with GetCommandLine. Thus ctypes.windll.kernel32.CreateProcessW... needs to be used.
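A rough sketch of what calling CreateProcessW through ctypes could look like follows. It is not the patched Popen mentioned later, only an illustration under stated assumptions: run_unicode and build_command_line are hypothetical helper names, the command line is built with subprocess.list2cmdline, no handles are inherited, and the call simply waits for the child and returns its exit code. It is written in Python 3 syntax, and the Windows-only part has not been verified on Python 2.7.

```python
import ctypes
import subprocess
import sys

# Win32 type aliases built from portable ctypes types (assumed equivalents
# of the ctypes.wintypes names), so this module can be imported anywhere.
DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16
BOOL = ctypes.c_int
HANDLE = ctypes.c_void_p
LPWSTR = ctypes.c_wchar_p
LPVOID = ctypes.c_void_p


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", LPWSTR), ("lpDesktop", LPWSTR),
        ("lpTitle", LPWSTR), ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD),
        ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
        ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
        ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", LPVOID),
        ("hStdInput", HANDLE), ("hStdOutput", HANDLE), ("hStdError", HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [("hProcess", HANDLE), ("hThread", HANDLE),
                ("dwProcessId", DWORD), ("dwThreadId", DWORD)]


def build_command_line(args):
    """Quote a list of unicode arguments into one Windows command line."""
    return subprocess.list2cmdline(args)


def run_unicode(args):
    """Hypothetical helper: start a process via CreateProcessW and wait for it."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW exists only on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        LPWSTR, LPWSTR, LPVOID, LPVOID, BOOL, DWORD, LPVOID, LPWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION)]
    kernel32.CreateProcessW.restype = BOOL
    kernel32.WaitForSingleObject.argtypes = [HANDLE, DWORD]
    kernel32.WaitForSingleObject.restype = DWORD
    kernel32.GetExitCodeProcess.argtypes = [HANDLE, ctypes.POINTER(DWORD)]
    kernel32.GetExitCodeProcess.restype = BOOL
    kernel32.CloseHandle.argtypes = [HANDLE]
    kernel32.CloseHandle.restype = BOOL

    # CreateProcessW may modify the command line, so pass a writable buffer.
    cmdline = ctypes.create_unicode_buffer(build_command_line(args))
    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(startup)
    info = PROCESS_INFORMATION()
    if not kernel32.CreateProcessW(None, cmdline, None, None, False, 0,
                                   None, None, ctypes.byref(startup),
                                   ctypes.byref(info)):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        kernel32.WaitForSingleObject(info.hProcess, 0xFFFFFFFF)  # INFINITE
        exit_code = DWORD()
        if not kernel32.GetExitCodeProcess(info.hProcess,
                                           ctypes.byref(exit_code)):
            raise ctypes.WinError(ctypes.get_last_error())
        return exit_code.value
    finally:
        kernel32.CloseHandle(info.hThread)
        kernel32.CloseHandle(info.hProcess)
```

On Windows, a call might look like run_unicode([u"notepad.exe", u"\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt"]), where the program and file name are placeholders. Redirecting stdin, stdout, or stderr would additionally require filling in the handle fields of STARTUPINFOW, which this sketch leaves out.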
The subprocess module perhaps should be fixed regarding this issue.
UTF8 on argv[1:] with private apps remains clumsy on a Unicode OS. Such tricks may be legal for 8-bit "Latin1" string OSes like Linux.
UPDATE: vaab has created a patched version of Popen for Python 2.7 which fixes the issue.
See https://gist.github.com/vaab/2ad7051fc193167f15f85ef573e54eb9
Blog post with explanations: http://vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/