Read Unicode characters from command-line arguments in Python 2.x on Windows

Question

I want my Python script to be able to read Unicode command line arguments in Windows. But it appears that sys.argv is a string encoded in some local encoding, rather than Unicode. How can I read the command line in full Unicode?

Example code: argv.py

import sys

first_arg = sys.argv[1]
print first_arg
print type(first_arg)
print first_arg.encode("hex")
print open(first_arg)

On my PC set up for Japanese code page, I get:

C:\temp>argv.py "PC・ソフト申請書08.09.24.doc"
PC・ソフト申請書08.09.24.doc
<type 'str'>
50438145835c83748367905c90bf8f9130382e30392e32342e646f63
<open file 'PC・ソフト申請書08.09.24.doc', mode 'r' at 0x00917D90>

That's Shift-JIS encoded I believe, and it "works" for that filename. But it breaks for filenames with characters that aren't in the Shift-JIS character set—the final "open" call fails:
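
The hex dump above can be checked against the Shift-JIS guess with a short sketch. It assumes Python's `cp932` codec as a stand-in for the Japanese Windows code page, and it is written to run under both Python 2 and Python 3. The variable names are illustrative only.

```python
from __future__ import print_function
import binascii

# Hex dump printed by argv.py for the Japanese filename.
hex_dump = "50438145835c83748367905c90bf8f9130382e30392e32342e646f63"
raw_bytes = binascii.unhexlify(hex_dump)

# Assumption: cp932 is Python's name for the Windows Shift-JIS code page.
decoded_name = raw_bytes.decode("cp932")
# Print an ASCII-safe, escaped form so the console encoding does not matter.
print(decoded_name.encode("unicode_escape").decode("ascii"))

# A character outside that code page cannot be encoded at all.
try:
    u"J\u00f6rgen.txt".encode("cp932")
    o_umlaut_encodable = True
except UnicodeEncodeError:
    o_umlaut_encodable = False
print(o_umlaut_encodable)
```

If the bytes decode cleanly back to the Japanese filename while `ö` is rejected, that is consistent with sys.argv having been converted to the system code page before the script saw it.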

C:\temp>argv.py Jörgen.txt
Jorgen.txt
<type 'str'>
4a6f7267656e2e747874
Traceback (most recent call last):
  File "C:\temp\argv.py", line 7, in <module>
    print open(first_arg)
IOError: [Errno 2] No such file or directory: 'Jorgen.txt'

Note—I'm talking about Python 2.x, not Python 3.0. I've found that Python 3.0 gives sys.argv as proper Unicode. But it's a bit early yet to transition to Python 3.0 (due to lack of 3rd party library support).

Update:

A few answers have said I should decode according to whatever the sys.argv is encoded in. The problem with that is that it's not full Unicode, so some characters are not representable.
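
To illustrate why decoding sys.argv afterwards cannot recover the lost characters, here is a small sketch. It uses Python's `cp932` codec with the `replace` error handler as an approximation of the lossy conversion. Windows itself applies a best-fit mapping (which is presumably how `ö` became `o` in the output above), so the exact substitute character differs, but the information loss is the same.

```python
from __future__ import print_function

original = u"J\u00f6rgen.txt"

# Simulate the lossy step: characters missing from the code page are replaced.
as_code_page_bytes = original.encode("cp932", "replace")

# Decoding the bytes afterwards cannot bring the original character back.
round_tripped = as_code_page_bytes.decode("cp932")
print(repr(round_tripped))
print(round_tripped == original)
```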

Here's the use case that gives me grief: I have enabled drag-and-drop of files onto .py files in Windows Explorer. I have file names with all sorts of characters, including some not in the system default code page. My Python script doesn't get the right Unicode filenames passed to it via sys.argv in all cases, when the characters aren't representable in the current code page encoding.

There is certainly some Windows API to read the command line with full Unicode (and Python 3.0 does it). I assume the Python 2.x interpreter is not using it.

Recommended answer

Here is a solution that is just what I'm looking for, making a call to the Windows GetCommandLineW and CommandLineToArgvW functions:
Get sys.argv with Unicode characters under Windows (from ActiveState)

But I've made several changes, to simplify its usage and better handle certain uses. Here is what I use:

win32_unicode_argv.py

"""
win32_unicode_argv.py

Importing this will replace sys.argv with a full Unicode form.
Windows only.

From this site, with adaptations:
      http://code.activestate.com/recipes/572200/

Usage: simply import this module into a script. sys.argv is changed to
be a list of Unicode strings.
"""


import sys

def win32_unicode_argv():
    """Uses kernel32.GetCommandLineW and shell32.CommandLineToArgvW to get
    sys.argv as a list of Unicode strings.

    Versions 2.x of Python don't support Unicode in sys.argv on
    Windows, with the underlying Windows API instead replacing multi-byte
    characters with '?'.
    """

    from ctypes import POINTER, byref, c_int, windll
    from ctypes.wintypes import LPCWSTR, LPWSTR

    # Both functions are stdcall Windows APIs, so load them through windll.
    GetCommandLineW = windll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = LPCWSTR

    CommandLineToArgvW = windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
    CommandLineToArgvW.restype = POINTER(LPWSTR)

    cmd = GetCommandLineW()
    argc = c_int(0)
    argv = CommandLineToArgvW(cmd, byref(argc))
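    if argc.value > 0:
        # Skip the Python executable and any interpreter options, which
        # sys.argv does not include.
        start = argc.value - len(sys.argv)
        return [argv[i] for i in
                xrange(start, argc.value)]
    # Parsing failed; keep the original arguments rather than returning None.
    return sys.argv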

sys.argv = win32_unicode_argv()
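
The `start` arithmetic in the function can be illustrated without calling the Windows API. The sketch below uses a hypothetical helper, `trim_interpreter_args`, and a made-up argument list. The full process command line includes the interpreter and any interpreter options, while sys.argv starts at the script name, so the difference in lengths is the number of leading entries to skip.

```python
from __future__ import print_function


def trim_interpreter_args(full_argv, script_argv_len):
    """Drop the leading interpreter entries from a full command line.

    full_argv: every token of the process command line (made-up data here).
    script_argv_len: len(sys.argv) as reported by Python.
    """
    start = len(full_argv) - script_argv_len
    return full_argv[start:]


# Hypothetical command line: python.exe -u argv.py <file name with o-umlaut>
full = [u"python.exe", u"-u", u"argv.py", u"J\u00f6rgen.txt"]

# Python would report len(sys.argv) == 2 here: the script name and one argument.
trimmed = trim_interpreter_args(full, 2)
print(len(trimmed), trimmed[0])
```

This relies on the assumption that everything the interpreter consumes sits at the front of the command line, which is the same assumption the module itself makes.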

Now, the way I use it is simply to do:

import sys
import win32_unicode_argv

and from then on, sys.argv is a list of Unicode strings. The Python optparse module seems happy to parse it, which is great.
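
As a sketch of the optparse usage mentioned above, the following passes a list of Unicode strings directly to `parse_args`, standing in for the replaced `sys.argv[1:]`. The `--output` option and the file names are hypothetical and chosen only for illustration.

```python
from __future__ import print_function
import optparse

parser = optparse.OptionParser()
# Hypothetical option, for illustration only.
parser.add_option("-o", "--output", dest="output", default=None)

# Stand-in for sys.argv[1:] after importing win32_unicode_argv.
unicode_args = [u"--output", u"J\u00f6rgen.out", u"J\u00f6rgen.txt"]
options, positional = parser.parse_args(unicode_args)

# The parsed values keep their Unicode characters intact.
print(options.output == u"J\u00f6rgen.out")
print(len(positional))
```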
