Read Unicode characters from command-line arguments in Python 2.x on Windows

Question

I want my Python script to be able to read Unicode command line arguments in Windows. But it appears that sys.argv is a string encoded in some local encoding, rather than Unicode. How can I read the command line in full Unicode?

Sample code: argv.py

import sys

first_arg = sys.argv[1]
print first_arg
print type(first_arg)
print first_arg.encode("hex")
print open(first_arg)

On my PC set up for Japanese code page, I get:

C:\temp>argv.py "PC・ソフト申請書08.09.24.doc"
PC・ソフト申請書08.09.24.doc
<type 'str'>
50438145835c83748367905c90bf8f9130382e30392e32342e646f63
<open file 'PC・ソフト申請書08.09.24.doc', mode 'r' at 0x00917D90>

That's Shift-JIS encoded I believe, and it "works" for that filename. But it breaks for filenames with characters that aren't in the Shift-JIS character set—the final "open" call fails:

C:\temp>argv.py Jörgen.txt
Jorgen.txt
<type 'str'>
4a6f7267656e2e747874
Traceback (most recent call last):
  File "C:\temp\argv.py", line 7, in <module>
    print open(first_arg)
IOError: [Errno 2] No such file or directory: 'Jorgen.txt'
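
The two hex dumps above can be checked offline. The following is a minimal sketch, written to run on Python 2 or 3, that assumes the Japanese code page is cp932 (Python's name for the Windows variant of Shift-JIS). The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import binascii

# Hex dumps copied from the two console sessions above.
first_hex = "50438145835c83748367905c90bf8f9130382e30392e32342e646f63"
second_hex = "4a6f7267656e2e747874"

# Assumption: the system code page is cp932 (Windows Shift-JIS).
first_name = binascii.unhexlify(first_hex).decode("cp932")
second_name = binascii.unhexlify(second_hex).decode("cp932")

# The first name survives a round trip through cp932, which is why
# open() could find that file.
round_trip_hex = binascii.hexlify(first_name.encode("cp932")).decode("ascii")
round_trip_ok = round_trip_hex == first_hex

# The second dump is plain ASCII: the o-umlaut was already replaced by "o"
# before Python received the argument, so decoding cannot recover it.
second_is_ascii = all(ord(ch) < 128 for ch in second_name)

# Print with escapes so the output is safe on any console encoding.
print(first_name.encode("unicode_escape").decode("ascii"))
print(second_name, round_trip_ok, second_is_ascii)
```

The second result is the important one: the information was lost before the script started, so nothing done to sys.argv afterwards can restore it.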

Note—I'm talking about Python 2.x, not Python 3.0. I've found that Python 3.0 gives sys.argv as proper Unicode. But it's a bit early yet to transition to Python 3.0 (due to lack of 3rd party library support).

Update:

A few answers have said I should decode according to whatever the sys.argv is encoded in. The problem with that is that it's not full Unicode, so some characters are not representable.
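
To illustrate why decoding cannot help, here is a small sketch, again assuming cp932 as the code page. Python's codec either raises an error or substitutes "?", whereas Windows substituted a look-alike "o" in the session above. Either way, the original character is gone from the byte string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

name = u"J\u00f6rgen.txt"  # "Jörgen.txt" written with an escape

# Assumption: the system code page is cp932 (Windows Shift-JIS).
try:
    name.encode("cp932")
    representable = True
except UnicodeEncodeError:
    representable = False

# Lossy conversion: the unsupported character becomes "?".
lossy = name.encode("cp932", "replace")

# Decoding the lossy bytes cannot bring the original character back.
recovered = lossy.decode("cp932")

print(representable)
print(recovered == name)
```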

Here's the use case that gives me grief: I have enabled drag-and-drop of files onto .py files in Windows Explorer. I have file names with all sorts of characters, including some not in the system default code page. My Python script doesn't get the right Unicode filenames passed to it via sys.argv in all cases, when the characters aren't representable in the current code page encoding.

There is certainly some Windows API to read the command line with full Unicode (and Python 3.0 does it). I assume the Python 2.x interpreter is not using it.

Recommended answer

Here is a solution that is just what I'm looking for, making a call to the Windows GetCommandLineW and CommandLineToArgvW functions:
Get sys.argv with Unicode characters under Windows (from ActiveState)

But I've made several changes, to simplify its usage and better handle certain uses. Here is what I use:

win32_unicode_argv.py

"""
win32_unicode_argv.py

Importing this will replace sys.argv with a full Unicode form.
Windows only.

From this site, with adaptations:
      http://code.activestate.com/recipes/572200/

Usage: simply import this module into a script. sys.argv is changed to
be a list of Unicode strings.
"""


import sys

def win32_unicode_argv():
    """Uses kernel32.GetCommandLineW and shell32.CommandLineToArgvW to get
    sys.argv as a list of Unicode strings.

    Versions 2.x of Python don't support Unicode in sys.argv on
    Windows: arguments arrive as byte strings in the system code page, and
    characters that code page cannot represent are replaced with '?' or a
    look-alike character.
    """

    from ctypes import POINTER, byref, c_int, windll
    from ctypes.wintypes import LPCWSTR, LPWSTR

    # kernel32 exports stdcall functions, so load it through windll.
    GetCommandLineW = windll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = LPCWSTR

    CommandLineToArgvW = windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
    CommandLineToArgvW.restype = POINTER(LPWSTR)

    cmd = GetCommandLineW()
    argc = c_int(0)
    argv = CommandLineToArgvW(cmd, byref(argc))
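    if argc.value > 0:
        # Skip the Python executable and any interpreter options, which
        # sys.argv has already dropped.
        start = argc.value - len(sys.argv)
        return [argv[i] for i in xrange(start, argc.value)]
    # Parsing failed: keep the original list rather than returning None.
    return sys.argv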

sys.argv = win32_unicode_argv()
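
The offset calculation relies on the full Windows command line ending with the same entries that Python placed in sys.argv. The sketch below replays that arithmetic on hard-coded lists so it can be run on any platform. The helper name align_unicode_argv, the interpreter path, and the -u option are hypothetical and chosen only for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def align_unicode_argv(full_argv, narrow_argv):
    """Keep the tail of full_argv that lines up with narrow_argv.

    Hypothetical helper mirroring the offset calculation in the module above.
    """
    start = len(full_argv) - len(narrow_argv)
    return full_argv[start:]


# What CommandLineToArgvW might return for "python -u argv.py <name>"
# (the interpreter path is a made-up example).
full = [u"C:\\Python27\\python.exe", u"-u", u"argv.py", u"J\u00f6rgen.txt"]

# What Python 2 would put in sys.argv for the same invocation.
narrow = ["argv.py", "Jorgen.txt"]

result = align_unicode_argv(full, narrow)
print(len(result))
print(result[0])
```

Counting from the end means the script does not need to know how many interpreter options preceded the script name.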

Now, the way I use it is simply to do:

import sys
import win32_unicode_argv

and from then on, sys.argv is a list of Unicode strings. The Python optparse module seems happy to parse it, which is great.
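
As a rough sketch of the optparse point, the snippet below feeds a list of Unicode strings to OptionParser. The list stands in for sys.argv[1:] after the module has been imported, and the --output option and the file names are made up for the example.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
from optparse import OptionParser

parser = OptionParser()
# "--output" is a made-up option used only for this illustration.
parser.add_option("-o", "--output", dest="output")

# Stand-in for sys.argv[1:] once it holds Unicode strings.
args_in = [u"--output", u"r\u00e9sum\u00e9.txt", u"J\u00f6rgen.txt"]
options, positional = parser.parse_args(args_in)

# Both the option value and the positional argument keep their
# non-ASCII characters.
print(options.output == u"r\u00e9sum\u00e9.txt")
print(positional == [u"J\u00f6rgen.txt"])
```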
