sys.argv as bytes in Python 3k


Question

Because Python 3k introduces a strict distinction between strings and bytes, the command line arguments in sys.argv are presented as strings. Sometimes the arguments need to be treated as bytes, e.g. when passing a path, which on Unix need not be in any particular character encoding.

Let's see an example. A brief Python 3k program, argv.py, follows:

import sys

print(sys.argv[1])
print(b'bytes')

When it is executed as python3.1 argv.py français, it produces the expected output:

français
b'bytes'

Note that the argument français is in my locale's encoding. However, when we pass the argument in a different encoding, we get an error: python3.1 argv.py `echo français|iconv -t latin1`

Traceback (most recent call last):
  File "argv.py", line 3, in <module>
    print(sys.argv[1])
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce7' in position 4: surrogates not allowed

How can we pass binary data to a Python 3k program via command line arguments? One use case is passing a path to a file belonging to a user who uses a different locale.

Recommended answer

Note that the error is a UnicodeEncodeError rather than a UnicodeDecodeError. Python preserves the exact bytes passed on the command line (via the PEP 383 surrogateescape error handler), but those bytes are not valid UTF-8 and hence cannot be encoded as such when writing to the console.
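The mechanism can be reproduced without a shell. The following is a minimal sketch that assumes the command line was decoded as UTF-8 (as in the traceback above); the variable names raw and arg are illustrative only.

```python
# Bytes as the shell would deliver them: "français" encoded as Latin-1.
raw = "français".encode("latin-1")            # b'fran\xe7ais'

# Decoding as UTF-8 with the surrogateescape handler maps the
# undecodable byte 0xE7 to the lone surrogate U+DCE7.
arg = raw.decode("utf-8", "surrogateescape")
print(ascii(arg))                              # 'fran\udce7ais'

# A strict UTF-8 encode (what writing to a UTF-8 console requires) fails.
try:
    arg.encode("utf-8")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)        # surrogates not allowed

# Encoding with the same handler restores the original bytes exactly.
assert arg.encode("utf-8", "surrogateescape") == raw
```

The surrogate sits at index 4 of the string, which matches the position reported in the error message.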

The best way to deal with this is to use application-level knowledge of the correct encoding to reinterpret the command line argument inside the application, as in the following example code:

$ python3.2 -c "import os, sys; print(os.fsencode(sys.argv[1]).decode('latin-1'))" `echo français|iconv -t latin1`
français

The os.fsencode call reverses the transformation Python applied automatically when processing the command line arguments. The decode('latin-1') call then performs the correct conversion to get a properly decoded string.
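As a script-level sketch of the same two steps: the helper name reinterpret_arg is hypothetical, and the argument is simulated with os.fsdecode rather than read from sys.argv. This assumes a POSIX system, where os.fsdecode and os.fsencode use the surrogateescape handler.

```python
import os


def reinterpret_arg(arg, encoding):
    """Recover the raw bytes behind an argv string, then decode them
    with the encoding the application knows to be correct."""
    raw = os.fsencode(arg)       # undo Python's automatic decoding
    return raw.decode(encoding)  # decode with the real encoding


# Simulate what sys.argv[1] would contain for a Latin-1 encoded argument.
simulated = os.fsdecode("français".encode("latin-1"))
result = reinterpret_arg(simulated, "latin-1")
assert result == "français"
```

In a real program the call would be reinterpret_arg(sys.argv[1], 'latin-1'), with the encoding supplied by whatever the application knows about its input.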

Python 3.2 added os.fsencode specifically to make this kind of problem easier to deal with.

For Python 3.1, the equivalent construct for os.fsencode(sys.argv[1]) is sys.argv[1].encode(sys.getfilesystemencoding(), 'surrogateescape').
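The two constructs can be combined into one version-tolerant helper. This is only a sketch under the assumption of a POSIX system; the name arg_to_bytes is hypothetical.

```python
import os
import sys


def arg_to_bytes(arg):
    """Return the original bytes of a command line argument (POSIX)."""
    if hasattr(os, "fsencode"):
        # Python 3.2 and later.
        return os.fsencode(arg)
    # Python 3.1 equivalent, as described above.
    return arg.encode(sys.getfilesystemencoding(), "surrogateescape")


# All arguments after the script name, as bytes.
raw_args = [arg_to_bytes(a) for a in sys.argv[1:]]
```

Each element of raw_args is a bytes object that can be passed directly to functions such as open() or os.stat(), which accept byte paths on Unix.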

Updated in February 2013 for Python 3.2+ and to avoid assuming that Python automatically detected UTF-8 as the command line encoding.
