Python 3.5 not handling unicode input from CLI argument

Question

I have a simple script that I'm attempting to use to automate some of the Japanese translation I do for my job.

 import requests
 import sys
 import json
 base_url = 'https://www.googleapis.com/language/translate/v2?key=CANT_SHARE_THAT&source=ja&target=en&q='
 print(sys.argv[1])
 base_url += sys.argv[1]
 request = requests.get( base_url )
 if request.status_code != 200:
      print("Error on request")
 print( json.loads(request.text)['data']['translations'][0]['translatedText'])

When the first argument is a string like 初期設定クリア, this script will explode at the line

 print(sys.argv[1])

with the message:

 line 5, in encode
 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
 UnicodeEncodeError: 'charmap' codec can't encode characters in 
 position 0-6: character maps to <undefined>

The bug can also be reduced to just this:

 import sys
 print(sys.argv[1])

This seems like an encoding problem. I'm using Python 3.5.1, and the terminal is MINGW64 under Windows 7 x64.
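One way to check whether this really is an output-encoding problem is to look at the codec Python chose for stdout and try encoding the same text with it. This is only a minimal sketch: the traceback names the 'charmap' codec but not the actual code page, so `cp1252` below is an assumed stand-in, and `can_encode` is a hypothetical helper, not part of the original script.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    # Escaped form of the sample argument, so this script prints ASCII only.
    sample = "\u521d\u671f\u8a2d\u5b9a\u30af\u30ea\u30a2"
    # The codec print() will use for this process's stdout.
    print("stdout encoding:", sys.stdout.encoding)
    # cp1252 is an assumption standing in for the console's legacy code page.
    print("cp1252 can encode sample:", can_encode(sample, "cp1252"))
    print("utf-8 can encode sample:", can_encode(sample, "utf-8"))
```

If the reported stdout encoding is a legacy code page rather than UTF-8, the failure in `print` would be expected for this input.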

When I write the same program in Rust 1.8 (and the executable is run under the same conditions, i.e. MINGW64 under Windows 7 x64)

  use std::env;
  fn main() {
         let args: Vec<String> = env::args().skip(1).collect();
         print!("First arg: {}", &args[0] );
  }

it produces the correct output:

  $ rustc unicode_example.rs
  $ ./unicode_example.exe 初期設定クリア
  First arg: 初期設定クリア

So I'm trying to understand what is happening here. MINGW64 claims to have proper UTF-8 support, which it appears to. Does Python 3.5.1 not have full UTF-8 support? I was under the assumption that the move to Python 3.x was because of Unicode support.

Recommended answer

Changing

 print(sys.argv[1])

to

 print(sys.argv[1].encode("utf-8"))

will cause Python to dump a string of bytes:

 $ python google_translate.py 初期設定クリア
 b'\xe5\x88\x9d\xe6\x9c\x9f\xe8\xa8\xad\xe5\xae\x9a\xe3\x82\xaf\xe3\x83\xaa\xe3\x82\xa2'

Nonetheless, it works. So the bug, if this is a bug, happens when Python encodes the internal string in order to print it to the terminal, not when the argument is decoded into a Python string.
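A small sketch of why the `.encode("utf-8")` variant gets past `print`: printing a `bytes` object writes its repr, which consists of ASCII characters only, so any stdout codec can handle it. The helper name `printed_form` is made up for illustration.

```python
def printed_form(value):
    """Return the text print() would write for value (without the newline)."""
    return str(value)

# Escaped form of the sample argument from the question.
text = "\u521d\u671f\u8a2d\u5b9a\u30af\u30ea\u30a2"
encoded = text.encode("utf-8")

# The repr of a bytes object escapes every non-ASCII byte as \xNN.
shown = printed_form(encoded)
only_ascii = all(ord(ch) < 128 for ch in shown)

# The argument itself was decoded correctly: the bytes round-trip to the text.
round_trips = encoded.decode("utf-8") == text
```

Both `only_ascii` and `round_trips` come out true here, which is consistent with the problem being on the output side.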

Also, simply removing the print call fixes the bug.
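If the output is still wanted, one possible workaround is to skip the stdout codec and write UTF-8 bytes to the underlying binary stream. This is a hedged sketch rather than a confirmed fix for this exact setup: `write_utf8` is a hypothetical helper, and it assumes the terminal interprets the bytes as UTF-8 (which the Rust output suggests MINGW64 does).

```python
import sys

def write_utf8(text, stream=None):
    """Write text plus a newline to a binary stream as UTF-8 bytes."""
    if stream is None:
        # sys.stdout.buffer is the raw byte stream behind sys.stdout.
        stream = sys.stdout.buffer
    stream.write((text + "\n").encode("utf-8"))
    stream.flush()

if __name__ == "__main__":
    if len(sys.argv) > 1:
        write_utf8(sys.argv[1])
```

Another option that needs no code change is setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script, which overrides the encoding Python uses for its standard streams.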
