Set encoding in Python 3 CGI scripts

Question

When writing a Python 3.1 CGI script, I run into horrible UnicodeDecodeErrors. However, when running the script on the command line, everything works.

It seems that open() and print() use the return value of locale.getpreferredencoding() to decide which encoding to use by default. When running on the command line, that value is 'UTF-8', as it should be. But when running the script through a browser, the encoding mysteriously gets redefined to 'ANSI_X3.4-1968', which appears to be just a fancy name for plain ASCII.

I now need to know how to make the cgi script run with 'utf-8' as the default encoding in all cases. My setup is Python 3.1.3 and Apache2 on Debian Linux. The system-wide locale is en_GB.utf-8.

Recommended answer

Answering this for late-comers because I don't think that the posted answers get to the root of the problem, which is the lack of locale environment variables in a CGI context. I'm using Python 3.2.


  • open() opens file objects in text (string) or binary (bytes) mode for reading and/or writing; in text mode the encoding used to encode strings written to the file, and decode bytes read from the file, may be specified in the call; if it isn't, then it is determined by locale.getpreferredencoding(), which on Linux uses the encoding from your locale environment settings, which is normally UTF-8 (from e.g. LANG=en_US.UTF-8)

>>> f = open('foo', 'w')         # open file for writing in text mode
>>> f.encoding
'UTF-8'                          # encoding is from the environment
>>> f.write('€')                 # write a Unicode string
1
>>> f.close()
>>> exit()
user@host:~$ hd foo
00000000  e2 82 ac      |...|    # data is UTF-8 encoded
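As a complement to the session above, here is a minimal sketch of the other option the text mentions: passing the encoding in the open() call so the result no longer depends on the locale. The helper name write_and_read_back is hypothetical and used only for illustration.

```python
import os
import tempfile


def write_and_read_back(text, encoding="utf-8"):
    """Write text with an explicit encoding; return (raw bytes, decoded text)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # The explicit encoding overrides locale.getpreferredencoding()
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        # Binary mode shows the bytes that actually reached the disk
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


# "\u20ac" is the euro sign; the escape keeps this source file pure ASCII
raw, decoded = write_and_read_back("\u20ac")
print(raw)
print(decoded == "\u20ac")
```

With an explicit encoding, the bytes on disk should be the same whether the script runs from a shell or under a web server.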


  • sys.stdout is in fact a file opened for writing in text mode with an encoding based on locale.getpreferredencoding(); you can write strings to it just fine and they'll be encoded to bytes based on sys.stdout's encoding; print() by default writes to sys.stdout - print() itself has no encoding, rather it's the file it writes to that has an encoding;

    >>> sys.stdout.encoding
    'UTF-8'                          # encoding is from the environment
    >>> exit()
    user@host:~$ python3 -c 'print("€")' > foo
    user@host:~$ hd foo
    00000000  e2 82 ac 0a   |....|   # data is UTF-8 encoded; \n is from print()
    

    You cannot write bytes to sys.stdout; use sys.stdout.buffer.write() for that. If you try to write bytes with sys.stdout.write(), it raises a TypeError. If you try print() instead, print() simply converts the bytes object to its string representation, so an escape sequence like \xff is written as the four characters \, x, f, f.

    user@host:~$ python3 -c 'print(b"\xe2\xf82\xac")' > foo
    user@host:~$ hd foo
    00000000  62 27 5c 78 65 32 5c 78  66 38 32 5c 78 61 63 27  |b'\xe2\xf82\xac'|
    00000010  0a                                                |.|
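    The point about bytes can be sketched as follows. This is an illustration rather than part of the original answer: the helper write_bytes is a hypothetical name, and an in-memory stream stands in for sys.stdout so the result can be inspected.

```python
import io


def write_bytes(stream, data):
    """Write raw bytes through the binary buffer underneath a text stream."""
    stream.flush()             # push pending text first so output stays in order
    stream.buffer.write(data)  # bypasses the text layer and its encoding
    stream.buffer.flush()


# In-memory stand-in for sys.stdout with an ASCII text layer
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="ascii")

text_stream.write("euro: ")
write_bytes(text_stream, b"\xe2\x82\xac\n")
result = raw.getvalue()

# Writing bytes to the text layer itself is rejected
try:
    text_stream.write(b"\xe2\x82\xac")
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(result)
print(error_name)
```

    In a real script the same helper would be called as write_bytes(sys.stdout, data), assuming sys.stdout is the usual text wrapper with a buffer attribute.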
    


  • In a CGI script you need to write to sys.stdout, and you can use print() to do it. But a CGI script process in Apache has no locale environment settings, because they are not part of the CGI specification. The sys.stdout encoding therefore defaults to ANSI_X3.4-1968, in other words ASCII. If you try to print() a string that contains non-ASCII characters to sys.stdout, you'll get "UnicodeEncodeError: 'ascii' codec can't encode character...: ordinal not in range(128)".

    A simple solution is to pass the Apache process's LANG environment variable through to the CGI script using Apache's mod_env PassEnv directive in the server or virtual host configuration: PassEnv LANG. On Debian/Ubuntu, make sure that the line ". /etc/default/locale" is uncommented in /etc/apache2/envvars, so that Apache runs with the system default locale and not the C (POSIX) locale (which also uses ASCII encoding). The following CGI script should run without errors in Python 3.2:

    #!/usr/bin/env python3
    import sys
    print('Content-Type: text/html; charset=utf-8')
    print()
    print('<html><body><pre>' + sys.stdout.encoding + '</pre>h€lló wörld</body></html>')
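
    If changing the Apache configuration is not possible, one alternative that is not part of the answer above is to wrap the binary stdout buffer in a text stream with a fixed UTF-8 encoding. The sketch below is a hedged illustration: utf8_text_stream and render_page are hypothetical names, and an in-memory buffer stands in for sys.stdout.buffer so the output can be checked.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


def render_page(out):
    """Write a minimal CGI response containing non-ASCII characters."""
    out.write("Content-Type: text/html; charset=utf-8\n\n")
    out.write("<html><body>h\u20acll\u00f3 w\u00f6rld</body></html>\n")
    out.flush()


# Stand-in for sys.stdout.buffer; keep a reference to the wrapper so the
# underlying buffer is not closed when the wrapper is garbage collected
buffer = io.BytesIO()
out = utf8_text_stream(buffer)
render_page(out)
body = buffer.getvalue()

# For contrast: an ASCII text layer, as in a locale-less CGI process, fails
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_out.write("\u20ac")
    ascii_out.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(body)
print(ascii_failed)
```

    In an actual CGI script the wrapper would be created once near the top, for example sys.stdout = utf8_text_stream(sys.stdout.buffer), after which print() encodes as UTF-8 regardless of the locale. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") should achieve the same effect, but that call is not available in the Python 3.1 and 3.2 versions discussed here.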
    

         

          
