How do I properly work with Unicode characters in Python to keep from getting errors?

Problem description

I'm working on a Python plugin for Google Quick Search Box (GQSB), and it's doing some odd things with non-ASCII characters. The code works fine until I try to construct a string containing non-ASCII characters (ü has been my test character). I am using the following snippet to build the string, where new_task is the variable that holds the input from GQSB.

the_sig = ("%sapi_key%sauth_token%smethod%sname%sparse%stimeline%s" %
           (api_secret, api_key, the_token, method, new_task, doParse, timeline))

It's giving me this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

If I understand correctly, this is because I am trying to combine a Unicode character with an ASCII string. Everything I could find told me to declare the encoding at the top of the file, like this:

# -*- coding: iso-8859-15 -*-

Which I have. And when I pull the snippet that constructs the string into a new script, it works just fine. But for some reason, in the context of the rest of the code, it fails every time. The only thing I can think of is that it is because it's inside its own class, but that doesn't make any sense to me.

The full code can be found on GitHub here

Thanks in advance for any help. I am stumped on this one.

Solution

There are a few things you should do to fix this.

  1. Convert all string literals that contain non-ASCII characters to Unicode literals. Example: u'über'.

  2. Do intermediate processing on Unicode. In other words, if you receive an encoded string (no matter the encoding), decode it to Unicode before working on it. Example:

    s = utf8_string.decode('utf8') + latin1_string.decode('latin1')
    

  3. When outputting the string or sending it somewhere, encode it with an encoding that your receiver understands. Example: send(s.encode('utf8')).
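The three steps above can be sketched against the signature string from the question. This is only an illustration under stated assumptions: the byte 0xc3 in the error message is consistent with new_task arriving as UTF-8 bytes (ü is 0xC3 0xBC in UTF-8), but the question does not confirm the encoding, and every value below is a hypothetical placeholder. Note also that the coding declaration at the top of a file only tells Python how to read literals in the source; it does not change how runtime data such as new_task is decoded. The sketch uses escapes instead of literal non-ASCII characters so that it behaves the same on Python 2 and Python 3.

```python
# Assumption: new_task arrives as UTF-8 encoded bytes (hypothetical value).
new_task = b'\xc3\xbcber'  # the UTF-8 bytes for "über"

# Python 2 decodes byte strings as ASCII when they are mixed with Unicode
# strings. Doing that decode explicitly reproduces the reported error.
try:
    new_task.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Step 2: decode the input to Unicode before working with it.
task_text = new_task.decode('utf8')

# Step 1: use a Unicode literal for the template.
# All of the values below are hypothetical placeholders.
api_secret, api_key, the_token = u'secret', u'key', u'token'
method, doParse, timeline = u'addTask', u'1', u'42'

the_sig = (u"%sapi_key%sauth_token%smethod%sname%sparse%stimeline%s" %
           (api_secret, api_key, the_token, method, task_text, doParse, timeline))

# Step 3: encode only when the result leaves the program, for example
# before hashing it or sending it over the network.
sig_bytes = the_sig.encode('utf8')
```

Keeping everything as Unicode between the decode and the encode means no implicit ASCII conversion can occur partway through.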

Complete example:

input1 = get_possibly_nonascii_input().decode('iso-8859-1')
input2 = get_possibly_nonascii_input().decode('iso-8859-1')
input3 = u'üvw'

s = u'%s -> %s' % (input3, (input1 + input2).upper())

send_output(s.encode('utf8'))
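
The complete example relies on get_possibly_nonascii_input and send_output, which are not defined in the answer. A self-contained variant, with both functions replaced by hypothetical stubs (the input is assumed to be the single ISO-8859-1 byte for ü, and the output is simply collected in a list), might look like this:

```python
def get_possibly_nonascii_input():
    # Hypothetical stub: returns ISO-8859-1 encoded bytes (0xFC is "ü").
    return b'\xfc'

sent = []

def send_output(data):
    # Hypothetical stub: collects the encoded bytes instead of sending them.
    sent.append(data)

# Decode at the boundary, using the encoding the input is known to have.
input1 = get_possibly_nonascii_input().decode('iso-8859-1')
input2 = get_possibly_nonascii_input().decode('iso-8859-1')
input3 = u'\xfcvw'  # same text as u'üvw', written with an escape

# All processing happens on Unicode strings.
s = u'%s -> %s' % (input3, (input1 + input2).upper())

# Encode once, on the way out.
send_output(s.encode('utf8'))
```

Because the stubs work with plain byte strings, the same decode, process, encode pattern can be checked without any real input source or receiver.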
