urllib.unquote and unicode
Problem description
The following snippet results in a different outcome for (at least) the
last three major releases:
>>> import urllib
>>> urllib.unquote(u'%94')
# Python 2.3.4
u'%94'
# Python 2.4.2
UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
ordinal not in range(128)
# Python 2.5
u'\x94'
Is the current version the "right" one or is this function supposed to
change every other week?
George
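The message in the 2.4.2 case is the one Python gives when the single byte 0x94 is decoded as ASCII. A minimal sketch that triggers the same decode explicitly is below; it assumes (the thread does not confirm this) that 2.4.2 hit an implicit ASCII decode while combining the unicode input with the unquoted byte. The names `raw` and `message` are made up for the illustration.

```python
# Sketch only: decode the byte 0x94 as ASCII on purpose.
# ASCII covers byte values 0 to 127, so 0x94 (148) is out of range.
raw = b'\x94'

try:
    raw.decode('ascii')
    message = None  # not reached for this input
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The printed text should match the error quoted in the post.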
George Sakkis wrote:
The following snippet results in a different outcome for (at least) the
last three major releases:
>>> import urllib
>>> urllib.unquote(u'%94')
# Python 2.3.4
u'%94'
# Python 2.4.2
UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
ordinal not in range(128)
# Python 2.5
u'\x94'
Is the current version the "right" one or is this function supposed to
change every other week?

IMHO, none of the results is right. Either the unicode string should be
rejected by raising ValueError, or it should be encoded with the ascii
encoding, so that the result is the same as
urllib.unquote(u'%94'.encode('ascii')), that is, '\x94'. You can consider
the current behaviour undefined: just as when you pass a random object
into some function, you can get a different outcome in different Python
versions.
-- Leo
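The two options Leo describes can be combined in one small wrapper. The sketch below is only an illustration: `strict_unquote` is a hypothetical name, not part of urllib. It also assumes that on Python 3, where `urllib.unquote` no longer exists, `urllib.parse.unquote_to_bytes` is an acceptable stand-in for unquoting a byte string.

```python
try:
    # Python 3: bytes in, bytes out
    from urllib.parse import unquote_to_bytes as _unquote_bytes
except ImportError:
    # Python 2: byte str in, byte str out
    from urllib import unquote as _unquote_bytes


def strict_unquote(text):
    """Hypothetical helper: encode to ASCII first, then unquote.

    Raises ValueError if the input contains non-ASCII characters,
    otherwise returns the unquoted result as a byte string.
    """
    try:
        raw = text.encode('ascii')
    except UnicodeError:
        raise ValueError('quoted input must be pure ASCII: %r' % (text,))
    return _unquote_bytes(raw)
```

With this wrapper, strict_unquote(u'%94') gives the single byte 0x94 as a byte string, and an input such as u'\u20ac' is rejected with ValueError.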
George Sakkis wrote:
The following snippet results in a different outcome for (at least) the
last three major releases:
>>> import urllib
>>> urllib.unquote(u'%94')
# Python 2.4.2
UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
ordinal not in range(128)

Python 2.4.3 (#3, Aug 23 2006, 09:40:15)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.unquote(u"%94")
u'\x94'
>>>
From the above I infer that the 2.4.2 behaviour was considered a bug.
Peter
George Sakkis wrote:
The following snippet results in a different outcome for (at least) the
last three major releases:
>>> import urllib
>>> urllib.unquote(u'%94')
# Python 2.3.4
u'%94'
# Python 2.4.2
UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
ordinal not in range(128)
# Python 2.5
u'\x94'
Is the current version the "right" one or is this function supposed to
change every other week?

Why are you passing non-ASCII Unicode strings to a function designed for
fixing up 8-bit strings in the first place? If you do proper encoding
before you quote things, it'll work the same way in all Python releases.
</F>
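One way to read "proper encoding before you quote things" is to turn the text into bytes explicitly, quote the bytes, and decode again only after unquoting. The sketch below works under assumptions the thread does not state: UTF-8 is picked as the encoding, and `quote_text` and `unquote_text` are hypothetical helper names. On Python 3 the functions live in `urllib.parse`, so the import is guarded.

```python
try:
    # Python 3
    from urllib.parse import quote, unquote_to_bytes as unquote_bytes
except ImportError:
    # Python 2: both functions operate on byte strings
    from urllib import quote, unquote as unquote_bytes


def quote_text(text, encoding='utf-8'):
    """Encode the text to bytes first, then percent-quote the bytes."""
    return quote(text.encode(encoding))


def unquote_text(quoted, encoding='utf-8'):
    """Unquote to bytes, then decode the bytes back to text."""
    # A quoted string is pure ASCII, so this encode is safe for valid input.
    raw = unquote_bytes(quoted.encode('ascii'))
    return raw.decode(encoding)


print(quote_text(u'caf\xe9'))
print(unquote_text('caf%C3%A9') == u'caf\xe9')
```

Because the unquote step only ever sees bytes, the result does not depend on how a given release treats unicode arguments.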