Comparing Unicode and string
Problem description
Hello,
here is something that surprises me.
#coding: iso-8859-1
s1=u"Frau Müller machte große Augen"
s2="Frau Müller machte große Augen"
if s1 == s2:
    pass
Running this code produces a UnicodeDecodeError:
Traceback (most recent call last):
  File "tmp.py", line 4, in ?
    if s1 == s2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6: ordinal not in range(128)
I would have expected that "s1 == s2" gives True... or maybe False...
but raising an error here is unnecessary. I guess that the comparison
operator decides to convert s2 to a Unicode but forgets that I said
#coding: iso-8859-1 at the beginning of the file.
TIA for any comments.
Luc Saffre
Recommended answers
lu********@gmail.com wrote:
Hello,
here is something that surprises me.
#coding: iso-8859-1
s1=u"Frau Müller machte große Augen"
s2="Frau Müller machte große Augen"
if s1 == s2:
    pass
Running this code produces a UnicodeDecodeError:
Traceback (most recent call last):
  File "tmp.py", line 4, in ?
    if s1 == s2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6: ordinal not in range(128)
I would have expected that "s1 == s2" gives True... or maybe False...
but raising an error here is unnecessary. I guess that the comparison
operator decides to convert s2 to a Unicode but forgets that I said
#coding: iso-8859-1 at the beginning of the file.
The #coding declaration is not effective at runtime. It's
there strictly to guide the compiler in how to compile
byte strings.
The default encoding at run time is ascii unless
it's been set to something else, which is why the
error message specifies ascii.
John Roth
TIA for any comments.
Luc Saffre
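John's explanation can be reproduced as a small sketch. It is written for Python 3, where text (`str`) and raw bytes (`bytes`) are separate types, so the decoding step that Python 2 performed implicitly inside `s1 == s2` has to be written out as an explicit `.decode()` call. It assumes `tmp.py` was saved as ISO-8859-1, so `s2` holds the Latin-1 bytes of the sentence; the helper name `implicit_decode_error` is made up for illustration.

```python
# Python 3 sketch: the implicit ASCII decode that Python 2 did in "s1 == s2"
# is spelled out explicitly here.
s1 = "Frau Müller machte große Augen"   # text, like u"..." in Python 2
s2 = s1.encode("iso-8859-1")            # assumed: the bytes a Latin-1 source file stores


def implicit_decode_error(data):
    """Hypothetical helper: return the error an ASCII decode of *data* raises, or None."""
    try:
        data.decode("ascii")            # Python 2's default encoding, normally ASCII
    except UnicodeDecodeError as exc:
        return exc
    return None


error = implicit_decode_error(s2)
print(error)  # 'ascii' codec can't decode byte 0xfc in position 6: ordinal not in range(128)

# Decoding with the encoding the file was really written in makes the comparison work.
print(s1 == s2.decode("iso-8859-1"))  # True
```

The error text matches the traceback in the original post: byte 0xfc is "ü" in ISO-8859-1, and it sits at index 6 of "Frau Müller...".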
On 2006-10-16, lu********@gmail.com <lu********@gmail.com> wrote:
Hello,
here is something that surprises me.
#coding: iso-8859-1
I think that's supposed to be:
# -*- coding: iso-8859-1 -*-
The special comment changes only the encoding of unicode
literals. In particular, it doesn't change the default encoding
of str literals.
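For reference, the PEP 263 rule looks for `coding[:=]` followed by an encoding name in a comment on the first or second line, so the Emacs-style form above and the shorter `# coding: iso-8859-1` from the original post should both be recognized. The sketch below checks this with Python 3's `tokenize.detect_encoding`, which applies that rule to the first lines of a source file; the two sample sources are hypothetical.

```python
import io
import tokenize

# Two hypothetical source files that differ only in how the coding comment is spelled.
plain = b'# coding: iso-8859-1\ns2 = "Frau M\xfcller"\n'
emacs = b'# -*- coding: iso-8859-1 -*-\ns2 = "Frau M\xfcller"\n'

for source in (plain, emacs):
    # detect_encoding reads at most two lines and returns (encoding, lines_read).
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    print(encoding)  # prints iso-8859-1 for both spellings
```

Either way, the declaration only affects how the file is read; it does not change the run-time default encoding used for implicit conversions.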
s1=u"Frau Müller machte große Augen"
s2="Frau Müller machte große Augen"
if s1 == s2:
    pass
On my machine, the ü and ß in s2 are being stored in the code
points of my terminal's encoding, cp437. Unfortunately, cp437 code
points 128-255 are not the same as those in iso-8859-1.
To fix this, I have to do the following:
>>> s1 == s2.decode('cp437')
True
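Neil's point about the two code pages can be checked byte by byte. This Python 3 sketch makes the conversions explicit with `.encode()`/`.decode()`; it assumes, as in his session, that the sentence arrived from a cp437 console, and compares the byte values cp437 and ISO-8859-1 assign to "ü" and "ß".

```python
text = "Frau Müller machte große Augen"

# The same two characters get different byte values in the two code pages.
for ch in "üß":
    cp437_byte = ch.encode("cp437")[0]
    latin1_byte = ch.encode("iso-8859-1")[0]
    print(ch, hex(cp437_byte), hex(latin1_byte))
# ü 0x81 0xfc
# ß 0xe1 0xdf

terminal_bytes = text.encode("cp437")               # assumed: bytes from a cp437 console
print(terminal_bytes.decode("cp437") == text)       # True: matching codec
print(terminal_bytes.decode("iso-8859-1") == text)  # False: wrong codec, wrong characters
```

Decoding with the codec the bytes were produced in is what makes `s2.decode('cp437')` compare equal.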
Running this code produces a UnicodeDecodeError:
Traceback (most recent call last):
  File "tmp.py", line 4, in ?
    if s1 == s2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6: ordinal not in range(128)
I would have expected that "s1 == s2" gives True... or maybe
False... but raising an error here is unnecessary. I guess that
the comparison operator decides to convert s2 to a Unicode but
forgets that I said #coding: iso-8859-1 at the beginning of the
file.
It's trying to interpret s2 as ascii, and failing, since the 129 and
225 code points are out of range.
--
Neil Cerutti
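Neil's two out-of-range code points can be located with a short Python 3 sketch. It assumes the sentence was typed on a cp437 console, lists every byte above 127 (ASCII covers only 0-127), and shows that an ASCII decode stops at the first of them.

```python
text = "Frau Müller machte große Augen"
terminal_bytes = text.encode("cp437")  # assumed: input typed on a cp437 console

# Every byte above 127 is one that the ASCII codec rejects.
out_of_range = [(pos, byte) for pos, byte in enumerate(terminal_bytes) if byte > 127]
print(out_of_range)  # [(6, 129), (22, 225)]

# An ASCII decode fails at the first such byte ("ü", 0x81 in cp437).
try:
    terminal_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    first_bad = exc.start
    print(first_bad, hex(terminal_bytes[first_bad]))  # 6 0x81
```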
Thanks, John and Neil, for your explanations.
Still I find it rather difficult to explain to a Python beginner why
this error occurs.
Suggestion: shouldn't an error be raised as soon as I try to assign s2? A
normal string should never be allowed to contain characters that are
not encodable using the system encoding. This test could be made at
compile time and would make Python more didactic.
Luc
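For comparison, Python 3 later took a route close to this suggestion: a `bytes` literal may contain only ASCII characters, so a literal like `b"Frau Müller"` is rejected when the source is compiled, and comparing text with bytes returns `False` instead of attempting an implicit decode. The sketch below shows both behaviors under Python 3; the snippets passed to `compile()` and the helper name `compiles` are hypothetical.

```python
def compiles(source):
    """Hypothetical helper: True if *source* compiles, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# A bytes literal with a non-ASCII character is rejected at compile time...
print(compiles('s2 = b"Frau Müller"'))  # False
# ...while the same characters in a text (str) literal are accepted.
print(compiles('s1 = "Frau Müller"'))   # True

# Comparing text with bytes no longer triggers an implicit ASCII decode.
# (The -b / -bb command-line options turn this case into a warning / error.)
s1 = "Frau Müller machte große Augen"
s2 = s1.encode("iso-8859-1")
print(s1 == s2)                        # False
print(s1 == s2.decode("iso-8859-1"))   # True
```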