Comparing Unicode and string

Problem description

Hello,

here is something that surprises me.

#coding: iso-8859-1
s1=u"Frau Müller machte große Augen"
s2="Frau Müller machte große Augen"
if s1 == s2:
    pass

Running this code produces a UnicodeDecodeError:

Traceback (most recent call last):
  File "tmp.py", line 4, in ?
    if s1 == s2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6:
ordinal not in range(128)

I would have expected that "s1 == s2" gives True... or maybe False...
but raising an error here is unnecessary. I guess that the comparison
operator decides to convert s2 to a Unicode but forgets that I said
#coding: iso-8859-1 at the beginning of the file.

TIA for any comments.

Luc Saffre

Recommended answer


lu********@gmail.com wrote:

Hello,

here is something that surprises me.

#coding: iso-8859-1
s1=u"Frau Müller machte große Augen"
s2="Frau Müller machte große Augen"
if s1 == s2:
    pass

Running this code produces a UnicodeDecodeError:

Traceback (most recent call last):
  File "tmp.py", line 4, in ?
    if s1 == s2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6:
ordinal not in range(128)

I would have expected that "s1 == s2" gives True... or maybe False...
but raising an error here is unnecessary. I guess that the comparison
operator decides to convert s2 to a Unicode but forgets that I said
#coding: iso-8859-1 at the beginning of the file.



The #coding declaration is not effective at runtime. It's
there strictly to guide the compiler in how to compile
byte strings.

The default encoding at run time is ascii unless
it's been set to something else, which is why the
error message specifies ascii.

John Roth
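
To make the explanation above concrete, here is a minimal sketch. It is written for Python 3, where byte strings and text are separate types and nothing is decoded implicitly, so the ASCII decoding that Python 2 attempted during the comparison is spelled out by hand. It assumes s2 holds the ISO-8859-1 bytes of the sentence; the helper name compare_as_text is hypothetical and not part of the thread.

```python
# Python 3 sketch of what the Python 2 comparison did behind the scenes.
s1 = "Frau Müller machte große Augen"   # text (a Python 2 unicode object)
s2 = s1.encode("iso-8859-1")            # bytes as stored in the source file (assumption)


def compare_as_text(text, raw, encoding):
    """Decode raw bytes with the given codec, then compare with text."""
    return text == raw.decode(encoding)


# Python 2 implicitly used the run-time default codec, 'ascii'.
try:
    compare_as_text(s1, s2, "ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding explicitly with the codec the bytes were written in succeeds.
same = compare_as_text(s1, s2, "iso-8859-1")
print(same)
```

The first call fails on the byte for ü, which is the same failure the traceback in the question reports; the second call compares equal because the bytes are decoded with the codec they were written in.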



On 2006-10-16, lu********@gmail.com <lu********@gmail.com> wrote:

Hello,

here is something that surprises me.

#coding: iso-8859-1



I think that's supposed to be:

# -*- coding: iso-8859-1 -*-

The special comment changes only the encoding of unicode
literals. In particular, it doesn't change the default encoding
of str literals.


s1=u"Frau Müller machte große Augen"
s2="Frau Müller machte große Augen"
if s1 == s2:
    pass




On my machine, the ü and ß in s2 are being stored in the code
points of my terminal's encoding, cp437. Unfortunately cp437 code
points from 128-255 are not the same as those in iso-8859-1.

To fix this, I have to do the following:


>>> s1 == s2.decode('cp437')
True


Running this code produces a UnicodeDecodeError:

Traceback (most recent call last):
  File "tmp.py", line 4, in ?
    if s1 == s2:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6:
ordinal not in range(128)

I would have expected that "s1 == s2" gives True... or maybe
False... but raising an error here is unnecessary. I guess that
the comparison operator decides to convert s2 to a Unicode but
forgets that I said #coding: iso-8859-1 at the beginning of the
file.



It's trying to interpret s2 as ascii, and failing, since code points
129 and 225 are out of range.

--
Neil Cerutti
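
The byte values mentioned above can be checked directly. The sketch below is again written for Python 3 and uses only the standard codecs. It shows that ü and ß have different byte values in cp437 and iso-8859-1, so bytes typed in a cp437 terminal have to be decoded as cp437 before the comparison. That s2 was produced by a cp437 terminal is taken from the post, not something the code can verify.

```python
# Byte values of the two non-ASCII characters under each codec.
for ch in ("ü", "ß"):
    cp437_byte = ch.encode("cp437")[0]
    latin1_byte = ch.encode("iso-8859-1")[0]
    print(ch, cp437_byte, latin1_byte)

s1 = "Frau Müller machte große Augen"
s2 = s1.encode("cp437")   # bytes as a cp437 terminal would produce them

# Right codec: the decoded text equals s1.
matches_cp437 = s1 == s2.decode("cp437")
# Wrong codec: decoding succeeds, but yields different characters.
matches_latin1 = s1 == s2.decode("iso-8859-1")
print(matches_cp437, matches_latin1)
```

Decoding with the wrong codec does not raise here, because iso-8859-1 assigns a character to every byte value; it just produces text that no longer matches.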


Thanks, John and Neil, for your explanations.

Still I find it rather difficult to explain to a Python beginner why
this error occurs.

Suggestion: shouldn't an error already be raised when I try to assign s2? A
normal string should never be allowed to contain characters that are
not codable using the system encoding. This test could be made at
compile time and would render Python more didactic.

Luc
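
For comparison, Python 3 behaves roughly along the lines of this suggestion: a bytes literal containing a non-ASCII character is rejected when the source is compiled, and comparing bytes with text yields False instead of raising. A minimal sketch, assuming a Python 3 interpreter run with default warning settings (the variable names are illustrative only):

```python
# A non-ASCII character inside a bytes literal fails when the source is compiled.
try:
    compile('s2 = b"Frau Müller"', "<example>", "exec")
    rejected_at_compile_time = False
except SyntaxError:
    rejected_at_compile_time = True

print(rejected_at_compile_time)

# Bytes and text never compare equal, and no decoding is attempted.
mixed_comparison = "Frau" == b"Frau"
print(mixed_comparison)
```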

