Strange behavior when comparing unicode objects with string objects


Question

When comparing two strings in Python with `is`, it works fine. When comparing a string object with a unicode object, it fails, as expected. However, when comparing a string object with a unicode object that has been converted to a string (unicode --> str), it also fails.

A Demo:

Works as expected:

>>> if 's' is 's': print "Hurrah!"
... 
Hurrah!


Fails, as expected:

>>> if 's' is u's': print "Hurrah!"
... 


Not expected:

>>> if 's' is str(u's'): print "Hurrah!"
... 


Why doesn't the third example work as expected when both objects are of the same type?

>>> type('s')
<type 'str'>

>>> type(str(u's'))
<type 'str'>

Solution

Don't use `is` for this; use `==`. With `is` you're comparing whether the objects have the same identity, not whether they are equal. Of course, if they are the same object, they will be equal (`==`), but if they are equal, they aren't necessarily the same object.
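To make the identity/equality distinction concrete, here is a minimal sketch in Python 3 (the question uses Python 2, but `is` and `==` behave the same way there). It assumes a string built at runtime is a fair stand-in for `str(u's')`: both produce a `str` whose value matches a literal without being that literal's object. `same_value` is a hypothetical helper name.

```python
def same_value(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings by value, which is almost always what you want."""
    return a == b


literal = "spam"
# Built at runtime rather than taken from a source literal, so CPython
# normally allocates a separate object with the same characters.
built = "".join(["sp", "am"])

print(type(literal) is type(built))  # True: both are str
print(same_value(literal, built))    # True: the characters match
print(literal is built)              # identity: typically False in CPython; never rely on it
```

Same type and same value say nothing about identity, which is exactly the situation in the third example.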

The fact that the first one works is an implementation detail of CPython. Small strings, since they're immutable, can be interned by the interpreter: every time you put the string "s" in your source code, CPython reuses the same object. However, `str(u's')` apparently returns a new string with the same value. This isn't all that surprising.
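Interning can also be requested explicitly. The sketch below uses Python 3's `sys.intern` (Python 2 exposes the same function as the built-in `intern`); the variable names are illustrative only. After interning, equal strings are guaranteed to be the same object, which is the property that source literals like `'s'` happen to get for free in CPython.

```python
import sys

# Two equal strings created at runtime, typically as distinct objects.
a = "".join(["hur", "rah"])
b = "".join(["hur", "rah"])

# sys.intern returns the canonical copy of a string, adding it to the
# intern table on first use, so equal inputs map to one shared object.
ia = sys.intern(a)
ib = sys.intern(b)

print(a == b)    # True: same characters
print(ia is ib)  # True: guaranteed by sys.intern
```

Even so, `==` remains the right test for string values; interning is an optimization, not a comparison strategy.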


You might be asking yourself, "Why intern the string 's' at all?" That's a reasonable question. After all, it's a short string: how much memory could having multiple copies floating around in your source take? The answer (I think) is dictionary lookups. Since dicts with strings as keys are so common in Python, you can speed up the equality checks on keys by doing a lightning-fast pointer comparison first, falling back on the slower strcmp only when the pointer comparison returns false.
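The pointer-comparison fast path can be observed from Python. This is a CPython-specific sketch: `CountingStr` is a hypothetical `str` subclass that counts `__eq__` calls, and the fallback it exposes is the key's `__eq__` rather than a literal `strcmp`. Looking a key up with the very same object never calls `__eq__`; looking it up with an equal but distinct object does.

```python
class CountingStr(str):
    """Hypothetical helper: a str that counts how often __eq__ runs."""

    eq_calls = 0

    def __eq__(self, other):
        CountingStr.eq_calls += 1
        return str.__eq__(self, other)

    # Defining __eq__ sets __hash__ to None, so restore str's hash.
    __hash__ = str.__hash__


key = CountingStr("spam")
table = {key: 1}

# Same object: CPython's dict lookup matches on identity, no __eq__ call.
table[key]
after_identity = CountingStr.eq_calls

# Equal but distinct object: same hash, different pointer, so the dict
# falls back to a full equality check via __eq__.
table[CountingStr("spam")]
after_equal = CountingStr.eq_calls

print(after_identity, after_equal)  # 0 1 in CPython
```

Interned keys hit the first, cheap branch, which is the speedup the answer has in mind.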
