Python 'is' operator behaviour with strings


Question

I am unable to understand the following behaviour. I create two strings and compare them with the is operator. In the first case it behaves differently from what I expect; in the second case it works as expected. Why does the comparison with is give False when the string contains a comma or a space, but True when it contains no comma, space, or other such characters?

Python 3.6.5 (default, Mar 30 2018, 06:41:53) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'string'
>>> b = a
>>> b is a
True
>>> b = 'string'
>>> b is a
True
>>> a = '1,2,3,4'
>>> b = a
>>> b is a
True
>>> b = '1,2,3,4'
>>> b is a
False

Is there any reliable information on why Python treats these strings differently? I understand that initially a and b refer to the same object. But then b is assigned a new object, and b is a still says True. I find this behaviour a little confusing.

When I do it with 'string', the result is True both times. What goes wrong when I use '1,2,3,4'? Both are strings. What is different between case 1 and case 2, i.e. why does the is operator produce different results for different string contents?

Recommended answer

One important thing about this behaviour is that Python caches some strings, mostly short ones (usually fewer than 20 characters, though not every combination of characters qualifies), so that they are quickly accessible. An important reason is that strings are used widely inside Python's own implementation, and caching certain kinds of strings is an internal optimization. Dictionaries are among the most commonly used data structures in Python's implementation: they hold variables, attributes, and namespaces in general, and serve some other purposes as well, and they all use strings as the object names. This means that nearly every time you access an object attribute or a global variable, a dictionary lookup fires internally.
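The point about namespaces being dictionaries keyed by strings can be seen from ordinary Python code. The following is a minimal sketch; the Point class and its attribute names are made up for illustration, and real attribute lookup involves more steps (descriptors, the class hierarchy) than a single dictionary access.

```python
class Point:
    """Hypothetical class used only to inspect its namespaces."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(1, -2)

# Instance attributes are stored in a dict whose keys are the attribute names.
instance_namespace = vars(p)              # the same object as p.__dict__
attr_names = sorted(instance_namespace)   # the keys are plain strings

# Reading p.x gives the same value as looking up the key 'x' directly.
same_value = instance_namespace['x'] == p.x

# Methods live in the class namespace, again keyed by a string.
method_in_class_dict = 'norm1' in vars(Point)

print(attr_names, same_value, method_in_class_dict)
```

Because these keys are compared on every lookup, having identical names share one object lets the comparison succeed on identity alone in the common case.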

Now, the reason you got this seemingly bizarre behaviour is that Python (the CPython implementation) treats strings differently when it comes to interning. In CPython's source code there is an intern_string_constants function that decides which string constants qualify for interning; you can check it for more details. Or see this comprehensive article: http://guilload.com/python-string-interning/.
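As a rough illustration of the kind of check involved, the sketch below assumes that CPython interns a string constant only when every character is an ASCII letter, a digit, or an underscore. That rule is my reading of the CPython 3.6 source, not something the language guarantees, and looks_internable is a hypothetical helper, not a real CPython API.

```python
import string

# Assumption: the characters CPython treats as "name characters".
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def looks_internable(s):
    """Hypothetical helper: True if every character of s is a name character."""
    return all(ch in NAME_CHARS for ch in s)


for candidate in ("string", "1,2,3,4", "a,,", "abc_123"):
    print(repr(candidate), looks_internable(candidate))
```

Under this assumption 'string' qualifies and '1,2,3,4' does not, which matches the True and False results in the question.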

It is also worth noting that Python has an intern() function in the sys module that you can use to intern strings manually.

In [51]: import sys

In [52]: b = sys.intern('a,,')

In [53]: c = sys.intern('a,,')

In [54]: b is c
Out[54]: True

You can use this function either when you want to speed up dictionary lookups or when you need to use a particular string object frequently in your code.
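A small sketch of manual interning with strings that are built at run time, so that compile-time handling of literals plays no part. The variable names are illustrative only.

```python
import sys

# Build two equal strings at run time; neither is a literal in the source.
parts = ["1", "2", "3", "4"]
a = ",".join(parts)
b = ",".join(parts)

equal_by_value = (a == b)                 # same characters

# sys.intern returns the one canonical object for a given string value.
interned_a = sys.intern(a)
interned_b = sys.intern(b)
same_object = interned_a is interned_b

print(equal_by_value, same_object)        # True True
```

Whether a is b holds before interning is an implementation detail, which is why only the interned results are compared by identity here.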

Another point that you should not confuse with string interning is that when you do b = a you create two references to the same object, so it is obvious that both names have the same id.
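The difference between plain assignment and interning can be checked directly. This is a minimal sketch; the string is built at run time only to keep literal handling out of the picture.

```python
a = ",".join(["1", "2", "3", "4"])   # built at run time
b = a                                 # no copy: b is a second name for the same object

same_identity = b is a                # identity comparison
same_id = id(a) == id(b)              # id() reports the same object

print(same_identity, same_id)         # True True
```

This explains the first b is a in each case of the question; only the result after b is rebound to a new literal depends on interning.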

Regarding punctuation, it seems that a string consisting of a single punctuation character gets interned, whereas a string longer than one character that contains punctuation does not get cached. As mentioned in the comments, one reason might be that keywords and dictionary keys are unlikely to contain punctuation.

In [28]: a = ','

In [29]: ',' is a
Out[29]: True

In [30]: a = 'abc,'

In [31]: 'abc,' is a
Out[31]: False

In [34]: a = ',,'

In [35]: ',,' is a
Out[35]: False

# Or

In [36]: a = '^'

In [37]: '^' is a
Out[37]: True

In [38]: a = '^%'

In [39]: '^%' is a
Out[39]: False

Still, these are only speculations about implementation details, and you cannot rely on them in your code.
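Since identity of equal strings is not guaranteed, comparing by value with == is the dependable option. A minimal sketch, where same_text is a hypothetical helper name:

```python
def same_text(left, right):
    """Hypothetical helper: compare two strings by value, not by identity."""
    return left == right


a = "1,2,3,4"
b = ",".join(["1", "2", "3", "4"])   # equal content, built separately

print(same_text(a, b))               # True
```

The == operator compares the characters, so the result does not depend on whether the interpreter happened to reuse one object for both strings.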
