'is' operator behaves unexpectedly with non-cached integers


Problem description


When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the is operator:

If the evaluation takes place inside the function it returns True; if it is done outside, it returns False.

>>> def func():
...     a = 1000
...     b = 1000
...     return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)

Since the is operator compares the id() of the objects involved, this means that a and b point to the same int instance when declared inside the function func but, on the contrary, point to different objects when declared outside of it.

Why is this so?


Note: I am aware of the difference between identity (is) and equality (==) operations as described in Understanding Python's "is" operator. In addition, I'm also aware of the caching that Python performs for the integers in the range [-5, 256], as described in "is" operator behaves unexpectedly with integers.

This isn't the case here since the numbers are outside that range and I do want to evaluate identity and not equality.
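For contrast, the small-integer cache mentioned in the note can be observed by building the integers at run time, so that no compile-time sharing of constants is involved. This is a minimal sketch assuming CPython; the helper name same_object is made up for illustration.

```python
def same_object(text):
    """Parse `text` twice at run time and report whether both results are one object."""
    first = int(text)
    second = int(text)
    return first is second


# On CPython, values inside the cached range come back as the same object.
print(same_object("256"))
print(same_object("-5"))
# Outside the cached range the outcome is an implementation detail.
print(same_object("1000"))
```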

Solution

tl;dr:

As the reference manual states:

A block is a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition. Each command typed interactively is a block.

This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal 1000, so id(a) == id(b) will yield True.

In the second case, you have two distinct code objects each with their own different object for the literal 1000 so id(a) != id(b).
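The two cases can be reproduced without an interactive session by compiling the assignments either as one block or as two separate blocks. This is only a sketch that relies on CPython behaviour; the helper names and the file name strings passed to compile are arbitrary choices, not part of the original answer.

```python
def same_block(literal):
    """Compile both assignments as ONE block and compare the resulting objects."""
    namespace = {}
    block = compile(f"a = {literal}\nb = {literal}", "<one-block>", "exec")
    exec(block, namespace)
    return namespace["a"] is namespace["b"]


def separate_blocks(literal):
    """Compile each assignment as its OWN block, then compare the objects."""
    namespace = {}
    for name in ("a", "b"):
        block = compile(f"{name} = {literal}", "<separate-block>", "exec")
        exec(block, namespace)
    return namespace["a"] is namespace["b"]


print(same_block("1000"), separate_blocks("1000"))
```

The literal is passed in as text so the same helpers can be tried with other numeric literals.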

Take note that this behavior isn't limited to int literals; you'll get similar results with, for example, float literals (see here).
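A rough illustration of the float case, again assuming CPython: the function body is one block, while each eval call compiles its argument as a separate code object. The function names are hypothetical.

```python
def float_literals_in_one_block():
    # Both names are bound to the single 1000.5 constant of this function.
    a = 1000.5
    b = 1000.5
    return a is b


def float_literals_in_two_blocks():
    # Each eval() call compiles its own code object with its own constant.
    a = eval("1000.5")
    b = eval("1000.5")
    return a is b


print(float_literals_in_one_block(), float_literals_in_two_blocks())
```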

Of course, comparing objects (except for explicit is None tests) should always be done with the equality operator == and not with is.
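As a small sketch of that rule, the following compares values with == and reserves is for the None check. The lookup helper and the prices table are invented for the example.

```python
def lookup(table, key):
    """Return the stored value, or None when the key is missing."""
    return table.get(key)


prices = {"widget": int("1000")}   # value built at run time
expected = int("1000")             # a second, separately built int

value = lookup(prices, "widget")
print(value == expected)                  # value comparison: use ==
print(lookup(prices, "gadget") is None)   # sentinel check: use `is None`
```

Both lines print True regardless of whether the two ints happen to be the same object.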

Everything stated here applies to the most popular implementation of Python, CPython. Other implementations might differ so no assumptions should be made when using them.


Longer Answer:

To get a little clearer view and additionally verify this seemingly odd behaviour we can look directly in the code objects for each of these cases using the dis module.

For the function func:

Along with all other attributes, function objects also have a __code__ attribute that allows you to peek into the compiled bytecode for that function. Using dis.code_info we can get a nice pretty view of all stored attributes in a code object for a given function:

>>> print(dis.code_info(func))
Name:              func
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  2
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 1000
Variable names:
   0: a
   1: b

We're only interested in the Constants entry for function func. In it, we can see that we have two values, None (always present) and 1000. We only have a single int instance that represents the constant 1000. This is the value that a and b are going to be assigned to when the function is invoked.

Accessing this value is easy via func.__code__.co_consts[1] and so, another way to view our a is b evaluation in the function would be like so:

>>> id(func.__code__.co_consts[1]) == id(func.__code__.co_consts[1]) 

Which, of course, will evaluate to True because we're referring to the same object.
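The same observation can be made programmatically. This sketch, which assumes CPython and its LOAD_CONST opcode name, counts how often 1000 appears in the constants tuple and collects the constant index used by each instruction that loads it.

```python
import dis


def func():
    a = 1000
    b = 1000
    return a is b


# The literal 1000 is stored once in the function's constants tuple.
consts = func.__code__.co_consts
print(consts.count(1000))

# Collect the constant-table index used by every LOAD_CONST that loads 1000.
indexes = [
    ins.arg
    for ins in dis.get_instructions(func)
    if ins.opname == "LOAD_CONST" and ins.argval == 1000
]
print(indexes)
```

If both assignments load from the same index, a and b necessarily end up bound to the same object.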

For each interactive command:

As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.

We can get the code objects for each command via the compile built-in:

>>> com1 = compile("a=1000", filename="", mode="single")
>>> com2 = compile("b=1000", filename="", mode="single")

For each assignment statement, we will get a similar looking code object which looks like the following:

>>> print(dis.code_info(com1))
Name:              <module>
Filename:          
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        1
Flags:             NOFREE
Constants:
   0: 1000
   1: None
Names:
   0: a

The same command for com2 looks the same but has a fundamental difference: each of the code objects com1 and com2 has a different int instance representing the literal 1000. This is why, in this case, when we do a is b via the co_consts attribute, we actually get:

>>> id(com1.co_consts[0]) == id(com2.co_consts[0])
False

Which agrees with what we actually got.

Different code objects, different contents.
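To see the effect with commands that are actually executed rather than only compiled, the standard library's code.InteractiveInterpreter can stand in for the interactive prompt. This is a sketch under the assumption that it behaves like the real prompt for simple assignments; run_commands is a made-up helper.

```python
import code


def run_commands(*commands):
    """Feed each command to a fresh interpreter session, one block per command."""
    session = code.InteractiveInterpreter()
    for command in commands:
        # runsource() compiles `command` in "single" mode, then executes it.
        session.runsource(command)
    return session.locals


names = run_commands("a = 1000", "b = 1000")
print(names["a"] is names["b"])
```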


Note: I was somewhat curious as to how exactly this happens in the source code and after digging through it I believe I finally found it.

During the compilation phase the co_consts attribute is represented by a dictionary object. In compile.c we can actually see the initialization:

/* snipped for brevity */

u->u_lineno = 0;
u->u_col_offset = 0;
u->u_lineno_set = 0;
u->u_consts = PyDict_New();  

/* snipped for brevity */

During compilation this is checked for already existing constants. See @Raymond Hettinger's answer below for a bit more on this.
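The idea of checking a dictionary for already existing constants can be modelled in a few lines of Python. This is a toy model only: the class, its attribute names, and the (type, value) key are simplifying assumptions, and the key CPython really builds is more involved.

```python
class ConstantTable:
    """Toy model of a per-code-object constant table keyed by (type, value)."""

    def __init__(self):
        self._index_by_key = {}   # plays the role of the u_consts dictionary
        self.values = []          # what would end up in co_consts

    def add(self, value):
        # Including the type keeps 1000 and 1000.0 apart although they are equal.
        key = (type(value), value)
        if key not in self._index_by_key:
            self._index_by_key[key] = len(self.values)
            self.values.append(value)
        return self._index_by_key[key]


table = ConstantTable()
first = table.add(int("1000"))
second = table.add(int("1000"))   # equal value: reuses the existing slot
third = table.add(1000.0)         # different type: gets its own slot
print(first, second, third, table.values)   # prints: 0 0 1 [1000, 1000.0]
```

Because the table lives inside one compilation unit, two separately compiled commands each start with an empty table, which matches the behaviour described above.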


Caveats:

  • Chained statements will evaluate to an identity check of True

    It should be clearer now why exactly the following evaluates to True:

    >>> a = 1000; b = 1000;
    >>> a is b
    

    In this case, by chaining the two assignment commands together we tell the interpreter to compile these together. As in the case for the function object, only one object for the literal 1000 will be created resulting in a True value when evaluated.

  • Execution on a module level yields True again:

    As previously mentioned, the reference manual states that:

    ... The following are blocks: a module ...

    So the same premise applies: we will have a single code object (for the module) and so, as a result, single values stored for each different literal.

  • The same doesn't apply for mutable objects:

    Meaning that unless we explicitly initialize to the same mutable object (for example with a = b = []), the identity of the objects will never be equal, for example:

    a = []; b = []
    a is b  # always evaluates to False
    

    Again, in the documentation, this is specified:

    after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists.
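The three caveats above can be checked together in one script. This is a hedged sketch for CPython: the temporary file name demo_module.py is arbitrary, and runpy.run_path is used here only as a convenient way to execute a file as a module-level block.

```python
import os
import runpy
import tempfile

# Caveat 1: chained assignments on one line are compiled as one block,
# so the literal 1000 is stored only once.
chained = compile("a = 1000; b = 1000", "<stdin>", "single")
print(chained.co_consts.count(1000))

# Caveat 2: a module is a single block as well.
source = "a = 1000\nb = 1000\nsame = a is b\n"
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo_module.py")   # hypothetical file name
    with open(path, "w") as handle:
        handle.write(source)
    module_globals = runpy.run_path(path)
print(module_globals["same"])

# Caveat 3: every list display creates a new object when it is evaluated.
c = []
d = []
e = f = []
print(c is d, e is f)   # prints: False True
```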
