The `is` operator behaves unexpectedly with non-cached integers

Question

When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the is operator:

If the evaluation takes place inside the function, it returns True; if it is done outside, it returns False.

>>> def func():
...     a = 1000
...     b = 1000
...     return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)

Since the is operator compares the identities (the id() values) of the objects involved, this means that a and b point to the same int instance when declared inside the function func but, on the contrary, point to different objects when declared outside of it.
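A minimal sketch, assuming CPython, of the relationship between is and id(): for two objects that are alive at the same time, x is y holds exactly when id(x) == id(y). The helper name same_identity is hypothetical, used only for illustration.

```python
def same_identity(x, y):
    # Both objects are alive for the duration of this call, so comparing
    # their id() values is equivalent to the identity test `x is y`.
    return id(x) == id(y)


first = [1, 2]
second = [1, 2]  # equal contents, but a separate list object
alias = first    # a second name bound to the very same object

print(same_identity(first, alias), first is alias)    # True True
print(same_identity(first, second), first is second)  # False False
print(first == second)                                # True
```

Comparing id() values of temporaries (for example id(object()) == id(object())) is unreliable, because a freed object's id can be reused; the is operator avoids that pitfall.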

Why is this the case?

Note: I am aware of the difference between identity (is) and equality (==) operations as described in Understanding Python's "is" operator. In addition, I'm also aware of the caching that Python performs for integers in the range [-5, 256], as described in "is" operator behaves unexpectedly with integers.

This isn't the case here since the numbers are outside that range and I do want to evaluate identity and not equality.

Answer

tl;dr:

The reference manual states:

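A block is a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition. Each command typed interactively is a block.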

This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal 1000, so id(a) == id(b) will yield True.

In the second case, you have two distinct code objects each with their own different object for the literal 1000 so id(a) != id(b).
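The two situations can be reproduced without an interactive session. This is a sketch assuming CPython: compile() stands in for the REPL, compiling each command as its own code object, and the filenames "<cmd1>", "<cmd2>" and "<block>" are arbitrary labels.

```python
# Two separate compiles: two code objects, each with its own constant 1000.
separate_ns = {}
exec(compile("a = 1000", "<cmd1>", "single"), separate_ns)
exec(compile("b = 1000", "<cmd2>", "single"), separate_ns)

# One compile: a single code object holding a single constant 1000.
together_ns = {}
exec(compile("a = 1000\nb = 1000", "<block>", "exec"), together_ns)

print(separate_ns["a"] is separate_ns["b"])  # False
print(together_ns["a"] is together_ns["b"])  # True
```

In both cases the values are equal; only the number of code objects, and therefore the number of constant objects, differs.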

Note that this behavior isn't limited to int literals; you'll get similar results with, for example, float literals (see here).
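As a sketch of the float case, assuming CPython, here is a variant of the question's func using the float literal 1000.5 (an arbitrary value), next to two separately compiled commands:

```python
def func():
    # Both literals live in the same code object, so they share one constant.
    a = 1000.5
    b = 1000.5
    return a is b


ns = {}
# Each compile() call yields its own code object with its own float constant.
exec(compile("x = 1000.5", "<cmd1>", "single"), ns)
exec(compile("y = 1000.5", "<cmd2>", "single"), ns)

print(func())              # True
print(ns["x"] is ns["y"])  # False
```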

Of course, comparing objects (except for explicit is None tests) should always be done with the equality operator == and not with is.
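A short illustration of the rule, assuming CPython: int("1000") builds a fresh int object at run time on each call, so two equal values need not be the same object, while None is a singleton and is correctly tested with is. The helper describe is hypothetical.

```python
def describe(x, y):
    # == compares values; `is` compares object identity.
    return {"equal": x == y, "identical": x is y}


a = int("1000")  # built at run time from a string
b = int("1000")  # a second, separate int object with the same value
print(describe(a, b))  # {'equal': True, 'identical': False}

result = None
if result is None:  # the idiomatic identity test: None is a singleton
    print("no result yet")
```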

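Everything described here applies to CPython, the most popular implementation of Python. Other implementations might differ, so no such assumptions should be made when using them.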

To get a clearer view and additionally verify this seemingly odd behaviour, we can look directly at the code objects for each of these cases using the dis module.

For the function func:

Along with all other attributes, function objects also have a __code__ attribute that allows you to peek into the compiled bytecode for that function. Using dis.code_info we can get a nice pretty view of all stored attributes in a code object for a given function:

>>> import dis
>>> print(dis.code_info(func))
Name:              func
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  2
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 1000
Variable names:
   0: a
   1: b

We're only interested in the Constants entry for function func. In it, we can see that we have two values, None (always present) and 1000. We only have a single int instance that represents the constant 1000. This is the value that a and b are going to be assigned to when the function is invoked.

Accessing this value is easy via func.__code__.co_consts[1] and so, another way to view our a is b evaluation in the function would be like so:

>>> id(func.__code__.co_consts[1]) == id(func.__code__.co_consts[1]) 

Which, of course, will evaluate to True because we're referring to the same object.
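The index 1 in co_consts matches the dis output above, but the layout of co_consts can differ between Python versions. A slightly more robust sketch, assuming CPython, locates the constant by value and checks that the compiler stored the literal only once:

```python
def func():
    a = 1000
    b = 1000
    return a is b


consts = func.__code__.co_consts
# Find the constant by value instead of hard-coding its position.
thousands = [c for c in consts if type(c) is int and c == 1000]

print(len(thousands))  # 1: the literal appears once in co_consts
print(func())          # True: a and b both load that single constant
```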

For each interactive command:

As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.

We can get the code objects for each command via the compile built-in:

>>> com1 = compile("a=1000", filename="", mode="single")
>>> com2 = compile("b=1000", filename="", mode="single")

For each assignment statement, we get a similar-looking code object, like the following:

>>> print(dis.code_info(com1))
Name:              <module>
Filename:          
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        1
Flags:             NOFREE
Constants:
   0: 1000
   1: None
Names:
   0: a

The output for com2 looks the same but has a fundamental difference: each of the code objects com1 and com2 has its own int instance representing the literal 1000. This is why, in this case, when we do a is b via the co_consts attribute, we actually get:

>>> id(com1.co_consts[0]) == id(com2.co_consts[0])
False

Which agrees with what we actually got.

Different code objects, different contents.
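To connect the co_consts entries to the names a and b, one can execute com1 and com2 in a fresh namespace, roughly as the REPL does. This is a sketch assuming CPython; literal_const is a hypothetical helper that looks the constant up by value rather than by index.

```python
com1 = compile("a=1000", filename="", mode="single")
com2 = compile("b=1000", filename="", mode="single")


def literal_const(code, value):
    # Hypothetical helper: return the int constant in `code` equal to `value`.
    return next(c for c in code.co_consts if type(c) is int and c == value)


ns = {}
exec(com1, ns)  # binds a to com1's constant object
exec(com2, ns)  # binds b to com2's constant object

print(ns["a"] is literal_const(com1, 1000))  # True
print(ns["b"] is literal_const(com2, 1000))  # True
print(ns["a"] is ns["b"])                    # False
```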

Note: I was somewhat curious as to how exactly this happens in the source code and after digging through it I believe I finally found it.

During the compilation phase, the co_consts attribute is represented by a dictionary object. In compile.c we can actually see the initialization:

/* snippet for brevity */

u->u_lineno = 0;
u->u_col_offset = 0;
u->u_lineno_set = 0;
u->u_consts = PyDict_New();  

/* snippet for brevity */

During compilation, this dictionary is checked for already existing constants, so a repeated literal within one code object is stored only once. See @Raymond Hettinger's answer below for a bit more on this.
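The deduplication idea can be modeled in a few lines of Python. This is a toy sketch, not CPython's code: the class ConstTable and its method add are hypothetical. It keys entries on (type, value) because 1000 and 1000.0 compare and hash equal yet must stay distinct constants; CPython's real key logic in the compiler handles further edge cases, such as 0.0 versus -0.0.

```python
class ConstTable:
    """Toy model of a per-code-object constant table (hypothetical)."""

    def __init__(self):
        self._index = {}  # (type, value) -> position in self.consts
        self.consts = []

    def add(self, value):
        # Reuse the existing slot for an equal constant of the same type.
        key = (type(value), value)
        if key not in self._index:
            self._index[key] = len(self.consts)
            self.consts.append(value)
        return self._index[key]


table = ConstTable()
i = table.add(int("1000"))  # first occurrence of the literal
j = table.add(int("1000"))  # equal int: reuses the first slot
k = table.add(1000.0)       # equal value, but a float: new slot
print(i, j, k)              # 0 0 1
print(table.consts)         # [1000, 1000.0]
```

Because the second 1000 maps to the first slot, every load of that literal in the code object yields the same object, which is the effect observed in func.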

Chained statement identity check evaluates to True:

It should be more clear now why exactly the following evaluates to True:

>>> a = 1000; b = 1000
>>> a is b
True

In this case, by chaining the two assignment commands together we tell the interpreter to compile these together. As in the case for the function object, only one object for the literal 1000 will be created resulting in a True value when evaluated.
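The chained line can be checked the same way as the separate commands, by compiling it as a single interactive statement. A sketch assuming CPython; "<stdin>" is just a filename label.

```python
ns = {}
# Both assignments form one interactive command, hence one code object
# and a single constant 1000 shared by a and b.
exec(compile("a = 1000; b = 1000", "<stdin>", "single"), ns)
print(ns["a"] is ns["b"])  # True
```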

Execution on a module level yields True again:

As previously mentioned, the reference manual states that:

... The following are blocks: a module ...

So the same premise applies: we will have a single code object (for the module) and so, as a result, single values stored for each different literal.
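A sketch of the module case, assuming CPython: the whole source text is compiled once as a module, so the two lines share one constant. The module name demo_module is hypothetical.

```python
import types

source = """
a = 1000
b = 1000
same = a is b
"""

# One compile() of the whole text: one module code object, one constant.
module = types.ModuleType("demo_module")
exec(compile(source, "demo_module.py", "exec"), module.__dict__)
print(module.same)  # True
```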

The same doesn't apply to mutable objects:

Meaning that unless we explicitly initialize to the same mutable object (for example with a = b = []), the identity of the objects will never be equal, for example:

    a = []; b = []
    a is b  # always evaluates to False
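The contrast with a = b = [] can be shown directly. Each [] display creates a new list, whereas chained assignment binds both names to one object:

```python
a = []; b = []
print(a is b)  # False: two list displays, two new lists

c = d = []
print(c is d)  # True: one list bound to two names
c.append(1)
print(d)       # [1]: the mutation is visible through both names
```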

Again, in the documentation, this is specified:

after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists.
