What's with the integer cache maintained by the interpreter?

Question

After diving into Python's source code, I found that it maintains an array of PyInt_Objects ranging from int(-5) to int(256) (see src/Objects/intobject.c).

A little experiment proves it:

>>> a = 1
>>> b = 1
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False

But if I run that code together in a .py file (or join the statements with semicolons), the result is different:

>>> a = 257; b = 257; a is b
True

I was curious why they are still the same object, so I dug deeper into the syntax tree and the compiler and came up with the call hierarchy listed below:

PyRun_FileExFlags() 
    mod = PyParser_ASTFromFile() 
        node *n = PyParser_ParseFileFlagsEx() //source to cst
            parsetok()
                ps = PyParser_New() 
                for (;;)
                    PyTokenizer_Get() 
                    PyParser_AddToken(ps, ...)
        mod = PyAST_FromNode(n, ...)  //cst to ast
    run_mod(mod, ...)
        co = PyAST_Compile(mod, ...) //ast to CFG
            PyFuture_FromAST()
            PySymtable_Build()
            co = compiler_mod()
        PyEval_EvalCode(co, ...)
            PyEval_EvalCodeEx()
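
The same source-to-AST-to-code-object pipeline can be observed from Python itself. The following is a minimal sketch, assuming CPython 3.8 or later (where numeric literals appear as ast.Constant nodes); the names source, literal_nodes and merged are made up for illustration, and ast.parse/compile/exec are only rough Python-level counterparts of the C functions above.

```python
import ast

source = "a = 257\nb = 257\n"

# source -> AST: each literal gets its own node
tree = ast.parse(source, filename="<example>")
literal_nodes = [node for node in ast.walk(tree)
                 if isinstance(node, ast.Constant) and node.value == 257]

# AST -> code object: equal constants end up in one co_consts slot
code = compile(tree, "<example>", "exec")
merged = [const for const in code.co_consts if const == 257]

# evaluate the code object
namespace = {}
exec(code, namespace)

print(len(literal_nodes), len(merged), namespace["a"] is namespace["b"])
```

Two literal nodes go in, one constant comes out, which is consistent with the merge happening during compilation rather than during parsing.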

Then I added some debug code in PyInt_FromLong and before/after PyAST_FromNode, and executed a test.py:

a = 257
b = 257
print "id(a) = %d, id(b) = %d" % (id(a), id(b))

The output looks like this:

DEBUG: before PyAST_FromNode
name = a
ival = 257, id = 176046536
name = b
ival = 257, id = 176046752
name = a
name = b
DEBUG: after PyAST_FromNode
run_mod
PyAST_Compile ok
id(a) = 176046536, id(b) = 176046536
Eval ok

This means that during the CST-to-AST transformation, two different PyInt_Objects are created (this actually happens in the ast_for_atom() function), but they are later merged.

I find it hard to follow the source in PyAST_Compile and PyEval_EvalCode, so I'm here to ask for help. I'd appreciate it if someone could give me a hint.

Recommended answer

Python caches integers in the range [-5, 256], so it is expected that integers in that range are also identical.
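
To see the cache on its own, without the compiler getting involved, the integers can be built at run time instead of written as literals. This is a small sketch that assumes CPython (the cache is an implementation detail, not a language guarantee); made_twice is a hypothetical helper name.

```python
def made_twice(text):
    """Build the same integer twice at run time and compare identities."""
    a = int(text)
    b = int(text)
    return a is b

print(made_twice("256"))  # inside the cached range: True on CPython
print(made_twice("257"))  # outside the cached range: False on CPython
```

Because the values come from int() calls rather than literals, no constant merging can take place, so any shared identity here comes from the small-integer cache.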

What you see is the Python compiler optimizing identical literals when they are part of the same source text.

When typing in the Python shell, each line is a completely separate statement, parsed at a different moment, thus:

>>> a = 257
>>> b = 257
>>> a is b
False

But if you put the same code into a file:

$ echo 'a = 257
> b = 257
> print a is b' > testing.py
$ python testing.py
True

This happens whenever the compiler has a chance to analyze where the literals are used, for example when defining a function in the interactive interpreter:

>>> import dis
>>> def test():
...     a = 257
...     b = 257
...     print a is b
... 
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (257)
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               1 (257)
              9 STORE_FAST               1 (b)

  4          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 COMPARE_OP               8 (is)
             21 PRINT_ITEM          
             22 PRINT_NEWLINE       
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        
>>> test()
True
>>> test.func_code.co_consts
(None, 257)

Note how the compiled code contains a single constant for 257.
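
On Python 3 the attribute is spelled __code__ instead of func_code, and print is a function. A rough equivalent of the check above, assuming CPython 3, could look like this:

```python
def test():
    a = 257
    b = 257
    return a is b

# The function's constant table holds 257 only once.
consts = test.__code__.co_consts
print(consts.count(257))

# Both assignments load that one constant, so the names share an object.
print(test())
```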

In conclusion, the Python bytecode compiler cannot perform massive optimizations (like compilers for statically typed languages can), but it does more than you might think. One of the things it does is analyze the usage of literals and avoid duplicating them.

Note that this has nothing to do with the cache, because it also works for floats, which do not have a cache:

>>> a = 5.0
>>> b = 5.0
>>> a is b
False
>>> a = 5.0; b = 5.0
>>> a is b
True
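
The float case can also be reproduced outside the interactive shell. The sketch below assumes CPython 3 and uses compile() with a made-up "<example>" filename to stand in for "the same source text", while float() calls stand in for values created separately at run time.

```python
# Two equal float literals compiled as one unit share a single constant.
namespace = {}
exec(compile("a = 5.0; b = 5.0", "<example>", "exec"), namespace)
print(namespace["a"] is namespace["b"])

# Floats built separately at run time are distinct objects on CPython,
# since there is no float cache to fall back on.
x = float("5.0")
y = float("5.0")
print(x is y)
```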

For more complex literals, like tuples, it "doesn't work":

>>> a = (1,2)
>>> b = (1,2)
>>> a is b
False
>>> a = (1,2); b = (1,2)
>>> a is b
False

But the literals inside the tuples are shared:

>>> a = (257, 258)
>>> b = (257, 258)
>>> a[0] is b[0]
False
>>> a[1] is b[1]
False
>>> a = (257, 258); b = (257, 258)
>>> a[0] is b[0]
True
>>> a[1] is b[1]
True


As for why you see two PyInt_Objects being created, I'd guess that this is done to avoid comparing the literals themselves. For example, the number 257 can be expressed by multiple literals:

>>> 257
257
>>> 0x101
257
>>> 0b100000001
257
>>> 0o401
257

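The parser has two choices:

  • Convert the literals to some common base before creating the integers and check whether the literals are equivalent, then create a single integer object.
  • Create the integer objects and check whether they are equal. If they are, keep only one value and assign it to all the literals; otherwise, the integers to assign already exist.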

The Python parser probably uses the second approach, which avoids rewriting the conversion code and is also easier to extend (for example, it works with floats as well).
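
Whichever approach is used internally, the visible effect is that different spellings of the same value end up as one constant. A minimal check, assuming CPython 3 (the variable names and the "<example>" filename are arbitrary):

```python
source = "a = 257; b = 0x101; c = 0b100000001; d = 0o401"
code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)
values = [namespace[name] for name in "abcd"]

# Four different spellings, one entry in the constant table.
print(code.co_consts.count(257))
# All four names are bound to that single object.
print(all(value is values[0] for value in values))
```

Since the merged literals are spelled differently, the deduplication evidently works on the resulting values rather than on the literal text.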

Reading the Python/ast.c file, the function that parses all numbers is parsenumber, which calls PyOS_strtoul to obtain the integer value (for integers) and eventually calls PyLong_FromString:

    x = (long) PyOS_strtoul((char *)s, (char **)&end, 0);
    if (x < 0 && errno == 0) {
        return PyLong_FromString((char *)s,
                                 (char **)0,
                                 0);
    }
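
A loose Python-level analogue of this conversion is int(text, 0), where base 0 means "infer the base from the literal's prefix". The sketch below is not the parser's code; it only illustrates, assuming CPython, that converting each spelling independently yields a separate object every time.

```python
spellings = ["257", "0x101", "0b100000001", "0o401"]

# Base 0 lets int() pick the base from the prefix, like a literal would.
numbers = [int(text, 0) for text in spellings]
print(numbers)

# Each conversion produced its own object: nothing checks for an
# existing integer with the same value (257 is outside the cache).
print(len({id(number) for number in numbers}))
```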

As you can see here, the parser does not check whether it has already found an integer with the given value, which explains why you see two int objects being created. It also means that my guess was correct: the parser first creates the constants, and only afterward does the compiler arrange for the bytecode to use the same object for equal constants.

The code that does this check must be somewhere in Python/compile.c or Python/peephole.c, since these are the files that transform the AST into bytecode.

In particular, the compiler_add_o function seems to be the one that does it. There is this comment in compiler_lambda:

/* Make None the first constant, so the lambda can't have a
   docstring. */
if (compiler_add_o(c, c->u->u_consts, Py_None) < 0)
    return 0;

So it seems that compiler_add_o is used to insert constants for functions, lambdas, and so on. The compiler_add_o function stores the constants in a dict object, and it follows immediately that equal constants will fall into the same slot, resulting in a single constant in the final bytecode.
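
The dict-based deduplication can be sketched in a few lines of Python. This is a simplified model, not the actual compiler_add_o: add_constant and table are hypothetical names, and keying on the value together with its type is an assumption made here so that equal values of different types (such as 257 and 257.0) are not merged.

```python
def add_constant(table, value):
    """Return the slot index for value, reusing the slot of an equal constant."""
    key = (value, type(value))      # the type keeps 257 and 257.0 apart
    if key not in table:
        table[key] = len(table)     # first occurrence gets the next free slot
    return table[key]

table = {}
constants = (None, 257, int("257"), 257.0, 257)
indexes = [add_constant(table, value) for value in constants]
print(indexes)  # [0, 1, 1, 2, 1]
```

Every later 257 maps to the slot of the first one, so only the first object would need to be kept; that mirrors why a and b end up with the same id in the question's debug output.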
