How and when does Python determine the data type of a variable?

Question

I was trying to figure out exactly how Python 3 (using CPython as the interpreter) executes a program. I found out that the steps are:

  1. The CPython compiler compiles the Python source code (.py file) to Python bytecode (.pyc file). When a module is imported, its .pyc file is saved; when a single main.py script is run, it is not saved.

  2. The Python Virtual Machine interprets the bytecode into hardware-specific machine code.

A great answer found here https://stackoverflow.com/a/1732383/8640077 says that the Python Virtual Machine takes longer to run its bytecode than the JVM because Java bytecode contains information about data types, while the Python Virtual Machine interprets lines one by one and has to determine the data types.

My question is: how does the Python Virtual Machine determine the data type, and does this happen during the interpretation to machine code or during a separate process (which, for example, would produce another intermediate code)?

Answer

The dynamic, run-time dispatch of CPython (compared to the static, compile-time dispatch of Java) is only one of the reasons why Java is faster than pure CPython: Java has JIT compilation, different garbage-collection strategies, native types such as int and double (versus immutable objects in CPython), and so on.

My earlier superficial experiments have shown that dynamic dispatch is responsible for only about 30% of the running time - you cannot explain speed differences of several orders of magnitude with that.

To make this answer less abstract, let's take a look at an example:

def add(x,y):
   return x+y

Looking at the bytecode:

import dis
dis.dis(add)

which gives:

2         0 LOAD_FAST                0 (x)
          2 LOAD_FAST                1 (y)
          4 BINARY_ADD
          6 RETURN_VALUE

We can see that at the bytecode level it makes no difference whether x and y are integers, floats or something else - the interpreter doesn't care.
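As a small illustration of this point, the sketch below compiles add once and then calls it with several operand types. The names opnames and results are introduced here only for the example. The exact opcode names depend on the CPython version (BINARY_ADD up to 3.10, BINARY_OP from 3.11 on), so the listing is printed rather than assumed.

```python
import dis


def add(x, y):
    return x + y


# The bytecode is produced once, when the function is compiled.
# It carries no information about the operand types.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# The very same code object is then executed for ints, floats,
# strings and lists alike.
results = [add(1, 2), add(1.5, 2.5), add("a", "b"), add([1], [2])]

print(opnames)
print(results)  # [3, 4.0, 'ab', [1, 2]]
```

Only the objects passed in at run time decide which addition is actually carried out.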

The situation is completely different in Java:

int add(int x, int y) {return x+y;}

and

float add(float x, float y) {return x+y;}

would result in completely different opcodes, and the call dispatch would happen at compile time - the right version is picked depending on the static types, which are known at compile time.

Quite often the CPython interpreter doesn't have to know the exact type of the arguments. Internally there is a base "class/interface" (obviously there are no classes in C, so it is called a "protocol", but for somebody who knows C++/Java, "interface" is probably the right mental model) from which all other "classes" are derived. This base "class" is called PyObject, and its protocol is described in the CPython C-API documentation. So as long as a function is part of this protocol/interface, the CPython interpreter can call it without knowing the exact type, and the call will be dispatched to the right implementation (a lot like "virtual" functions in C++).
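The same idea can be observed from the Python side. In the sketch below, Vec is a hypothetical class made up for this example: add knows nothing about it, yet x + y ends up in Vec.__add__ because the operation is looked up on the type of the object at run time.

```python
class Vec:
    """Hypothetical 2-D vector, used only to illustrate dispatch."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Vec) and (self.x, self.y) == (other.x, other.y)


def add(x, y):
    # No type check here: the "+" is dispatched through the type of x.
    return x + y


v = add(Vec(1, 2), Vec(3, 4))

# Calling the method through the type gives the same result as "+".
w = type(Vec(1, 2)).__add__(Vec(1, 2), Vec(3, 4))

print(v == Vec(4, 6), w == v)  # True True
```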

On the pure Python side, it seems as if variables don't have types:

a=1
a="1"

However, internally a does have a type - it is PyObject*, and this reference can be bound to an integer (1) and to a Unicode string ("1") - because they both "inherit" from PyObject.

From time to time the CPython interpreter tries to find out the actual type behind the reference, including for the example above: when it sees the BINARY_ADD opcode, the following C code is executed:
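A short sketch of what this looks like from Python: the name a is just a reference, and the type belongs to the object it currently points to. The variables first_type and second_type exist only for this example.

```python
a = 1
first_type = type(a)   # the type is stored on the object, not on the name

a = "1"                # rebinding the same name to a different object
second_type = type(a)

print(first_type, second_type)  # <class 'int'> <class 'str'>

# Both objects are instances of the common base type "object",
# which is how the PyObject base appears at the Python level.
print(isinstance(1, object), isinstance("1", object))  # True True
```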

    case TARGET(BINARY_ADD): {
        PyObject *right = POP();
        PyObject *left = TOP();
        PyObject *sum;
        ...
        if (PyUnicode_CheckExact(left) &&
                 PyUnicode_CheckExact(right)) {
            sum = unicode_concatenate(left, right, f, next_instr);
            /* unicode_concatenate consumed the ref to left */
        }
        else {
            sum = PyNumber_Add(left, right);
            Py_DECREF(left);
        }
        Py_DECREF(right);
        SET_TOP(sum);
        if (sum == NULL)
            goto error;
        DISPATCH();
    }

Here the interpreter checks whether both objects are exactly Unicode strings. If so, a special function is used (possibly more efficient: as a matter of fact, it tries to change the immutable unicode object in place, as described in a separate SO answer); otherwise the work is dispatched to the PyNumber protocol.

Obviously, the interpreter also has to know the exact type when an object is created; for example, different "classes" are used for a="1" and a=1 - but as we have seen, that is not the only place.

So the interpreter does check types at run time, but most of the time it doesn't have to - the goal can be reached via dynamic dispatch.
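The control flow of that C snippet can be modelled roughly in Python. This is a simplified sketch, not CPython's actual algorithm: binary_add and number_add are names invented here, and the real PyNumber_Add has additional rules (for example, giving priority to a subclass's reflected method) that are left out.

```python
def number_add(left, right):
    """Simplified stand-in for PyNumber_Add (assumption: subclass
    priority rules and the C-level slots are not modelled)."""
    add = getattr(type(left), "__add__", None)
    if add is not None:
        result = add(left, right)
        if result is not NotImplemented:
            return result
    if type(right) is not type(left):
        radd = getattr(type(right), "__radd__", None)
        if radd is not None:
            result = radd(right, left)
            if result is not NotImplemented:
                return result
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"{type(left).__name__!r} and {type(right).__name__!r}"
    )


def binary_add(left, right):
    # Fast path, mirroring PyUnicode_CheckExact: exactly str, no subclasses.
    if type(left) is str and type(right) is str:
        return str.__add__(left, right)
    # Everything else goes through the generic, dynamically dispatched path.
    return number_add(left, right)


print(binary_add("a", "b"))  # ab
print(binary_add(1, 2.5))    # 3.5
```

For 1 + 2.5 the int implementation returns NotImplemented, so the sketch falls back to float.__radd__, which is the kind of run-time decision the generic path has to make.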
