What happens when you import a package?

Question

For efficiency's sake, I am trying to figure out how Python works with its heap of objects (and its system of namespaces, but that is more or less clear). So, basically, I am trying to understand when objects are loaded into the heap, how many of them there are, how long they live, etc.

My question is about what happens when I work with a package and import something from it:

from pypackage import pymodule

What objects get loaded into memory (into the object heap of the Python interpreter)? And more generally: what happens? :)

I guess the above example does something like this: some object for the package pypackage is created in memory (containing some information about the package, but not too much), the module pymodule is loaded into memory, and a reference to it is created in the local namespace. The important thing here is that no other modules of pypackage (or other objects) are created in memory unless this is stated explicitly (in the module itself, or somewhere in the package initialization tricks and hooks, which I am not familiar with). In the end, the only big thing in memory is pymodule (i.e. all the objects that were created when the module was imported). Is that so? I would appreciate it if someone could clarify this a little. Maybe you could recommend a useful article about it? (The documentation covers more specific things.)

I found the following answer to the same question about importing modules:

When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.

Otherwise, Python does something like this:

  • Create a new, empty module object (this is essentially a dictionary)
  • Insert that module object in the sys.modules dictionary
  • Load the module code object (if necessary, compile the module first)
  • Execute the module code object in the new module’s namespace. All variables assigned by the code will be available via the module object.
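The steps quoted above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the real import machinery also handles finders, loaders, bytecode caching and packages, and the names sketch_import and sketch_demo_module are made up for this example.

```python
import sys
import types


def sketch_import(name, source):
    """Roughly mimic the quoted steps for a module whose source text we already have."""
    # Reuse the existing module object if it is already registered.
    if name in sys.modules:
        return sys.modules[name]
    # Create a new, empty module object.
    module = types.ModuleType(name)
    # Register it in sys.modules before running any of its code.
    sys.modules[name] = module
    try:
        # Compile the source text into a code object.
        code = compile(source, f"<{name}>", "exec")
        # Execute the code object in the new module's namespace.
        exec(code, module.__dict__)
    except BaseException:
        # Do not leave a half-initialised module in the registry.
        del sys.modules[name]
        raise
    return module


demo = sketch_import("sketch_demo_module", "answer = 6 * 7\n")
print(demo.answer)  # 42

# A second "import" finds the registered object and does not run the new source.
again = sketch_import("sketch_demo_module", "answer = 0\n")
print(again is demo)  # True
```

Registering the module before executing its code mirrors the order of the quoted list.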

I would be grateful for the same kind of explanation about packages.

By the way, with packages, the module name is added to sys.modules in an odd way:

>>> import sys
>>> from pypacket import pymodule
>>> "pymodule" in sys.modules.keys()
False
>>> "pypacket" in sys.modules.keys()
True

There is also a practical question on the same subject.

When I build a set of tools that might be used in different processes and programs, I put them in modules. I then have no choice but to load a full module even when all I want is to use one function declared there. As I see it, one can make this problem less painful by making small modules and putting them into a package (provided a package doesn't load all of its modules when you import only one of them).

Is there a better way to make such libraries in Python? (With the mere functions, which don't have any dependencies within their module.) Is it possible with C-extensions?

P.S. Sorry for such a long question.

Recommended answer

You have a few different questions here. . .

When you import a package, the sequence of steps is the same as when you import a module. The only difference is that the package's code (i.e., the code that creates the "module code object") is the code of the package's __init__.py.

So yes, the sub-modules of the package are not loaded unless the __init__.py does so explicitly. If you do from package import module, the package's __init__.py runs first and then only module is loaded, unless of course it imports other modules from the package.
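One way to check this yourself is to build a throwaway package and look at sys.modules after the import. The package name demo_pkg_a and its files used.py and unused.py are invented for this sketch; nothing here comes from the original question.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg_a/{__init__,used,unused}.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_a")
os.mkdir(pkg_dir)
for filename, body in [
    ("__init__.py", "INIT_RAN = True\n"),
    ("used.py", "VALUE = 1\n"),
    ("unused.py", "VALUE = 2\n"),
]:
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(body)

# Make the package importable and refresh the finder caches.
sys.path.insert(0, root)
importlib.invalidate_caches()

from demo_pkg_a import used

# Collect every sys.modules entry that belongs to the demo package.
loaded = sorted(name for name in sys.modules if name.startswith("demo_pkg_a"))
print(loaded)  # ['demo_pkg_a', 'demo_pkg_a.used']
```

The package itself and the requested sub-module are registered, while demo_pkg_a.unused is never executed because the (empty of imports) __init__.py does not ask for it.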

When you import a module from a package, the name that is added to sys.modules is the "qualified name", which consists of the module name preceded by the dot-separated names of the packages it lives in. So if you do from package.subpackage import mod, what is added to sys.modules is "package.subpackage.mod" (alongside entries for "package" and "package.subpackage").
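This also explains the REPL session in the question: the bare module name is not a key in sys.modules, but the dotted name is. A small sketch, using a made-up nested package demo_pkg_b/sub/mod.py created in a temporary directory:

```python
import importlib
import os
import sys
import tempfile

# Create demo_pkg_b/__init__.py, demo_pkg_b/sub/__init__.py, demo_pkg_b/sub/mod.py
root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_pkg_b", "sub")
os.makedirs(sub_dir)
for path in [
    os.path.join(root, "demo_pkg_b", "__init__.py"),
    os.path.join(sub_dir, "__init__.py"),
    os.path.join(sub_dir, "mod.py"),
]:
    with open(path, "w") as f:
        f.write("")  # empty files are enough for this demonstration

sys.path.insert(0, root)
importlib.invalidate_caches()

from demo_pkg_b.sub import mod

# The module knows itself by its qualified name, and that is the registry key.
qualified = mod.__name__
print(qualified)  # demo_pkg_b.sub.mod
print(sys.modules[qualified] is mod)  # True

# The parent packages are registered under their own dotted names as well.
parents_registered = all(
    name in sys.modules for name in ("demo_pkg_b", "demo_pkg_b.sub")
)
print(parents_registered)  # True
```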

It is usually not a big concern to have to import the whole module instead of just one function. You say it is "painful" but in practice it almost never is.

If, as you say, the functions have no external dependencies, then they are just pure Python and loading them will not take much time. Usually, if importing a module takes a long time, it's because it loads other modules, which means it does have external dependencies and you have to load the whole thing.

If your module has expensive operations that happen on module import (i.e., they are global module-level code and not inside a function), but aren't essential for use of all functions in the module, then you could, if you like, redesign your module to defer that loading until later. That is, if your module does something like:

def simpleFunction():
    pass

# open files, read huge amounts of data, do slow stuff here

you could change it to

def simpleFunction():
    pass

def loadData():
    # open files, read huge amounts of data, do slow stuff here
    pass

and then tell people "call someModule.loadData() when you want to load the data". Or, as you suggested, you could put the expensive parts of the module into their own separate module within a package.
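If you would rather not rely on callers remembering to call the loader, a possible variation is to cache the result so that the slow step runs on first use and only once. This is a sketch, not part of the original answer: the names load_data, first_row and stats are hypothetical, and the dictionary returned here stands in for whatever the real slow work produces.

```python
import functools

# Counter used only to show how often the slow step actually runs.
stats = {"loads": 0}


def simple_function():
    """Cheap function that never needs the expensive data."""
    return "cheap"


@functools.lru_cache(maxsize=None)
def load_data():
    """Stand-in for opening files, reading huge amounts of data, and so on."""
    stats["loads"] += 1
    return {"rows": list(range(5))}


def first_row():
    """Function that needs the data; it triggers the load on first call."""
    return load_data()["rows"][0]


simple_function()
print(stats["loads"])  # 0: importing and using cheap functions loads nothing

first_row()
first_row()
print(stats["loads"])  # 1: the slow step ran once and was then cached
```

The trade-off is that the first call to a data-dependent function pays the loading cost, which may be surprising if callers expect it to be fast.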

I've never found it to be the case that importing a module caused a meaningful performance impact unless the module was already large enough that it could reasonably be broken down into smaller modules. Making tons of tiny modules that each contain one function is unlikely to gain you anything except maintenance headaches from having to keep track of all those files. Do you actually have a specific situation where this makes a difference for you?
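Before restructuring anything, it may be worth measuring how long the import actually takes. The helper below is a rough sketch; time_import is a made-up name, and colorsys is used only as a small standard-library module to have something to import.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return seconds spent importing module_name, or None if it was already loaded."""
    if module_name in sys.modules:
        # Already imported: a second import is just a dictionary lookup.
        return None
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


elapsed = time_import("colorsys")
if elapsed is None:
    print("colorsys was already imported")
else:
    print(f"importing colorsys took {elapsed * 1000:.3f} ms")
```

For a per-module breakdown of a whole program, CPython 3.7 and later also accept the -X importtime command-line option.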

Also, regarding your last point, as far as I'm aware, the same all-or-nothing load strategy applies to C extension modules as for pure Python modules. Obviously, just like with Python modules, you could split things up into smaller extension modules, but you can't do from someExtensionModule import someFunction without also running the rest of the code that was packaged as part of that extension module.
