Loading vs linking in Cython modules


Question

While exploring the Cython compile steps, I found that I need to link C libraries such as math explicitly in setup.py. However, no such step was needed for numpy. Why is that? Is numpy imported through the usual Python import mechanism? If so, does that mean we never need to explicitly link any extension module in Cython?

I tried to rummage through the official documentation, but unfortunately found no explanation of when explicit linking is required and when it is handled automatically.

Solution

Calling a cdef-function corresponds more or less to a jump to an address in memory - the one from which the instructions should be read and executed. The question is how this address is provided. There are several cases to consider:

A. inline functions

The code of those functions is either inlined or the definition of the function is in the same translation unit, so the address is known to the linker at link time (or even to the compiler at compile time) - no additional libraries are needed.

Header-only libraries are an example.

Consequences: Only the include path(s) need to be provided in setup.py.

B. static linking

The definition/functionality we need is in another translation unit/library - the target address of the jump is calculated at link time and cannot be changed afterwards.

Examples are additional c/cpp-files or static libraries that are added to the extension definition.

Consequences: The static library should be added to setup.py, i.e. library path and library name along with the include paths.

C. dynamic linking

The necessary functionality is provided in a shared object/dll. The address to jump to is calculated at run time by the loader and can be changed at program start by exchanging the loaded shared objects.

Examples are libstdc++ (usually added automatically by g++) and libm, which gcc does not add to the linker command automatically.

Consequences: The dynamic library should be added to setup.py, i.e. library path and library name, maybe an rpath, plus the include paths. The shared object/dll must be available at run time. More information (than one probably would like to know) about Cython/Python using dynamic libraries can be found in this SO-post.
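
As a rough illustration of case C, the sketch below collects the setuptools.Extension keyword arguments one might use to link libm dynamically. The module name "foo", the source file "foo.pyx" and the helper names libm_extension_kwargs and build are hypothetical; the empty path lists assume that libm and math.h live on the default search paths.

```python
def libm_extension_kwargs():
    """Keyword arguments for setuptools.Extension, linking libm dynamically."""
    return {
        "name": "foo",               # hypothetical module name
        "sources": ["foo.pyx"],      # hypothetical Cython source
        "libraries": ["m"],          # library name: becomes -lm on the linker line
        "library_dirs": [],          # library path(s): empty, libm is on the default path
        "include_dirs": [],          # include path(s): empty, math.h is on the default path
        "runtime_library_dirs": [],  # rpath entries: none needed for a system library
    }


def build():
    # Imported lazily so the sketch can be read without a build environment.
    from setuptools import setup, Extension
    from Cython.Build import cythonize

    setup(ext_modules=cythonize([Extension(**libm_extension_kwargs())]))
```

In a real setup.py one would call build() at the bottom of the file. For a static library (case B) the same library_dirs/libraries arguments apply, as long as the linker picks up the .a file instead of a shared object.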

D. Calling via a pointer

A linker is needed only when we call a function by its name. If we call it through a function pointer, we need neither linker nor loader, because the address of the function is already known - it is the value stored in the function pointer.
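
The idea can be tried out from plain Python with ctypes. This is only a sketch, assuming a POSIX system on which libm (or the running interpreter itself) exposes cos; the names cos_address and cos_via_pointer are made up for the illustration.

```python
import ctypes
import ctypes.util

# Load libm at run time (dlopen under the hood); nothing was linked at build time.
# If find_library returns None, CDLL(None) falls back to the symbols of the running process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# The name "cos" is resolved exactly once; afterwards we only keep the raw address.
cos_address = ctypes.cast(libm.cos, ctypes.c_void_p).value

# Build a callable from the bare address: double (*)(double)
cos_type = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)
cos_via_pointer = cos_type(cos_address)

print(cos_via_pointer(0.0))
```

The call in the last line never mentions the symbol name; it jumps to whatever address the pointer holds.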

Example: Cython-generated modules use this machinery to give access to their cdef-functions exported through a pxd-file. The module creates a data structure of function pointers (stored as the variable __pyx_capi__ in the module itself), which is filled in during module initialization once the so/dll has been loaded via dlopen (or the Windows equivalent). The lookup in this dictionary happens only once, when the importing module is loaded; the function addresses are then cached, so calls at run time have almost no overhead.

We can inspect it, for example via

# foo.pyx:
cdef void doit():
    print("doit")

# foo.pxd:
cdef void doit()

$ cythonize -3 -i foo.pyx
$ python -c "import foo; print(foo.__pyx_capi__)"
{'doit': <capsule object "void (void)" at 0x7f7b10bb16c0>}

Now, calling a cdef function from another module is just jumping to the corresponding address.
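
The mechanism can be imitated in plain Python to make it more tangible. The following is a sketch of the idea, not Cython's actual generated code: capi stands in for __pyx_capi__, doit is wrapped as a C callback, and the capsule helpers are the CPython C-API functions PyCapsule_New and PyCapsule_GetPointer reached through ctypes.pythonapi. It assumes CPython.

```python
import ctypes

# Bind the two capsule functions of the CPython C-API.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

calls = []

def doit():
    calls.append("doit")

# "Exporting" side: wrap doit as a C function `void (void)` and publish its address.
DOIT_TYPE = ctypes.CFUNCTYPE(None)
c_doit = DOIT_TYPE(doit)          # must stay alive as long as the address is in use
SIGNATURE = b"void (void)"        # capsule name; must outlive the capsule
capi = {
    "doit": PyCapsule_New(ctypes.cast(c_doit, ctypes.c_void_p).value, SIGNATURE, None)
}

# "Importing" side: one dictionary lookup at load time, then the pointer is cached.
cached_doit = DOIT_TYPE(PyCapsule_GetPointer(capi["doit"], SIGNATURE))

# Every later call is just a jump to the cached address.
cached_doit()
print(calls)
```

The capsule name doubles as a signature check: PyCapsule_GetPointer fails if the name passed in does not match the one stored in the capsule.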

Consequences: We need to cimport the needed functionality.


Numpy is a little more complicated: it uses a sophisticated combination of A and D to postpone the resolution of symbols until run time, so no shared objects/dlls are needed at link time (only at run time!).

Some functionality in the numpy-pxd file can be used directly because it is inlined (or even just a define), for example PyArray_NDIM and basically everything else from ndarraytypes.h. This is the reason one can use numpy's ndarrays in Cython without much ado.

Other functionality (basically everything from ndarrayobject.h), for example PyArray_FromAny, cannot be accessed without calling np.import_array() in an initialization step. Why?

The answer is in the header __multiarray_api.h, which is included in ndarrayobject.h but cannot be found in the git-repository because it is generated during installation. There, definitions such as that of PyArray_CheckFromAny can be looked up (PyArray_FromAny is defined in the same way):

...
static void **PyArray_API=NULL; //usually...
...
#define PyArray_CheckFromAny \
        (*(PyObject * (*)(PyObject *, PyArray_Descr *, int, int, int, PyObject *)) \
         PyArray_API[108])
...

PyArray_CheckFromAny isn't the name of a function, but a define for a function pointer stored in PyArray_API, which is not initialized (i.e. is NULL) when the module is first loaded! By the way, there is also a (private) function called PyArray_CheckFromAny, which is what the function pointer actually points to - and because the public version is a define, there is no name collision at link time...
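
The pattern of a pointer table that starts out uninitialized can be imitated with ctypes. This is only an analogy with made-up names (api_table, public_double, import_api, _private_double), not numpy's code: the public name merely indexes the table at call time, so it is unusable until the table has been filled.

```python
import ctypes

DOUBLE_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

def _private_double(value):
    # Stand-in for the private implementation inside the extension module.
    return 2 * value

_c_private_double = DOUBLE_TYPE(_private_double)  # keep the callback alive

api_table = None  # plays the role of `static void **PyArray_API = NULL;`

def public_double(value):
    # Plays the role of the #define: index the table at call time.
    return DOUBLE_TYPE(api_table[0])(value)

def import_api():
    # Plays the role of import_array(): fill the table with function addresses.
    global api_table
    table = (ctypes.c_void_p * 1)()
    table[0] = ctypes.cast(_c_private_double, ctypes.c_void_p).value
    api_table = table

try:
    public_double(21)
except TypeError:
    # In C this would be a NULL-pointer dereference instead of an exception.
    print("API table is not initialized yet")

import_api()
print(public_double(21))
```

In C the uninitialized case is far less forgiving than the TypeError above, which is why forgetting np.import_array() typically ends in a segmentation fault.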

The last piece of the puzzle: the function _import_array (more or less the workhorse behind np.import_array) is an inline function (case A), so only the include path is needed to be able to use it.

_import_array uses a similar approach to Cython's __pyx_capi__ to get the function pointers: The field is called _ARRAY_API and can be inspected via:

>>> import numpy.core._multiarray_umath as macore
>>> macore._ARRAY_API
<capsule object NULL at 0x7f17d85f3810>

More info about how PyArray_API can be initialized can be found in this SO-answer of mine.

However, when using functionality from numpy/math.pxd, one has to statically link numpy's math library (see for example this SO-question).
