PyArray_Check gives Segmentation Fault with Cython/C++

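Problem description

Thank you all in advance.

I am wondering what the right way is to #include all the numpy headers, and what the right way is to parse numpy arrays with Cython and C++. Below is my attempt: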

// cpp_parser.h 
#ifndef _FUNC_H_
#define _FUNC_H_

#include <Python.h>
#include <numpy/arrayobject.h>

void parse_ndarray(PyObject *);

#endif

I know this might be wrong. I also tried other options, but none of them work.

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>

using namespace std;

void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

The PyArray_Check routine causes a segmentation fault. PyArray_CheckExact does not, but it is not exactly what I want.

# parser.pxd
cdef extern from "cpp_parser.h": 
    cdef void parse_ndarray(object)

and the implementation file is:

# parser.pyx
import numpy as np
cimport numpy as np

def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)

The setup.py script is

# setup.py
from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize

import numpy as np

ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)

setup(
    name='parser',
    ext_modules=cythonize([ext])
    )

And finally the test script:

# run_test.py
import numpy as np
from parser import py_parse_array

x = np.arange(10)
py_parse_array(x)

I have created a git repo with all the scripts above: https://github.com/giantwhale/study_cython_numpy/

Solution

Quick Fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API by calling import_array() in every cpp file that uses the numpy C API:

//this is only a trick to ensure import_array() is called when the *.so is loaded
//it is called only once
int init_numpy(){
     import_array(); // PyError if not successful
     return 0;
}

const static int numpy_initialized =  init_numpy();

void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

One could also use _import_array(), which returns a negative number on failure, to implement custom error handling. See the definition of import_array for details.

Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not always when the Python interpreter is embedded, because numpy_initialized is initialized before main() starts. In that case the "initialization trick" should not be used; call init_numpy() explicitly after Py_Initialize() instead.
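The ordering problem in the warning can be sketched without Python or NumPy. In the model below, all names (event_log, interpreter_ready, fake_init_numpy, embedded_startup, demo_init_order) are hypothetical stand-ins: interpreter_ready plays the role of "Py_Initialize() has run" and fake_init_numpy() plays the role of init_numpy(). It only illustrates that a namespace-scope initializer runs before main() gets a chance to start the interpreter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: they only record the order of events and never
// touch Python or NumPy.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;  // function-local static: safe during static init
    return log;
}

bool interpreter_ready = false;  // models "Py_Initialize() has already run"

// Models init_numpy(): returns -1 when called before the interpreter is ready.
int fake_init_numpy() {
    event_log().push_back(interpreter_ready ? "init ok" : "init too early");
    return interpreter_ready ? 0 : -1;
}

// The initialization trick: evaluated during static initialization, before main().
const int trick_result = fake_init_numpy();

// What an embedding program would do instead: start the interpreter, then import.
int embedded_startup() {
    interpreter_ready = true;   // models Py_Initialize()
    return fake_init_numpy();   // models an explicit init_numpy() call
}

// Call this from main() to check the order of events.
void demo_init_order() {
    assert(trick_result == -1);                       // the trick ran too early
    assert(event_log().front() == "init too early");
    assert(embedded_startup() == 0);                  // the explicit call succeeds
    assert(event_log().back() == "init ok");
}
```

Calling demo_init_order() from main() exercises both paths. In a real embedding program the two modeled steps would be Py_Initialize() followed by init_numpy().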


Sophisticated solution:

NB: For why setting PyArray_API is needed, see this SO-answer: it postpones the resolution of symbols until run time, so numpy's shared object is not needed at link time and does not have to be on the dynamic library path (python's system path is enough then).

The proposed solution is quick, but if more than one cpp file uses numpy, there will be many instances of PyArray_API, each of which has to be initialized.

This can be avoided if PyArray_API is not defined as static but declared as extern in all but one translation unit. For those translation units the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.

We still need one translation unit in which this symbol is defined. For this translation unit the macro NO_IMPORT_ARRAY must not be defined.

However, without defining the macro PY_ARRAY_UNIQUE_SYMBOL we get only a static symbol, i.e. one that is not visible to other translation units, so the linker will fail. The symbol is static by default for a reason: if two libraries each defined a non-static PyArray_API, we would have multiple definitions of the same symbol and the linker would fail, i.e. we could not use the two libraries together.

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as, for example, MY_FANCY_LIB_PyArray_API before every include of numpy/arrayobject.h, we get our own name for PyArray_API, which does not clash with other libraries.
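How the two macros interact can be sketched with a simplified model of the selection logic. This is not the real NumPy header: the preprocessor block below is an assumption reconstructed from the behavior described above (static by default, a single external definition with PY_ARRAY_UNIQUE_SYMBOL, an extern declaration with NO_IMPORT_ARRAY), and demo_unique_symbol is a hypothetical helper. The generated header in your NumPy installation is the authoritative source.

```cpp
#include <cassert>

// Settings a translation unit would choose before including the header.
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API
// #define NO_IMPORT_ARRAY  // would turn the definition below into an extern declaration

// --- simplified model of the selection logic, not the real NumPy header ---
#if defined(PY_ARRAY_UNIQUE_SYMBOL)
#define PyArray_API PY_ARRAY_UNIQUE_SYMBOL
#endif

#if defined(NO_IMPORT_ARRAY)
extern void **PyArray_API;            // declared here, defined in another translation unit
#elif defined(PY_ARRAY_UNIQUE_SYMBOL)
void **PyArray_API = nullptr;         // one definition with external linkage
#else
static void **PyArray_API = nullptr;  // private copy in every translation unit
#endif
// --- end of model ---

void demo_unique_symbol() {
    // After macro expansion both names refer to the same variable.
    assert(&PyArray_API == &MY_PyArray_API);
    // The table is still unset until import_array() would fill it in.
    assert(MY_PyArray_API == nullptr);
}
```

With this configuration the file plays the role of the one translation unit that owns the symbol. Uncommenting NO_IMPORT_ARRAY models every other translation unit, which then only links if some other file provides the definition.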

Putting it all together:

A: use_numpy.h - your header for including numpy-functionality i.e. numpy/arrayobject.h

//use_numpy.h

//your fancy name for the dedicated PyArray_API-symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API 

//INIT_NUMPY_ARRAY_CPP must be defined only in the translation unit that calls import_array()
#ifndef INIT_NUMPY_ARRAY_CPP 
    #define NO_IMPORT_ARRAY //for usual translation units
#endif

//now, everything is setup, just include the numpy-arrays:
#include <numpy/arrayobject.h>

B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API:

//init_numpy_api.cpp

//first make clear, here we initialize the MY_PyArray_API
#define INIT_NUMPY_ARRAY_CPP

//now include arrayobject.h (via use_numpy.h), which defines
//void **MY_PyArray_API
#include "use_numpy.h"

//now the old trick with initialization:
int init_numpy(){
     import_array();// PyError if not successful
     return 0;
}
const static int numpy_initialized =  init_numpy();

C: just include use_numpy.h wherever you need numpy; it will declare extern void **MY_PyArray_API:

//example
#include "use_numpy.h"

...
PyArray_Check(obj); // works, no segmentation error

Warning: Do not forget that for the initialization trick to work, Py_Initialize() must already have been called.


Why do you need it (kept for historical reasons):

When I build your extension with debug symbols:

extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],

and run it with gdb:

 gdb --args python run_test.py
 (gdb) run
  --- Segmentation fault
 (gdb) disass

I can see the following:

   0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax       
       # 0x7ffff1f2d940 <_ZL11PyArray_API>
   0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
   ...
   (gdb) print $rax
   $1 = 16

We should keep in mind that PyArray_Check is only a macro defined as:

#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)

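It seems that &PyArray_Type uses a part of PyArray_API that is not initialized: PyArray_API holds 0, so after add $0x10 the register rax holds 16, and dereferencing that address causes the segmentation fault.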
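The value 16 in rax can be reproduced with plain address arithmetic. The sketch below assumes a 64-bit build (8-byte pointers) and assumes that PyArray_Type is read from slot 2 of the PyArray_API table, which is what the 0x10 offset in the disassembly suggests. slot_address and demo_fault_address are hypothetical helpers, not NumPy functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address of slot `index` in a table of pointers (models void **PyArray_API).
constexpr std::uintptr_t slot_address(std::uintptr_t table_base, std::size_t index) {
    return table_base + index * sizeof(void*);
}

void demo_fault_address() {
    const std::uintptr_t null_table = 0;  // PyArray_API was never initialized
    // Assumption: PyArray_Type is read from slot 2 of the table.
    const std::uintptr_t addr = slot_address(null_table, 2);
    if (sizeof(void*) == 8) {
        // Matches `add $0x10,%rax` and `print $rax` giving 16 in the gdb session.
        assert(addr == 0x10);
    }
}
```

Reading from that address is what faults: the table pointer itself is null, so any slot lookup lands in the unmapped first page of the address space.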

Let's take a look at cpp_parser.cpp after the preprocessor has run (compiled with the flag -E):

 static void **PyArray_API= __null
 ...
 static int
_import_array(void)
{
  PyArray_API = (void **)PyCapsule_GetPointer(c_api,...

So PyArray_API is static and is initialized via _import_array(void). That also explains the warning I get during the build, that _import_array() was defined but not used: we never initialized PyArray_API.

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. in every cpp file.

So we just need to do that; import_array() seems to be the official way.
