PyArray_Check gives Segmentation Fault with Cython/C++
Question
Thank you all in advance.
I am wondering what the right way is to #include all the numpy headers, and what the right way is to use Cython and C++ to parse numpy arrays. Below is my attempt:
// cpp_parser.h
#ifndef _FUNC_H_
#define _FUNC_H_
#include <Python.h>
#include <numpy/arrayobject.h>
void parse_ndarray(PyObject *);
#endif
I know this might be wrong; I also tried other options, but none of them worked.
// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>
using namespace std;
void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}
The PyArray_Check routine throws a segmentation fault. PyArray_CheckExact doesn't throw, but it is not exactly what I wanted.
# parser.pxd
cdef extern from "cpp_parser.h":
    cdef void parse_ndarray(object)
The implementation file is:
# parser.pyx
import numpy as np
cimport numpy as np
def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)
The setup.py script is:
# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy as np
ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)
setup(
    name='parser',
    ext_modules=cythonize([ext])
)
And finally the test script:
# run_test.py
import numpy as np
from parser import py_parse_array
x = np.arange(10)
py_parse_array(x)
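I have created a git repo with all the scripts above: https://github.com/giantwhale/study_cython_numpy/
Quick fix (read on for more details and a more sophisticated approach):
You need to initialize the variable PyArray_API in every cpp file in which you use numpy functionality, by calling import_array():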
// This is only a trick to ensure import_array() is called when the *.so is loaded;
// it runs exactly once.
int init_numpy(){
    import_array(); // PyError if not successful
    return 0;
}
const static int numpy_initialized = init_numpy();
void parse_ndarray(PyObject *obj) { // would be called every time
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}
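One could also use _import_array, which returns a negative number if not successful, to implement custom error handling. See the NumPy sources for the definition of import_array.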
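Warning: As pointed out by @isra60, _import_array()/import_array() can only be called once Python is initialized, i.e. after Py_Initialize() was called. This is always the case for an extension, but not always the case if the Python interpreter is embedded, because numpy_initialized is initialized before main starts. In this case, the "initialization trick" should not be used; instead, init_numpy() should be called after Py_Initialize().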
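The ordering problem described in this warning can be modelled without Python at all. The sketch below is only an illustration under stated assumptions: interpreter_ready, fake_import_array and init_after_interpreter are hypothetical stand-ins (for "Py_Initialize() was called", _import_array() and init_numpy() respectively), not NumPy or CPython functions.

```cpp
#include <cassert>

// Hypothetical stand-in for "Py_Initialize() has been called".
static bool interpreter_ready = false;

// Hypothetical stand-in for _import_array(): returns a negative value
// when the interpreter is not up yet, and 0 otherwise.
int fake_import_array() {
    return interpreter_ready ? 0 : -1;
}

// The "initialization trick": this initializer runs before main().
// In an embedding program that is too early, so the call fails.
static const int early_result = fake_import_array();

// What an embedding program would do instead: bring the interpreter up
// first, then call the init function explicitly.
int init_after_interpreter() {
    interpreter_ready = true;    // models Py_Initialize()
    return fake_import_array();  // models init_numpy()
}
```

In a real embedding program the explicit call would sit directly after Py_Initialize(), and the static numpy_initialized variable would be dropped.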
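Sophisticated solution:
NB: For information on why setting PyArray_API is needed, see this SO-answer: it makes it possible to postpone the resolution of symbols until run time, so numpy's shared objects aren't needed at link time and need not be on the dynamic-library path (Python's system path is enough then).
The proposed solution is quick, but if more than one cpp file uses numpy, a lot of instances of PyArray_API get initialized.
This can be avoided if PyArray_API isn't defined as static but as extern in all but one translation unit. For those translation units, the NO_IMPORT_ARRAY macro must be defined before numpy/arrayobject.h is included.
We need, however, a translation unit in which this symbol is defined. For this translation unit, the macro NO_IMPORT_ARRAY must not be defined.
However, without defining the macro PY_ARRAY_UNIQUE_SYMBOL we get only a static symbol, i.e. one not visible to other translation units, so the linker will fail. The reason for this: if there are two libraries and each defines a PyArray_API, we would have multiple definitions of a symbol and the linker would fail, i.e. we could not use both libraries together.
Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as MY_FANCY_LIB_PyArray_API prior to every include of numpy/arrayobject.h, we get our own PyArray_API name, which does not clash with other libraries.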
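The declare-everywhere, define-once idea can be sketched in a single file. This is a simplified model, not NumPy's header: MY_UNIQUE_SYMBOL, my_fancy_lib_api, dummy_table and the functions below are hypothetical names chosen for illustration, and the two "unit" functions merely stand in for code living in separate translation units.

```cpp
#include <cassert>

// Hypothetical library-specific name, playing the role of PY_ARRAY_UNIQUE_SYMBOL.
#define MY_UNIQUE_SYMBOL my_fancy_lib_api

// What an ordinary translation unit would see (the NO_IMPORT_ARRAY case):
// only a declaration, no storage.
extern void **MY_UNIQUE_SYMBOL;

// What the single initializing translation unit would see: the one definition.
void **MY_UNIQUE_SYMBOL = nullptr;

static int dummy_entry = 7;
static void *dummy_table[1] = {&dummy_entry};

// Models the one-time import performed in the initializing translation unit.
int init_shared_api() {
    MY_UNIQUE_SYMBOL = dummy_table;
    return 0;
}

// Model code from two other translation units; both read the same pointer,
// so a single initialization is enough for all of them.
bool unit_a_sees_api() { return MY_UNIQUE_SYMBOL != nullptr; }
bool unit_b_sees_api() { return MY_UNIQUE_SYMBOL != nullptr; }
```

Because the symbol has external linkage and a library-specific name, one initialization serves every translation unit, and a second library using a different name would not collide with it at link time.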
Putting it all together:
A: use_numpy.h - your header for including numpy functionality, i.e. numpy/arrayobject.h
//use_numpy.h
//your fancy name for the dedicated PyArray_API-symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API
//this macro must be defined for the translation unit
#ifndef INIT_NUMPY_ARRAY_CPP
#define NO_IMPORT_ARRAY //for usual translation units
#endif
//now, everything is setup, just include the numpy-arrays:
#include <numpy/arrayobject.h>
B: init_numpy_api.cpp - a translation unit for initializing the global MY_PyArray_API:
//init_numpy_api.cpp
//first make clear, here we initialize the MY_PyArray_API
#define INIT_NUMPY_ARRAY_CPP
//now include the arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"
//now the old trick with initialization:
int init_numpy(){
    import_array(); // PyError if not successful
    return 0;
}
const static int numpy_initialized = init_numpy();
C: just include use_numpy.h whenever you need numpy; it will declare extern void **MY_PyArray_API:
//example
#include "use_numpy.h"
...
PyArray_Check(obj); // works, no segmentation error
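Warning: It should not be forgotten that, for the initialization trick to work, Py_Initialize() must already have been called.
Why do you need it (kept for historical reasons):
When I build your extension with debug symbols: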
extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],
and run it with gdb:
gdb --args python run_test.py
(gdb) run
--- Segmentation fault
(gdb) disass
I can see the following:
0x00007ffff1d2a6d9 <+20>: mov 0x203260(%rip),%rax
# 0x7ffff1f2d940 <_ZL11PyArray_API>
0x00007ffff1d2a6e0 <+27>: add $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>: mov (%rax),%rax
...
(gdb) print $rax
$1 = 16
We should keep in mind that PyArray_Check is only a define for:
#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)
It seems that &PyArray_Type somehow uses a part of PyArray_API which is not initialized (has value 0).
Let's take a look at cpp_parser.cpp after the preprocessor (compiled with the flag -E):
static void **PyArray_API= __null
...
static int
_import_array(void)
{
    PyArray_API = (void **)PyCapsule_GetPointer(c_api,...
So PyArray_API is static and is initialized via _import_array(void). That would explain the warning I get during the build, that _import_array() was defined but not used: we didn't initialize PyArray_API.
Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. every cpp file.
So we just need to do it; import_array() seems to be the official way.
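The crash mechanism can be reproduced in miniature. The following is a hedged model rather than NumPy's code: fake_api, fake_table, fake_import_array and type_slot are hypothetical names. It mirrors the gdb session above, where the base pointer was null and the code added 0x10 before dereferencing, which is what indexing slot 2 of a table of 8-byte pointers would look like.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for "static void **PyArray_API": every translation
// unit that includes the header gets its own copy, and it starts out null.
static void **fake_api = nullptr;

static int fake_type_object = 0;
static void *fake_table[3] = {nullptr, nullptr, &fake_type_object};

// Byte offset of a table slot; slot 2 is 0x10 when pointers are 8 bytes wide.
std::size_t slot_offset(std::size_t index) { return index * sizeof(void *); }

// Hypothetical stand-in for _import_array(): fills in this unit's pointer.
int fake_import_array() {
    fake_api = fake_table;
    return 0;
}

// Guarded lookup. The real macro has no such guard, so with a null base
// pointer it reads from address 0 + offset and the process crashes.
void *type_slot() { return fake_api ? fake_api[2] : nullptr; }
```

Before fake_import_array() runs, type_slot() has nothing to return; afterwards it yields the table entry, which corresponds to PyArray_Check working once import_array() has been called in that translation unit.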