Initializing Cython objects with existing C Objects

Problem description

C++ Model

Say I have the following C++ data structures I wish to expose to Python.

#include <memory>
#include <vector>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

typedef std::vector<std::shared_ptr<mystruct>> mystruct_list;

Boost Python

I can wrap these fairly effectively using boost::python with the following code, easily allowing me to use the existing mystruct (copying the shared_ptr) rather than recreating an existing object.

#include "mystruct.h"
#include <boost/python.hpp>

using namespace boost::python;


BOOST_PYTHON_MODULE(example)
{
    class_<mystruct, std::shared_ptr<mystruct>>("MyStruct", init<>())
        .def_readwrite("a", &mystruct::a);
        // add the rest of the member variables

    class_<mystruct_list>("MyStructList", init<>())
        .def("at", &mystruct_list::at, return_value_policy<copy_const_reference>());
        // add the rest of the member functions
}
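To make concrete what "copying the shared_ptr" buys here, the following is a minimal C++ sketch. The helper name share_item is an assumption made for illustration and does not appear in the original code. It only shows that a copied shared_ptr refers to the same mystruct as the list element.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

typedef std::vector<std::shared_ptr<mystruct>> mystruct_list;

// Hypothetical helper (not in the original): returns a copy of the
// shared_ptr stored at `index`. Only the pointer and its reference count
// are copied; the mystruct it points to is shared, not duplicated.
std::shared_ptr<mystruct> share_item(const mystruct_list& list, std::size_t index)
{
    return list.at(index);
}
```

A write through the returned pointer is visible through the list element, since both refer to one struct.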

Cython

In Cython, I have no idea how to extract an item from mystruct_list, without copying the underlying data. I have no idea how I could initialize MyStruct from the existing shared_ptr<mystruct>, without copying all the data over in one of various forms.

from libcpp.memory cimport shared_ptr
from libcpp.vector cimport vector
from cython.operator cimport dereference


cdef extern from "mystruct.h" nogil:
    cdef cppclass mystruct:
        int a, b, c, d, e, f, g, h, i, j, k, l, m

    ctypedef vector[shared_ptr[mystruct]] mystruct_list


cdef class MyStruct:
    cdef shared_ptr[mystruct] ptr

    def __cinit__(MyStruct self):
        self.ptr.reset(new mystruct())

    property a:
        def __get__(MyStruct self):
            return dereference(self.ptr).a

        def __set__(MyStruct self, int value):
            dereference(self.ptr).a = value


cdef class MyStructList:
    cdef mystruct_list c
    cdef mystruct_list.iterator it

    def __cinit__(MyStructList self):
        pass

    def __getitem__(MyStructList self, int index):
        # How do I return a MyStruct without copying the underlying `mystruct`?
        pass

I see many possible workarounds, and none of them are very satisfactory:

I could initialize an empty MyStruct, and in Cython assign over the shared_ptr. However, this would waste an initialized struct for absolutely no reason.

cdef MyStruct value = MyStruct()
value.ptr = self.c.at(index)
return value
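The cost of this workaround can be made visible in plain C++. The sketch below is only an illustration under assumptions: counted_struct, eager_wrapper and wrap_by_overwrite are hypothetical names, and the construction counter is added purely to observe the throwaway allocation.

```cpp
#include <memory>

// Stand-in for mystruct that counts constructions. The counter is an
// illustrative addition, not part of the original struct.
struct counted_struct
{
    static int constructed;
    int a = 0;
    counted_struct() { ++constructed; }
};
int counted_struct::constructed = 0;

// Mirrors a wrapper whose constructor always allocates, like the
// __cinit__ in the question.
struct eager_wrapper
{
    std::shared_ptr<counted_struct> ptr;
    eager_wrapper() : ptr(std::make_shared<counted_struct>()) {}
};

// The workaround: build a wrapper, then overwrite its pointer.
eager_wrapper wrap_by_overwrite(const std::shared_ptr<counted_struct>& existing)
{
    eager_wrapper value;   // allocates a throwaway counted_struct
    value.ptr = existing;  // the throwaway is destroyed by this assignment
    return value;
}
```

Each call to wrap_by_overwrite constructs one extra struct that is discarded immediately, which is the waste described above.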

I also could copy the data from the existing mystruct to the new mystruct. However, this suffers from similar bloat.

cdef MyStruct value = MyStruct()
dereference(value.ptr).a = dereference(self.c.at(index)).a
return value

I could also expose an init=True flag for each __cinit__ method, which would prevent reconstructing the object internally if the C-object exists already (when init is False). However, this could cause catastrophic issues, since it would be exposed to the Python API and would allow dereferencing a null or uninitialized pointer.

def __cinit__(MyStruct self, bint init=True):
    if init:
        self.ptr.reset(new mystruct())
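The hazard with such a flag is an accessor that dereferences an empty pointer. As a rough C++ analogue (flagged_wrapper and get_a are hypothetical names, not taken from the original), one way to contain the risk is to check the pointer in every accessor and raise an error instead of dereferencing:

```cpp
#include <memory>
#include <stdexcept>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

// Hypothetical wrapper mirroring __cinit__(init=True): when init is false
// the pointer is left empty.
struct flagged_wrapper
{
    std::shared_ptr<mystruct> ptr;

    explicit flagged_wrapper(bool init = true)
    {
        if (init)
            ptr = std::make_shared<mystruct>();
    }

    // Checked accessor: throws instead of dereferencing an empty pointer.
    int get_a() const
    {
        if (!ptr)
            throw std::runtime_error("wrapper holds no mystruct");
        return ptr->a;
    }
};
```

This turns a crash into an exception, at the price of one check per access. It does not remove the underlying design problem of exposing the flag.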

I could also overload __init__ with the Python-exposed constructor (which would reset self.ptr), but this would have risky memory safety if __new__ was used from the Python layer.

Bottom-Line

I would love to use Cython, for compilation speed, syntactical sugar, and numerous other reasons, as opposed to the fairly clunky boost::python. I'm looking at pybind11 right now, and it may solve the compilation speed issues, but I would still prefer to use Cython.

Is there any way I can do such a simple task idiomatically in Cython? Thanks.

Solution

The way this works in Cython is by having a factory function (loosely called a factory class below) that creates Python objects out of the shared pointer. This gives you access to the underlying C/C++ structure without copying.

Example Cython code:

<..>

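cdef class MyStruct:
    cdef shared_ptr[mystruct] ptr

    def __cinit__(self):
        # Do not allocate a new struct here; one is passed in
        # from Cython code. Cython default-constructs the
        # shared_ptr member, so it starts out empty.
        pass

    def __dealloc__(self):
        # Drop this object's reference. The struct is freed only
        # when the last shared_ptr to it goes away, and resetting
        # an already-empty shared_ptr is a no-op.
        self.ptr.reset()

    <rest per MyStruct code above>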

cdef object PyStruct(shared_ptr[mystruct] MyStruct_ptr):
    """Python object factory function taking a shared pointer
    to a Cpp mystruct as argument
    """
    # Create new MyStruct object. This does not create a
    # new structure; its shared_ptr member starts out empty
    cdef MyStruct _mystruct = MyStruct()
    # Point the cdef class at the existing struct (this copies
    # the shared_ptr, not the struct)
    _mystruct.ptr = MyStruct_ptr
    # Return the wrapped MyStruct object with MyStruct_ptr
    return _mystruct

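def make_structure():
    """Function to create new Cpp mystruct and return
    python object representation of it
    """
    # shared_ptr's constructor from a raw pointer is explicit, so wrap
    # the new struct in a shared_ptr before passing it to the factory
    cdef MyStruct mypystruct = PyStruct(shared_ptr[mystruct](new mystruct()))
    return mypystruct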

Note that the argument of PyStruct is a shared pointer to the Cpp struct, not a Python object.

mypystruct is then a Python object of class MyStruct, as returned by the factory function, which refers to the Cpp mystruct without copying it. As make_structure shows, mypystruct can be safely returned from def functions in Cython and used in Python space.

To return a Python object of an existing Cpp mystruct pointer just wrap it with PyStruct like

return PyStruct(my_cpp_struct_ptr)

anywhere in your Cython code.

Obviously only def functions are visible from Python, so any Cpp function calls that are to be used in Python space need to be wrapped inside MyStruct as well, at least if you want the Cpp function calls inside the Cython class to release the GIL (probably worth doing for obvious reasons).

For a real-world example see this Cython extension code and the underlying C code bindings in Cython. Also see this code for Python function wrapping of C function calls that let go of GIL. Not Cpp but same applies.

See also the official Cython documentation on when a factory class/function is needed (Note that all constructor arguments will be passed as Python objects). For built-in types, Cython does this conversion for you, but for custom structures or objects a factory class/function is needed.

The Cpp structure initialisation could be handled in __new__ of PyStruct if needed, per suggestion above, if you want the factory class to actually create the C++ structure for you (depends on the use case really).

The benefit of a factory class with pointer arguments is that it allows you to use existing pointers to C/C++ structures and wrap them in a Python extension class, rather than always having to create new ones. It would be perfectly safe to, for example, have multiple Python objects referring to the same underlying C struct. Python's ref counting ensures they won't be de-allocated prematurely. You should still check for null when deallocating though, as the shared pointer could already have been de-allocated explicitly (e.g., by del).
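The lifetime claim can be checked on the C++ side alone. The sketch below is an analogue under assumptions, not the Cython code: wrapper stands in for the Python extension object, wrap for the PyStruct factory, and struct_survives_while_wrapped is a hypothetical name. A weak_ptr is used only to observe whether the struct is still alive.

```cpp
#include <memory>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

// Hypothetical stand-in for the Python-side wrapper object.
struct wrapper
{
    std::shared_ptr<mystruct> ptr;
};

// Factory in the spirit of PyStruct: wraps an existing pointer without
// copying the struct it points to.
wrapper wrap(const std::shared_ptr<mystruct>& existing)
{
    return wrapper{existing};
}

// Returns true if the struct outlives its original owner and a second
// wrapper for as long as one wrapper still refers to it.
bool struct_survives_while_wrapped()
{
    std::weak_ptr<mystruct> observer;
    wrapper survivor;
    {
        std::shared_ptr<mystruct> original = std::make_shared<mystruct>();
        observer = original;
        survivor = wrap(original);
        wrapper temporary = wrap(original);
        temporary.ptr->a = 7;  // visible through every wrapper
    }  // original and temporary are released here
    return !observer.expired() && survivor.ptr->a == 7;
}
```

Once the last wrapper is gone, the weak_ptr reports the struct as expired, which corresponds to the automatic de-allocation described above.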

Note that there is, however, some overhead in creating new python objects even if they do point to the same C++ structure. Not a lot, but still.

IMO this auto de-allocation and ref counting of C/C++ pointers is one of the greatest features of Python's C extension API. As all that acts on Python objects (alone), the C/C++ structures need to be wrapped in a compatible Python object class definition.

Note - My experience is mostly in C, the above may need adjusting as I'm more familiar with regular C pointers than C++'s shared pointers.
