How to tidy/fix PyCXX's creation of new-style Python extension-class?

Question

I've nearly finished rewriting a C++ Python wrapper (PyCXX).

The original allows both old-style and new-style extension classes, and also allows one to derive from the new-style classes:

import test

# ok
a = test.new_style_class()

# also ok
class Derived( test.new_style_class ):
    def __init__( self ):
        test.new_style_class.__init__( self )

    def derived_func( self ):
        print( 'derived_func' )
        super().func_noargs()

    def func_noargs( self ):
        print( 'derived func_noargs' )

d = Derived()

The code is convoluted and appears to contain errors (see "Why does PyCXX handle new-style classes in the way it does?").

My question is: what is the rationale/justification for PyCXX's convoluted mechanism? Is there a cleaner alternative?

I will attempt to detail below where I am with this enquiry. First I will try to describe what PyCXX is doing at the moment, then I will describe what I think could be improved.

When the Python runtime encounters d = Derived(), it does PyObject_Call( ob ), where ob is the PyTypeObject for NewStyleClass. I will write ob as NewStyleClass_PyTypeObject.

That PyTypeObject has been constructed in C++ and registered using PyType_Ready.

PyObject_Call will invoke type_call(PyTypeObject *type, PyObject *args, PyObject *kwds), returning an initialised Derived instance, i.e.

PyObject* derived_instance = type_call(NewStyleClass_PyTypeObject, NULL, NULL)

Something like this.

(All of this comes from http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence by the way; thanks, Eli!)

type_call essentially does:

PyObject* obj = type->tp_new(type, args, kwds);
type->tp_init(obj, args, kwds);
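
The same two-step sequence can be sketched in pure Python. This is only a rough model of what type_call does, not CPython's actual code; the function name type_call is reused for readability and Point is a hypothetical class.

```python
def type_call(cls, *args, **kwds):
    """Rough pure-Python model of CPython's type_call (illustrative only)."""
    # Stage 1: allocation and minimal setup (the tp_new slot).
    obj = cls.__new__(cls, *args, **kwds)
    # Stage 2: initialisation (the tp_init slot), skipped when __new__
    # returned something that is not an instance of cls.
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwds)
    return obj


class Point:
    def __new__(cls, x, y):
        return super().__new__(cls)

    def __init__(self, x, y):
        self.x, self.y = x, y


p = type_call(Point, 1, 2)
print(p.x, p.y)
```

Calling type_call(Point, 1, 2) here should behave like Point(1, 2): the object comes out of __new__ uninitialised and only gets its attributes in __init__.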

And our C++ wrapper has inserted functions into the tp_new and tp_init slots of NewStyleClass_PyTypeObject, something like this:

typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );

:
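    static PyObject* extension_object_new( PyTypeObject* subtype,
                                           PyObject* args, PyObject* kwds )
    {
        PyObject* pyob = subtype->tp_alloc(subtype,0);
        if( ! pyob )
            return nullptr; // allocation failed; tp_alloc has set the error

        Bridge* o = reinterpret_cast<Bridge *>( pyob );

        // the C++ object is attached later, in extension_object_init
        o->m_pycxx_object = nullptr;

        return pyob;
    }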

    static int extension_object_init( PyObject* _self, 
                                            PyObject* args, PyObject* kwds )
    {
        Bridge* self{ reinterpret_cast<Bridge*>(_self) };

        // NOTE: observe this is where we invoke the constructor, 
        //       but indirectly (i.e. through final)
        self->m_pycxx_object = new FinalClass{ self, args, kwds };

        return 0;
    }

Note that we need to bind together the Python Derived instance and its corresponding C++ class instance. (Why? Explained below; see 'X'.) To do that we are using:

struct Bridge
{
    PyObject_HEAD // <-- a PyObject
    ExtObjBase* m_pycxx_object;
};

Now this bridge raises a question. I'm very suspicious of this design.

Note how memory was allocated for this new PyObject:

        PyObject* pyob = subtype->tp_alloc(subtype,0);

We then typecast this pointer to Bridge*, and use the 4 or 8 (sizeof(void*)) bytes immediately following the PyObject to point to the corresponding C++ class instance (this gets hooked up in extension_object_init, as can be seen above).

For this to work we require:

a) subtype->tp_alloc(subtype,0) must allocate an extra sizeof(void*) bytes
b) The PyObject must not require any memory beyond sizeof(PyObject_HEAD), because if it did, that memory would conflict with the above pointer
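
To make the layout concrete, here is a small ctypes model of the two structs. It is only a sketch: it assumes a standard release build of CPython in which PyObject_HEAD is a reference count followed by a type pointer, and the Python class names PyObjectHead and Bridge are stand-ins for the C declarations, not part of PyCXX.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # Assumed layout of PyObject_HEAD on a standard release build:
    # a reference count followed by a pointer to the type object.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


class Bridge(ctypes.Structure):
    # Mirrors the Bridge struct above: the header, then one extra pointer.
    _fields_ = [
        ("head", PyObjectHead),
        ("m_pycxx_object", ctypes.c_void_p),
    ]


header_size = ctypes.sizeof(PyObjectHead)
pointer_offset = Bridge.m_pycxx_object.offset
required_basicsize = ctypes.sizeof(Bridge)

print(header_size, pointer_offset, required_basicsize)
```

Under these assumptions the pointer starts exactly where the header ends, so requirement (a) amounts to tp_basicsize being at least sizeof(Bridge), and requirement (b) amounts to nothing else claiming the bytes at that offset.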

One major question I have at this point is: can we guarantee that the PyObject that the Python runtime has created for our derived_instance does not overlap into Bridge's ExtObjBase* m_pycxx_object field?

I will attempt to answer it: it is we who determine how much memory gets allocated. When we create NewStyleClass_PyTypeObject, we feed in how much memory we want this PyTypeObject to allocate for a new instance of this type:

template< TEMPLATE_TYPENAME FinalClass >
class ExtObjBase : public FuncMapper<FinalClass> , public ExtObjBase_noTemplate
{
protected:
    static TypeObject& typeobject()
    {
        static TypeObject* t{ nullptr };
        if( ! t )
            t = new TypeObject{ sizeof(FinalClass), typeid(FinalClass).name() };
                   /*           ^^^^^^^^^^^^^^^^^ this is the bug BTW!
                        The C++ Derived class instance never gets deposited
                        in the memory allocated by the Python runtime
                        (controlled by this parameter).

                        This value should be sizeof(Bridge) -- as pointed out
                        in the answer to the question linked above
                   */

        return *t;
    }
:
};

class TypeObject
{
private:
    PyTypeObject* table;

    // these tables fit into the main table via pointers
    PySequenceMethods*       sequence_table;
    PyMappingMethods*        mapping_table;
    PyNumberMethods*         number_table;
    PyBufferProcs*           buffer_table;

public:
    PyTypeObject* type_object() const
    {
        return table;
    }

    // NOTE: if you define one sequence method you must define all of them except the assigns

    TypeObject( size_t size_bytes, const char* default_name )
        : table{ new PyTypeObject{} }  // {} sets to 0
        , sequence_table{}
        , mapping_table{}
        , number_table{}
        , buffer_table{}
    {
        PyObject* table_as_object = reinterpret_cast<PyObject* >( table );

        *table_as_object = PyObject{ _PyObject_EXTRA_INIT  1, NULL }; 
        // ^ py_object_initializer -- NULL because type must be init'd by user

        table_as_object->ob_type = _Type_Type();

        // QQQ table->ob_size = 0;
        table->tp_name              = const_cast<char *>( default_name );
        table->tp_basicsize         = size_bytes;
        table->tp_itemsize          = 0; // sizeof(void*); // so as to store extra pointer

        table->tp_dealloc           = ...

You can see it going in as table->tp_basicsize.

But now it seems clear to me that PyObject-s generated from NewStyleClass_PyTypeObject will never require additional allocated memory.

Which means that this whole Bridge mechanism is unnecessary.

And PyCXX's original technique of using PyObject as a base class of NewStyleClassCXXClass, and initialising this base so that the Python runtime's PyObject for d = Derived() is in fact this base, looks good, because it allows seamless typecasting.

Whenever the Python runtime calls a slot from NewStyleClass_PyTypeObject, it passes a pointer to d's PyObject as the first parameter, and we can just typecast back to NewStyleClassCXXClass. <-- 'X' (referenced above)

So really my question is: why don't we just do this? Is there something special about deriving from NewStyleClass that forces extra allocation for the PyObject?

I realise I don't understand the creation sequence in the case of a derived class. Eli's post didn't cover that.

I suspect this may be connected with the fact that

    static PyObject* extension_object_new( PyTypeObject* subtype, ...

^ this variable is named 'subtype'. I don't understand this, and I wonder if it may hold the key.
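
For what it's worth, the role of that first parameter can be seen from pure Python. The sketch below uses hypothetical classes Base and Derived and relies on two CPython behaviours: __new__ is handed the class actually being instantiated (which may be a subclass), and a subclass may have a larger __basicsize__ than its base. It illustrates the general mechanism only and says nothing about PyCXX's own types.

```python
import struct


class Base:
    __slots__ = ()          # no per-instance storage beyond the object header
    seen = []               # records every cls passed to __new__

    def __new__(cls, *args, **kwds):
        Base.seen.append(cls)
        return super().__new__(cls)


class Derived(Base):
    __slots__ = ("extra",)  # one extra pointer-sized field per instance


Base()
Derived()
pointer_size = struct.calcsize("P")
print([c.__name__ for c in Base.seen])
print(Derived.__basicsize__ - Base.__basicsize__ == pointer_size)
```

So when Derived() is called, the inherited __new__ runs with cls set to Derived, and the allocation is sized from Derived's __basicsize__ rather than Base's. The same holds at the C level: tp_alloc sizes the object from the tp_basicsize of the type it is given.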

I thought of one possible explanation for why PyCXX is using sizeof(FinalClass) for initialisation. It might be a relic of an idea that was tried and discarded, i.e. if Python's tp_new call allocates enough space for the FinalClass (which has the PyObject as base), maybe a new FinalClass can be generated at that exact location using 'placement new', or some cunning reinterpret_cast business. My guess is that this was tried, found to pose some problem, and worked around, and the relic got left behind.

Recommended answer

PyCXX is not convoluted. It does have two bugs, but they can be easily fixed without requiring significant changes to the code.

When creating a C++ wrapper for the Python API, one encounters a problem. The C++ object model and the Python new-style object model are very different. One fundamental difference is that C++ has a single constructor that both creates and initializes the object, whereas Python has two stages: tp_new creates the object and performs minimal initialization (or just returns an existing object), and tp_init performs the rest of the initialization.

To quote PEP 253, which you should read in full:

The difference in responsibilities between the tp_new() slot and the tp_init() slot lies in the invariants they ensure. The tp_new() slot should ensure only the most essential invariants, without which the C code that implements the objects would break. The tp_init() slot should be used for overridable user-specific initializations. Take for example the dictionary type. The implementation has an internal pointer to a hash table which should never be NULL. This invariant is taken care of by the tp_new() slot for dictionaries. The dictionary tp_init() slot, on the other hand, could be used to give the dictionary an initial set of keys and values based on the arguments passed in.

...

You may wonder why the tp_new() slot shouldn't call the tp_init() slot itself. The reason is that in certain circumstances (like support for persistent objects), it is important to be able to create an object of a particular type without initializing it any further than necessary. This may conveniently be done by calling the tp_new() slot without calling tp_init(). It is also possible that tp_init() is not called, or called more than once -- its operation should be robust even in these anomalous cases.
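
The division of labour described in the quote can be reproduced from Python, where __new__ and __init__ correspond to the tp_new and tp_init slots. A minimal sketch, using a hypothetical Account class: the essential invariant (history always exists) is set up in __new__, so the object stays usable whether __init__ runs zero, one or two times.

```python
class Account:
    def __new__(cls, *args, **kwds):
        obj = super().__new__(cls)
        obj.history = []      # essential invariant: history always exists
        return obj

    def __init__(self, owner="nobody"):
        self.owner = owner    # overridable, user-specific initialisation
        self.history.append("init")


raw = Account.__new__(Account)      # __new__ only; __init__ never runs
normal = Account("alice")           # __new__ then __init__
normal.__init__("bob")              # __init__ a second time

print(raw.history, hasattr(raw, "owner"))
print(normal.owner, normal.history)
```

Nothing in the language prevents either anomalous case, which is why a wrapper has to decide what its tp_init does on a repeat call.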

The entire point of a C++ wrapper is to enable you to write nice C++ code. Say, for example, that you want your object to have a data member that can only be initialized during its construction. If you create the object during tp_new, then you cannot reinitialize that data member during tp_init. This will probably force you to hold that data member via some kind of smart pointer and create it during tp_new. This makes the code ugly.

The approach PyCXX takes is to separate object construction into two:

  • tp_new creates a dummy object with just a pointer to the C++ object, which is created in tp_init. This pointer is initially null.

  • tp_init allocates and constructs the actual C++ object, then updates the pointer in the dummy object created in tp_new to point to it. If tp_init is called more than once it raises a Python exception.
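
The two bullets above can be mimicked in pure Python as a rough analogue. Everything here is hypothetical: Engine stands in for the C++ object, BridgeObject for the dummy Python object, and RuntimeError is an arbitrary choice, since the answer only says that "a Python exception" is raised.

```python
class Engine:
    """Stand-in for the C++ object; hypothetical, for illustration only."""

    def __init__(self, name):
        self.name = name


class BridgeObject:
    def __new__(cls, *args, **kwds):
        obj = super().__new__(cls)
        obj._impl = None                  # like m_pycxx_object = nullptr
        return obj

    def __init__(self, name):
        if self._impl is not None:        # guard against a second tp_init
            raise RuntimeError("already initialised")
        self._impl = Engine(name)         # like new FinalClass{...}


b = BridgeObject("first")
try:
    b.__init__("second")
    second_init_failed = False
except RuntimeError:
    second_init_failed = True

print(b._impl.name, second_init_failed)
```

The point of the design is that Engine gets a single, ordinary constructor call, at the cost of one extra indirection and one extra allocation per object.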

I personally think that the overhead of this approach is too high for my own applications, but it's a legitimate approach. I have my own C++ wrapper around the Python C API that does all the initialization in tp_new, which is also flawed. There doesn't appear to be a good solution for that.
