How to tidy/fix PyCXX's creation of new-style Python extension-class?
Question
I've nearly finished rewriting a C++ Python wrapper (PyCXX).
The original allows old and new style extension classes, but also allows one to derive from the new-style classes:
import test
# ok
a = test.new_style_class()
# also ok
class Derived( test.new_style_class ):
    def __init__( self ):
        test.new_style_class.__init__( self )
    def derived_func( self ):
        print( 'derived_func' )
        super().func_noargs()
    def func_noargs( self ):
        print( 'derived func_noargs' )
d = Derived()
The code is convoluted, and appears to contain errors (Why does PyCXX handle new-style classes in the way it does?)
My question is: what is the rationale/justification for PyCXX's convoluted mechanism? Is there a cleaner alternative?
I will attempt to detail below where I am at with this enquiry. First I will try and describe what PyCXX is doing at the moment, then I will describe what I think could maybe be improved.
When the Python runtime encounters d = Derived(), it does PyObject_Call( ob ) where ob is the PyTypeObject for NewStyleClass. I will write ob as NewStyleClass_PyTypeObject.
That PyTypeObject has been constructed in C++ and registered using PyType_Ready.
PyObject_Call will invoke type_call(PyTypeObject *type, PyObject *args, PyObject *kwds), returning an initialised Derived instance, i.e.
PyObject* derived_instance = type_call(NewStyleClass_PyTypeObject, NULL, NULL)
Something like this.
(All of this comes from http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence by the way, thanks Eli!)
type_call does essentially:
obj = type->tp_new(type, args, kwds);
type->tp_init(obj, args, kwds);
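To make the two-step sequence concrete, here is a rough Python-level analogue of what type_call does. The names Tracked, calls and type_call_sketch are made up for illustration; the real type_call is C code inside CPython and handles more cases than this sketch does.

```python
calls = []  # records the order in which the two slots run


class Tracked:
    """Illustrative class that logs its creation steps."""

    def __new__(cls, *args, **kwds):
        calls.append(("new", cls.__name__))
        return super().__new__(cls)

    def __init__(self, value=0):
        calls.append(("init", type(self).__name__))
        self.value = value


def type_call_sketch(cls, *args, **kwds):
    """Approximation of type_call: allocate with __new__, then run __init__."""
    obj = cls.__new__(cls, *args, **kwds)
    # __init__ is only run when __new__ returned an instance of cls
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwds)
    return obj


obj = type_call_sketch(Tracked, 7)
```

After this runs, calls holds the "new" entry followed by the "init" entry, which mirrors the tp_new then tp_init order described above.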
And our C++ wrapper has inserted functions into the tp_new and tp_init slots of NewStyleClass_PyTypeObject, something like this:
typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );
:
static PyObject* extension_object_new( PyTypeObject* subtype,
                                       PyObject* args, PyObject* kwds )
{
    PyObject* pyob = subtype->tp_alloc(subtype,0);
    Bridge* o = reinterpret_cast<Bridge *>( pyob );
    o->m_pycxx_object = nullptr;
    return pyob;
}
static int extension_object_init( PyObject* _self,
                                  PyObject* args, PyObject* kwds )
{
    Bridge* self{ reinterpret_cast<Bridge*>(_self) };
    // NOTE: observe this is where we invoke the constructor,
    //       but indirectly (i.e. through final)
    self->m_pycxx_object = new FinalClass{ self, args, kwds };
    return 0;
}
Note that we need to bind together the Python Derived instance and its corresponding C++ class instance. (Why? Explained below, see 'X'.) To do that we are using:
struct Bridge
{
    PyObject_HEAD // <-- a PyObject
    ExtObjBase* m_pycxx_object;
};
Now this bridge raises a question. I'm very suspicious of this design.
Note how memory was allocated for this new PyObject:
PyObject* pyob = subtype->tp_alloc(subtype,0);
And then we typecast this pointer to Bridge, and use the 4 or 8 (sizeof(void*)) bytes immediately following the PyObject to point to the corresponding C++ class instance (this gets hooked up in extension_object_init, as can be seen above).
Now for this to work we require:
a) subtype->tp_alloc(subtype,0) must be allocating an extra sizeof(void*) bytes
b) The PyObject doesn't require any memory beyond sizeof(PyObject_HEAD), because if it did then this would conflict with the above pointer
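Requirement (a) can be observed from the Python side, since a type's __basicsize__ attribute exposes its tp_basicsize. The sketch below uses __slots__ as a stand-in for the extra pointer field; HeaderOnly and BridgeLike are illustrative names, not PyCXX classes, and the result assumes CPython, where each slot occupies one pointer-sized field after the object header.

```python
import ctypes


class HeaderOnly:
    """No instance fields beyond the object header."""
    __slots__ = ()


class BridgeLike:
    """Header plus one pointer-sized field, loosely analogous to Bridge."""
    __slots__ = ("m_pycxx_object",)


pointer_size = ctypes.sizeof(ctypes.c_void_p)

# __basicsize__ reflects tp_basicsize, the size passed to the allocator
extra = BridgeLike.__basicsize__ - HeaderOnly.__basicsize__
print(HeaderOnly.__basicsize__, BridgeLike.__basicsize__, extra)
```

On CPython the difference should come out as one pointer, which is the extra space that a Bridge-style layout relies on tp_alloc to provide.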
One major question I have at this point is:
Can we guarantee that the PyObject that the Python runtime has created for our derived_instance does not overlap into Bridge's ExtObjBase* m_pycxx_object field?
I will attempt to answer it: it is us who determine how much memory gets allocated. When we create NewStyleClass_PyTypeObject we feed in how much memory we want this PyTypeObject to allocate for a new instance of this type:
template< TEMPLATE_TYPENAME FinalClass >
class ExtObjBase : public FuncMapper<FinalClass> , public ExtObjBase_noTemplate
{
protected:
    static TypeObject& typeobject()
    {
        static TypeObject* t{ nullptr };
        if( ! t )
            t = new TypeObject{ sizeof(FinalClass), typeid(FinalClass).name() };
        /*                      ^^^^^^^^^^^^^^^^^ this is the bug BTW!
           The C++ Derived class instance never gets deposited
           in the memory allocated by the Python runtime
           (controlled by this parameter).
           This value should be sizeof(Bridge) -- as pointed out
           in the answer to the question linked above */
        return *t;
    }
    :
};
class TypeObject
{
private:
    PyTypeObject* table;
    // these tables fit into the main table via pointers
    PySequenceMethods* sequence_table;
    PyMappingMethods* mapping_table;
    PyNumberMethods* number_table;
    PyBufferProcs* buffer_table;
public:
    PyTypeObject* type_object() const
    {
        return table;
    }
    // NOTE: if you define one sequence method you must define all of them except the assigns
    TypeObject( size_t size_bytes, const char* default_name )
        : table{ new PyTypeObject{} } // {} sets to 0
        , sequence_table{}
        , mapping_table{}
        , number_table{}
        , buffer_table{}
    {
        PyObject* table_as_object = reinterpret_cast<PyObject* >( table );
        *table_as_object = PyObject{ _PyObject_EXTRA_INIT 1, NULL };
        // ^ py_object_initializer -- NULL because type must be init'd by user
        table_as_object->ob_type = _Type_Type();
        // QQQ table->ob_size = 0;
        table->tp_name = const_cast<char *>( default_name );
        table->tp_basicsize = size_bytes;
        table->tp_itemsize = 0; // sizeof(void*); // so as to store extra pointer
        table->tp_dealloc = ...
You can see it going in as table->tp_basicsize.
But now it seems clear to me that PyObject-s generated from NewStyleClass_PyTypeObject will never require additional allocated memory.
Which means that this whole Bridge mechanism is unnecessary.
And PyCXX's original technique of using PyObject as a base class of NewStyleClassCXXClass, and initialising this base so that the Python runtime's PyObject for d = Derived() is in fact this base, is looking good, because it allows seamless typecasting.
Whenever the Python runtime calls a slot from NewStyleClass_PyTypeObject, it will be passing a pointer to d's PyObject as the first parameter, and we can just typecast back to NewStyleClassCXXClass. <-- 'X' (referenced above)
So really my question is: why don't we just do this? Is there something special about deriving from NewStyleClass that forces extra allocation for the PyObject?
I realise I don't understand the creation sequence in the case of a derived class. Eli's post didn't cover that.
I suspect this may be connected with the fact that
static PyObject* extension_object_new( PyTypeObject* subtype, ...
^ this variable name is 'subtype'. I don't understand this, and I wonder if this may hold the key.
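For what it's worth, at the Python level the first argument of __new__ is the class actually being instantiated, which need not be the class that defines __new__. A small sketch in plain Python rather than PyCXX (Base, Derived and seen are illustrative names):

```python
seen = []  # classes that were passed to Base.__new__


class Base:
    def __new__(subtype, *args, **kwds):
        # 'subtype' is whichever class is being instantiated,
        # not necessarily Base itself
        seen.append(subtype)
        return super().__new__(subtype)


class Derived(Base):
    pass


d = Derived()
print(seen)  # Derived appears here, although __new__ is defined on Base
```

This suggests why the C-level parameter is called subtype: the inherited tp_new is handed the derived type, and it allocates through that type's tp_alloc so the instance has the size the derived type asks for.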
I thought of one possible explanation for why PyCXX is using sizeof(FinalClass) for initialisation. It might be a relic from an idea that got tried and discarded. i.e. If Python's tp_new call allocates enough space for the FinalClass (which has the PyObject as base), maybe a new FinalClass can be generated on that exact location using 'placement new', or some cunning reinterpret_cast business. My guess is this might have been tried, found to pose some problem, worked around, and the relic got left behind.
Recommended answer
PyCXX is not convoluted. It does have two bugs, but they can be easily fixed without requiring significant changes to the code.
When creating a C++ wrapper for the Python API, one encounters a problem. The C++ object model and the Python new-style object model are very different. One fundamental difference is that C++ has a single constructor that both creates and initializes the object, while Python has two stages: tp_new creates the object and performs minimal initialization (or just returns an existing object), and tp_init performs the rest of the initialization.
From PEP 253, which you should read in full:
The difference in responsibilities between the tp_new() slot and the tp_init() slot lies in the invariants they ensure. The tp_new() slot should ensure only the most essential invariants, without which the C code that implements the objects would break. The tp_init() slot should be used for overridable user-specific initializations. Take for example the dictionary type. The implementation has an internal pointer to a hash table which should never be NULL. This invariant is taken care of by the tp_new() slot for dictionaries. The dictionary tp_init() slot, on the other hand, could be used to give the dictionary an initial set of keys and values based on the arguments passed in.
...
You may wonder why the tp_new() slot shouldn't call the tp_init() slot itself. The reason is that in certain circumstances (like support for persistent objects), it is important to be able to create an object of a particular type without initializing it any further than necessary. This may conveniently be done by calling the tp_new() slot without calling tp_init(). It is also possible that tp_init() is not called, or called more than once -- its operation should be robust even in these anomalous cases.
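The "created without being initialized" case is easy to observe from Python. The sketch below is only an illustration (Record and its init_calls counter are made-up names): it calls __new__ directly, and it uses copy.copy, which for a plain class rebuilds the object through __new__ and restores its state without running __init__.

```python
import copy


class Record:
    init_calls = 0  # counts how many times __init__ has run

    def __init__(self, value):
        Record.init_calls += 1
        self.value = value


original = Record(42)          # __new__ followed by __init__
clone = copy.copy(original)    # __new__ plus state restore, no __init__
bare = Record.__new__(Record)  # __new__ only; 'value' was never set

print(Record.init_calls, clone.value, hasattr(bare, "value"))
```

Any wrapper that builds its C++ object in tp_init therefore has to cope with instances like bare, whose underlying C++ object does not exist yet.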
The entire point of a C++ wrapper is to enable you to write nice C++ code. Say for example that you want your object to have a data member that can only be initialized during its construction. If you create the object during tp_new, then you cannot reinitialize that data member during tp_init. This will probably force you to hold that data member via some kind of a smart pointer and create it during tp_new. This makes the code ugly.
The approach PyCXX takes is to separate object construction into two:
- tp_new creates a dummy object with just a pointer to the C++ object, which is created in tp_init. This pointer is initially null.
- tp_init allocates and constructs the actual C++ object, then updates the pointer in the dummy object created in tp_new to point to it. If tp_init is called more than once, it raises a Python exception.
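The two steps above can be mimicked in pure Python to show the shape of the idea. This is a sketch, not PyCXX's code: Shell plays the role of the dummy object, Payload stands in for the C++ object, and the choice of RuntimeError for the repeated-init case is an assumption.

```python
class Payload:
    """Stand-in for the real C++ object."""

    def __init__(self, name):
        self.name = name


class Shell:
    """Stand-in for the dummy object that only holds a pointer."""

    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self._impl = None  # the pointer is initially null
        return self

    def __init__(self, name):
        if self._impl is not None:
            # a second __init__ call is rejected rather than rebuilding
            raise RuntimeError("object is already initialised")
        self._impl = Payload(name)


shell = Shell("first")

try:
    shell.__init__("second")
    second_init_failed = False
except RuntimeError:
    second_init_failed = True

print(shell._impl.name, second_init_failed)
```

The payload is only ever constructed once, with its final arguments, which is what lets the C++ side keep ordinary constructor-initialized members.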
I personally think that the overhead of this approach for my own applications is too high, but it's a legitimate approach. I have my own C++ wrapper around the Python C/API that does all the initialization in tp_new, which is also flawed. There doesn't appear to be a good solution for that.