Global Interpreter Lock and access to data (e.g. for NumPy arrays)
Question
I am writing a C extension for Python, which should release the Global Interpreter Lock while it operates on data. I think I have understood the mechanism of the GIL fairly well, but one question remains: Can I access data in a Python object while the thread does not own the GIL? For example, I want to read data from a (big) NumPy array in the C function while I still want to allow other threads to do other things on the other CPU cores. The C function should
- release the GIL with Py_BEGIN_ALLOW_THREADS
- read and work on the data without using Python functions
- even write data to previously constructed NumPy arrays
- reacquire the GIL with Py_END_ALLOW_THREADS
Is this safe? Of course, other threads are not supposed to change the variables that the C function uses. But there may be one hidden source of errors: could the Python interpreter move an object, e.g. through some sort of garbage collection, while the C function works on it in a separate thread?
To illustrate the question with a minimal example, consider the (minimal but complete) code below. Compile it (on Linux) with
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -fPIC -I/usr/lib/pymodules/python2.7/numpy/core/include -I/usr/include/python2.7 -c gilexample.c -o gilexample.o
gcc -pthread -shared gilexample.o -o gilexample.so
and test it in Python with
import gilexample
gilexample.sum([1,2,3])
Is the code between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS safe? It accesses the contents of a Python object, and I do not want to duplicate the (possibly large) array in memory.
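#include <Python.h>
#include <numpy/arrayobject.h>

// The relevant function
static PyObject * sum(PyObject * const self, PyObject * const args) {
    PyObject * X;
    // Propagate the exception if the argument tuple cannot be parsed.
    if (!PyArg_ParseTuple(args, "O", &X))
        return NULL;
    // New reference to an aligned, C-contiguous double array (may be a copy of X).
    // NPY_IN_ARRAY is the legacy name for "contiguous and aligned"; contiguity is
    // needed because the loop below indexes the buffer linearly.
    PyObject * const X_double = PyArray_FROM_OTF(X, NPY_DOUBLE, NPY_IN_ARRAY);
    if (X_double == NULL)
        return NULL;
    // Fetch size and data pointer while the GIL is still held.
    npy_intp const size = PyArray_SIZE(X_double);
    double * const data = (double *) PyArray_DATA(X_double);
    double sum = 0;

    Py_BEGIN_ALLOW_THREADS // IS THIS SAFE?
    npy_intp i;
    for (i=0; i<size; i++)
        sum += data[i];
    Py_END_ALLOW_THREADS

    // Release the reference only after the GIL has been reacquired.
    Py_DECREF(X_double);
    return PyFloat_FromDouble(sum);
}

// Python interface code
// List the C methods that this extension provides.
static PyMethodDef gilexampleMethods[] = {
    {"sum", sum, METH_VARARGS, NULL},
    {NULL, NULL, 0, NULL} /* Sentinel - marks the end of this structure */
};

// Tell Python about these methods (Python 2 module initialization).
PyMODINIT_FUNC initgilexample(void) {
    (void) Py_InitModule("gilexample", gilexampleMethods);
    import_array(); // Must be present for NumPy.
}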
Recommended answer
Is this safe?
Strictly, no. I think you should move the calls to PyArray_SIZE and PyArray_DATA outside the GIL-less block; if you do that, you'll be operating on C data only. You might also want to increment the reference count on the object before going into the GIL-less block and decrement it afterwards.
After your edits, it should be safe. Don't forget to decrement the reference count afterwards.
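One way to make "operating on C data only" hard to get wrong is to put the GIL-free work into a function whose signature contains nothing but plain C types. The sketch below is an illustration only: the helper name sum_doubles is hypothetical and not part of the original code, and the call site is shown as a comment because it needs the Python and NumPy headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original code): sums n doubles.
 * It receives only a raw pointer and a length, so nothing inside it
 * can touch a PyObject or call into the Python C API. */
double sum_doubles(const double *data, size_t n)
{
    assert(data != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Assumed call site inside the wrapper from the question:
 *
 *     npy_intp const size = PyArray_SIZE(X_double);            // GIL held
 *     double * const data = (double *) PyArray_DATA(X_double); // GIL held
 *     double total;
 *     Py_BEGIN_ALLOW_THREADS
 *     total = sum_doubles(data, (size_t) size);                // GIL released
 *     Py_END_ALLOW_THREADS
 *     Py_DECREF(X_double);                                     // GIL held again
 */
```

The reference held in X_double keeps the array object alive for the whole GIL-free region, which is why the Py_DECREF comes only after Py_END_ALLOW_THREADS.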