Safe writing to a variable in a Cython C wrapper within two Python processes (or distinct memory for Python processes)


Problem description

I am creating a wrapper over a C library that receives some financial data, and I want to collect it into Python data types (a dict with a list of field names and a list of lists holding the financial data fields).

On the C level there is a function that starts "listening" to some port, and when any event appears a user-defined callback is invoked. This callback is written in Cython. A simplified example of such a function is here:

cdef void default_listener(const event_data_t* data, int data_count, void* user_data):

    cdef trade_t* trades = <trade_t*>data  # cast received data according to the expected type
    cdef dict py_data = <dict>user_data    # cast user_data back to its initial type (a dict in our case)

    for i in range(data_count):
        # append the fields of the received struct to the list
        # in the dict that we passed to the function
        py_data['data'].append([trades[i].price,
                                trades[i].size,
                                ]
                               )

The problem: when there is only one Python process with this function started, there are no problems, but if I start another Python process and run the same function, one of the processes is terminated after an indeterminate amount of time. I suppose this happens because the two callbacks, called simultaneously in different processes, may try to write to the same part of memory. Could that be the case?

If this is the case, is there any way to prevent two processes from using the same memory? Or maybe some lock can be established before the Cython code starts to write?

P.S.: I have also read this article, and according to it each Python process gets some memory allocated that does not intersect with the memory of other processes. But it is unclear to me whether this allocated memory is also what the underlying C functions use, or whether those functions have access to other regions that may intersect.

Answer

I'm taking a guess at the answer based on your comment - if it's wrong then I'll delete it, but I think it's likely enough to be right to be worth posting as an answer.

Python has a locking mechanism known as the Global Interpreter Lock (or GIL). This ensures that multiple threads don't attempt to access the same memory simultaneously (including memory internal to Python, that may not be obvious to the user).

Your Cython code will be working on the assumption that its thread holds the GIL. I strongly suspect that this isn't true, and so performing any operations on a Python object will likely cause a crash. One way to deal with this would be to follow this section of documentation in the C code that calls the Cython code. However, I suspect it's easier to handle in Cython.

First tell Cython that the function is "nogil" - it does not require the GIL:

cdef void default_listener(const event_data_t* data, int data_count, void* user_data) nogil:

If you try to compile now it will fail, since you use Python types within the function. To fix this, claim the GIL within your Cython code.

cdef void default_listener(...) nogil:
    with gil:
        default_listener_impl(...)

What I've done is put the implementation in a separate function that does require the GIL (i.e. doesn't have a nogil attached). The reason for this is that you can't put cdef statements in the with gil section (as you say in your comment) - they have to be outside it. However, you can't put cdef dict outside it, because it's a Python object. Therefore a separate function is the easiest solution. The separate function looks almost exactly like default_listener does now.
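Putting both pieces together, the split might look like this. This is a sketch only: it assumes the `event_data_t` and `trade_t` declarations from the question are in scope, and the helper name `default_listener_impl` is my invention.

```cython
# Sketch: the C library calls default_listener on its own thread, which
# does not hold the GIL, so the outer function is declared nogil and
# only re-acquires the GIL to delegate to a normal (GIL-holding) helper.

cdef void default_listener(const event_data_t* data, int data_count,
                           void* user_data) nogil:
    with gil:
        default_listener_impl(data, data_count, user_data)

cdef void default_listener_impl(const event_data_t* data, int data_count,
                                void* user_data):
    # cdef declarations of Python objects must live here, outside
    # the "with gil" block of the caller
    cdef trade_t* trades = <trade_t*>data   # cast received data
    cdef dict py_data = <dict>user_data     # recover the dict we passed in
    for i in range(data_count):
        py_data['data'].append([trades[i].price,
                                trades[i].size])
```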

It's worth knowing that this isn't a complete locking mechanism - it's really only there to protect the Python internals from being corrupted. An ordinary Python thread will release and regain the GIL periodically and automatically, possibly in the middle of one of your operations. Cython won't release the GIL unless you tell it to (in this case, at the end of the with gil: block), so it does hold an exclusive lock during that time. If you need finer control over locking then you may want to look at either a multithreading library, or wrapping some C locking library.
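Note also that the GIL only coordinates threads inside a single interpreter; it does nothing between two separate Python processes. If the two processes really do share state, you need an explicit cross-process lock. A minimal illustration with the standard library's `multiprocessing` module (the function names and data here are hypothetical, not from the question's code):

```python
# Sketch: serialising appends to shared state from two processes with a
# multiprocessing.Lock. The GIL cannot help here, because each process
# has its own interpreter and its own GIL.
import multiprocessing as mp

def append_trades(lock, shared_list, trades):
    # Each process takes the lock before touching shared state,
    # so appends from different processes cannot interleave.
    with lock:
        for price, size in trades:
            shared_list.append([price, size])

def run_demo():
    with mp.Manager() as manager:
        shared = manager.list()   # list proxy shared between processes
        lock = mp.Lock()
        procs = [
            mp.Process(target=append_trades,
                       args=(lock, shared, [(1.0, 10), (1.1, 5)])),
            mp.Process(target=append_trades,
                       args=(lock, shared, [(2.0, 7)])),
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # sort, since the two processes may finish in either order
        return sorted(list(shared))

if __name__ == "__main__":
    print(run_demo())  # [[1.0, 10], [1.1, 5], [2.0, 7]]
```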
