Long-running workers blocking the GIL cause timeout errors

Question

I'm using dask-distributed with a local setup (a LocalCluster with 5 workers) on a dask.delayed workload. Most of the work is done by the vtk Python bindings. Since vtk is C++ based, I think the workers don't release the GIL while they are inside a long-running vtk call. When I run the workload, my terminal prints a stream of errors like this:

Traceback (most recent call last):
  File "C:\Users\patri\AppData\Local\Continuum\anaconda3\lib\site-packages\distributed\comm\core.py", line 221, in connect
    _raise(error)
  File "C:\Users\patri\AppData\Local\Continuum\anaconda3\lib\site-packages\distributed\comm\core.py", line 204, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:49721' after 10 s: connect() didn't finish in time

My workload continues fine, however: I get a bunch of errors on the command line, but it keeps chugging along. So I think the workers aren't crashing; rather, the heartbeat communication stops while the GIL is held. Since I don't want to mess with vtk internals to release the GIL, how can I fix the errors? I get so many of these benign timeout errors that I can't see any real errors that might happen.

Recommended answer

Release the GIL temporarily by sleeping the VTK event loop thread. If you are using a vtkRenderWindowInteractor instance, create a timer with a callback that pauses execution briefly using time.sleep from the Python standard library, which releases the GIL while it sleeps.
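The timer approach might look roughly like the sketch below. It assumes the interactor is VTK's vtkRenderWindowInteractor (which provides AddObserver and CreateRepeatingTimer) and that it has already been initialized. The helper names make_gil_yield_callback and install_gil_yield, and the default interval and pause values, are hypothetical choices for illustration, not part of VTK or dask.

```python
import time


def make_gil_yield_callback(pause_seconds=0.01):
    """Build a VTK-style observer callback that sleeps briefly.

    time.sleep releases the GIL while it waits, which gives other Python
    threads in the process (such as the worker's communication thread)
    a chance to run.
    """
    def _on_timer(caller, event):
        time.sleep(pause_seconds)

    return _on_timer


def install_gil_yield(interactor, interval_ms=100, pause_seconds=0.01):
    """Attach the sleeping callback to a repeating timer.

    `interactor` is assumed to be an initialized vtkRenderWindowInteractor.
    Both the helper and its default values are hypothetical.
    """
    interactor.AddObserver("TimerEvent", make_gil_yield_callback(pause_seconds))
    # Returns the timer id reported by the interactor.
    return interactor.CreateRepeatingTimer(interval_ms)
```

With a real interactor, one would call install_gil_yield(interactor) after interactor.Initialize() and before interactor.Start(). The timer only fires while the interactor's event loop is running, so this does not interrupt a single long-running C++ call made outside that loop.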
