How to specify the number of threads/processes for the default dask scheduler
Question
Is there a way to limit the number of cores used by the default threaded scheduler (the default when using dask dataframes)?
With compute, you can specify it by using:
df.compute(get=dask.threaded.get, num_workers=20)
But I was wondering whether there is a way to set this as the default, so that you don't need to specify it for each compute call?
This would be useful, for example, on a small cluster (e.g. 64 cores) that is shared with other people (without a job system), where I don't necessarily want to take up all the cores when starting computations with dask.
Recommended answer
You can specify a default ThreadPool, which the threaded scheduler then uses for every compute call:
from multiprocessing.pool import ThreadPool
import dask
dask.config.set(pool=ThreadPool(20))
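As a rough sketch of how this behaves in practice, the snippet below limits the thread count in two ways: per call, and scoped to a with block via dask.config.set. It assumes a reasonably recent dask release, where the scheduler= keyword replaces the older get= keyword used in the question. The helper worker_name, the task count of 16, and the limit of 4 are hypothetical and chosen only for illustration.

```python
from multiprocessing.pool import ThreadPool
import threading
import time

import dask
from dask import delayed


def worker_name(i):
    # Hypothetical helper: report which thread executed this task.
    time.sleep(0.05)
    return threading.current_thread().name


# Sixteen independent tasks, more than the thread limit used below.
tasks = [delayed(worker_name)(i) for i in range(16)]

# Per-call limit: the threaded scheduler uses at most 4 threads here.
per_call_names = dask.compute(*tasks, scheduler="threads", num_workers=4)

# Scoped default: every compute inside the block shares this pool.
pool = ThreadPool(4)
try:
    with dask.config.set(pool=pool):
        scoped_names = dask.compute(*tasks, scheduler="threads")
finally:
    # Release the pool's threads once it is no longer needed.
    pool.close()
    pool.join()

print(len(set(per_call_names)), len(set(scoped_names)))
```

Calling dask.config.set(pool=...) outside a with block, as in the answer above, applies the setting for the rest of the session rather than for a single block. Recent dask documentation also describes a num_workers configuration key that can be set the same way, which may be simpler when no custom pool is needed.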