How do we choose --nthreads and --nprocs per worker in Dask distributed?

Question

How do we choose --nthreads and --nprocs per worker in Dask distributed? I have 3 worker machines: two have 4 cores with one thread per core, and one has 8 cores (according to the output of the Linux lscpu command on each machine).

Answer

It depends on your workload.

By default Dask creates a single process with as many threads as you have logical cores on your machine (as determined by multiprocessing.cpu_count()).

```shell
dask-worker ... --nprocs 1 --nthreads 8  # assuming you have eight cores
dask-worker ...                          # this is actually the default setting
```
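To check what that default amounts to on a particular machine, you can ask Python for the same count. This is a minimal sketch; `default_worker_layout` is a hypothetical helper for illustration, not part of Dask, and it only mirrors the rule stated above (one process, one thread per logical core).

```python
import multiprocessing


def default_worker_layout() -> tuple[int, int]:
    """Hypothetical helper: (processes, threads per process) for the default.

    Mirrors the rule above: a single process with one thread per logical
    core, as reported by multiprocessing.cpu_count().
    """
    return 1, multiprocessing.cpu_count()


if __name__ == "__main__":
    nprocs, nthreads = default_worker_layout()
    # Print the equivalent explicit command line for this machine.
    print(f"dask-worker ... --nprocs {nprocs} --nthreads {nthreads}")
```

If lscpu and Python agree on the core counts, this should report 8 threads on the 8-core machine from the question and 4 threads on each of the 4-core machines.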

Using few processes and many threads per process works well if your workload is mostly numeric, as is common in NumPy, pandas, and scikit-learn code, because that code spends most of its time in compiled routines that release Python's Global Interpreter Lock (GIL).
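Threads suffice here because the heavy lifting happens in compiled code that releases the GIL, so several threads in one process can run it in parallel. NumPy is not in the standard library, so this sketch uses `hashlib` as a stand-in: its documentation states that the GIL is released while hashing data larger than 2047 bytes. `digest` and `digest_all` are hypothetical helpers.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # For inputs larger than 2047 bytes, hashlib releases the GIL while it
    # hashes, so threads hashing different blocks can run concurrently.
    return hashlib.sha256(block).hexdigest()


def digest_all(blocks: list[bytes], nthreads: int) -> list[str]:
    # One process, many threads: the layout that suits GIL-releasing work.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    blocks = [bytes([i]) * 10_000_000 for i in range(8)]  # 8 blocks of 10 MB
    for nthreads in (1, 4):
        start = time.perf_counter()
        digest_all(blocks, nthreads)
        print(f"{nthreads} thread(s): {time.perf_counter() - start:.2f}s")
```

On a multi-core machine the 4-thread run is typically noticeably faster than the 1-thread run; exact timings depend on the hardware.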

However, if you spend most of your compute time manipulating pure Python objects such as strings or dictionaries, you may want to avoid GIL contention by running more processes with fewer threads each:

```shell
dask-worker ... --nprocs 8 --nthreads 1
```
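The same contrast can be seen with the standard library alone. This sketch assumes a word-count task made entirely of Python string and dict operations, which holds the GIL throughout: with threads the chunks are processed essentially one at a time, while separate processes each have their own interpreter and GIL. `count_words` and `run` are hypothetical names.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_words(text: str) -> dict[str, int]:
    # Pure-Python string and dict work: the GIL is held the whole time.
    counts: dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def run(executor_cls, chunks: list[str], workers: int) -> list[dict[str, int]]:
    # Same task, either many threads in one process or many processes.
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_words, chunks))


if __name__ == "__main__":
    chunks = ["the quick brown fox jumps over the lazy dog " * 200_000] * 8
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run(cls, chunks, workers=8)
        print(f"{cls.__name__}: {time.perf_counter() - start:.2f}s")
```

On a machine with several cores, the process pool usually finishes first despite the extra cost of sending each chunk to a child process.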

Based on benchmarking, you may find that a more balanced split works better:

```shell
dask-worker ... --nprocs 4 --nthreads 2
```
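A benchmark sweep needs the candidate layouts for each machine. A minimal sketch, assuming you want every split whose product uses all cores exactly once, applied to the three machines described in the question (the host names are hypothetical). Recent releases of `distributed` renamed `--nprocs` to `--nworkers`, so adjust the printed flag to match your installed version.

```python
def candidate_splits(cores: int) -> list[tuple[int, int]]:
    """Every (nprocs, nthreads) pair with nprocs * nthreads == cores."""
    return [(p, cores // p) for p in range(1, cores + 1) if cores % p == 0]


# Core counts from the question (via lscpu); the host names are made up.
MACHINES = {"worker-a": 4, "worker-b": 4, "worker-c": 8}

if __name__ == "__main__":
    for host, cores in MACHINES.items():
        for nprocs, nthreads in candidate_splits(cores):
            # Start each layout on `host`, run your workload, and time it.
            print(f"{host}: dask-worker ... --nprocs {nprocs} --nthreads {nthreads}")
```

Restricting the sweep to exact divisors keeps every core busy without oversubscribing; you can relax that if your tasks spend time waiting on I/O.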

Using more processes avoids GIL issues but adds inter-process communication costs. If your computations require a lot of inter-worker communication, you will want to avoid running many processes.
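Part of that communication cost is serialization: threads in one process share objects in memory, whereas data moving between processes must be turned into bytes and back. This sketch uses plain `pickle` as a stand-in to make that cost visible; Dask's own serialization is more elaborate, and `transfer_cost` is a hypothetical helper.

```python
import pickle
import time


def transfer_cost(obj) -> tuple[int, float]:
    """Bytes and seconds needed to serialize and deserialize obj once.

    Threads in one process can hand `obj` over by reference for free; a hop
    between processes pays at least this cost, plus moving the bytes.
    """
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    if restored != obj:
        raise ValueError("round trip did not preserve the data")
    return len(payload), elapsed


if __name__ == "__main__":
    records = [{"id": i, "name": f"item-{i}"} for i in range(500_000)]
    size, seconds = transfer_cost(records)
    print(f"{size / 1e6:.1f} MB, {seconds:.3f}s per hop")
```

The more often task results cross process boundaries, the more this per-hop cost adds up, which is why communication-heavy graphs favour fewer processes.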
