Number of CPUs per Task in Spark

Question

I don't quite understand the spark.task.cpus parameter. It seems to me that a "task" corresponds to a "thread" or, if you will, a "process" within the executor. Suppose that I set "spark.task.cpus" to 2.

  1. How can a thread utilize two CPUs simultaneously? Wouldn't it require locks and cause synchronization problems?

  2. I'm looking at the launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of CPUs per task" there. So where and how does Spark eventually allocate more than one CPU to a task in standalone mode?

Answer

To the best of my knowledge, spark.task.cpus controls the parallelism of tasks in your cluster in cases where particular tasks are known to have their own internal (custom) parallelism.

In more detail: we know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1, then you will have #spark.cores.max concurrent Spark tasks running at the same time.
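As a minimal sketch of how the two settings fit together (the master URL, application name, and object name below are placeholders, not taken from the original post), both can be supplied on the SparkConf before the context is created:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: cap the application at spark.cores.max cores cluster-wide
// and reserve spark.task.cpus cores for every task.
object TaskCpusConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("task-cpus-sketch")          // hypothetical app name
      .setMaster("spark://master-host:7077")   // hypothetical standalone master
      .set("spark.cores.max", "10")            // total cores the app may claim
      .set("spark.task.cpus", "2")             // cores reserved per task

    val sc = new SparkContext(conf)
    // ... build and run your RDD jobs here ...
    sc.stop()
  }
}
```

The same values can equally be passed to spark-submit with `--conf spark.cores.max=10 --conf spark.task.cpus=2`.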

You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (perhaps each of your tasks spawns two threads, interacts with external tools, etc.). By setting spark.task.cpus accordingly, you become a good "citizen". Now, if you have spark.cores.max = 10 and spark.task.cpus = 2, Spark will only create 10 / 2 = 5 concurrent tasks. Given that each of your tasks needs (say) two threads internally, the total number of executing threads will never exceed 10. This means that you never go above your initial contract (defined by spark.cores.max).
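As an illustration of such an internally parallelized task (the RDD contents, object name, and two-thread pool below are hypothetical, not from the original answer), each partition could be processed with its own small pool of two worker threads, which is exactly the situation where reserving two cores per task via spark.task.cpus = 2 reflects what the task actually consumes:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: every task processes its partition with 2 local threads,
// so declaring spark.task.cpus = 2 keeps the scheduler's bookkeeping
// in line with the real CPU usage of each task.
object InternallyParallelTaskSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("internal-parallelism-sketch"))

    val result = sc.parallelize(1 to 1000, numSlices = 20).mapPartitions { iter =>
      // Hypothetical per-task thread pool of 2 workers.
      val pool = Executors.newFixedThreadPool(2)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

      val futures = iter.toSeq.map(x => Future(x * x)) // stand-in for real per-record work
      val squares = Await.result(Future.sequence(futures), Duration.Inf)

      pool.shutdown()
      squares.iterator
    }.collect()

    println(result.take(5).mkString(", "))
    sc.stop()
  }
}
```

Without the spark.task.cpus = 2 reservation, the scheduler would assume each such task needs only one core and could co-schedule twice as many of them, oversubscribing the executor's cores.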
