Concurrent tasks on a Spark executor

Problem description

What determines how many tasks can run concurrently on a Spark executor? Is it some kind of thread pool with shared memory resources?

What parameters control that behavior?

Does it mean that code running in executors should always be written to be thread-safe?

Answer

What determines how many tasks can run concurrently on a Spark executor?

Spark maps the number of tasks on a particular executor to the number of cores allocated to it. By default, Spark assigns one core per task, controlled by the spark.task.cpus parameter, which defaults to 1. In other words, an executor with spark.executor.cores cores can run up to spark.executor.cores / spark.task.cpus tasks at the same time.
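A minimal sketch of how these two settings interact, assuming an executor with 4 cores (the value 4 is an assumed example, not taken from the answer above): with spark.task.cpus left at its default of 1, up to 4 tasks can run concurrently on each executor.

    import org.apache.spark.sql.SparkSession

    // Assumed example values: 4 cores per executor, 1 core per task (the default),
    // so up to 4 / 1 = 4 tasks can run concurrently on each executor.
    val spark = SparkSession.builder()
      .appName("executor-concurrency-example")
      .config("spark.executor.cores", "4") // cores allocated to each executor
      .config("spark.task.cpus", "1")      // cores claimed by each task (default: 1)
      .getOrCreate()

When submitting to a cluster, the same values can also be passed at submit time, for example spark-submit --conf spark.executor.cores=4 --conf spark.task.cpus=1.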

Does it mean that code running in executors should always be written to be thread-safe?

No. Working with RDDs or DataFrames/Datasets is generally designed so that the work happens locally inside a transformation, without sharing global resources. You need to think about thread safety only when you have a global resource that can be accessed in parallel inside a single executor process, which can happen when multiple tasks are executed on the same executor.
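As an illustrative sketch of when thread safety does matter (the SharedCache object and the names below are hypothetical, not part of the original answer): a JVM-wide resource shared by every task running inside the same executor process should use a thread-safe structure.

    import java.util.concurrent.ConcurrentHashMap
    import org.apache.spark.sql.SparkSession

    // Hypothetical per-executor singleton: one instance per executor JVM,
    // shared by all tasks running in that executor, so it must tolerate
    // concurrent access. A ConcurrentHashMap is used here for that reason.
    object SharedCache {
      val cache = new ConcurrentHashMap[String, Int]()
    }

    val spark = SparkSession.builder().appName("thread-safety-example").getOrCreate()
    import spark.implicits._

    val lengths = Seq("alpha", "beta", "alpha").toDS().map { key =>
      // Every task scheduled on this executor sees the same SharedCache,
      // which is why the map itself has to be thread-safe.
      SharedCache.cache.putIfAbsent(key, key.length)
      SharedCache.cache.get(key).intValue
    }
    lengths.show()

Code that keeps all of its state local to the closure, as most transformations do, has nothing to synchronize and needs no such precautions.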
