How does Spark parallelize slices to tasks/executors/workers?

Problem Description

I have a 2-node Spark cluster with 4 cores per node.

        MASTER
(Worker-on-master)              (Worker-on-node1)

Spark configuration:

  • slaves: master, node1
  • SPARK_WORKER_INSTANCES = 1

I am trying to understand Spark's parallelize behaviour. The SparkPi example has this code:

import scala.math.random   // needed for `random`

val slices = 8  // my test value for slices
val n = 100000 * slices
// `spark` is the SparkContext in this example; sample n points in the unit square
val count = spark.parallelize(1 to n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0   // 1 if the point falls inside the unit circle
}.reduce(_ + _)

According to the docs:

Spark will run one task for each slice of the cluster. Typically you want 2-4 slices for each CPU in your cluster.

I set slices to 8, which means the working set will be divided among 8 tasks on the cluster; in turn, each worker node gets 4 tasks (1:1 per core).
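
As a quick check, the number of partitions (and therefore tasks per action) can be read off the RDD directly; a minimal sketch, assuming `sc` is the SparkContext:

val rdd = sc.parallelize(1 to 800000, 8)
println(rdd.partitions.length)   // prints 8: one task per partition for each action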

Questions:

  1. Where can I see task-level details? Inside the executors I don't see the task breakdown, so I can't see the effect of slices in the UI.

  2. How can I programmatically find the working set size for the map function above? I assume it is n/slices (100000 above).

  3. Are the multiple tasks run by an executor run sequentially, or in parallel in multiple threads?

  4. Reasoning behind 2-4 slices per CPU.

I assume ideally we should tune SPARK_WORKER_INSTANCES to correspond to the number of cores in each node (in a homogeneous cluster) so that each core gets its own executor and task (1:1:1).

Recommended Answer

I will try to answer your questions as best I can:

1.- Where can I see task-level details?

When submitting a job, Spark stores information about the task breakdown on each worker node, apart from the master. This data is stored, I believe (I have only tested Spark on EC2), in the work folder under the Spark directory.

2.- How to programmatically find the working set size for the map function?

Although I am not sure whether it stores the in-memory size of the slices, the logs mentioned in the first answer provide information about the number of lines each RDD partition contains.
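
If you want the per-partition element counts without digging through the logs, something along these lines should work; a sketch, assuming `sc` is the SparkContext and the 8-slice example above:

val data = sc.parallelize(1 to 800000, 8)
// count the elements each partition actually holds
val sizes = data
  .mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.size)) }
  .collect()
sizes.foreach { case (idx, size) => println(s"partition $idx -> $size elements") }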

3.- Are the multiple tasks run by an executor run sequentially or in parallel in multiple threads?

I believe different tasks inside a node run sequentially. This is shown in the logs indicated above, which record the start and end time of every task.
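
One way to check this empirically is to log the thread name from inside each task: if the executor runs tasks concurrently, several distinct thread names show up for overlapping tasks. A minimal sketch, assuming `sc` is the SparkContext (the output appears in each executor's stdout, not on the driver):

sc.parallelize(1 to 8, 8).foreach { i =>
  // each task reports which executor thread it runs on
  println(s"element $i handled by thread ${Thread.currentThread.getName}")
}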

4.- Reasoning behind 2-4 slices per CPU

Some nodes finish their tasks faster than others. Having more slices than available cores distributes the tasks in a balanced way, avoiding long processing times due to slower nodes.
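
Following that reasoning, one common pattern is to derive the slice count from the cluster's parallelism instead of hard-coding it; a sketch, assuming `sc` is the SparkContext and applying the 2-4x guideline from the docs:

// defaultParallelism is typically the total number of cores offered to the application
val slices = sc.defaultParallelism * 3   // somewhere in the 2-4x-per-core range
val n = 100000 * slices
val rdd = sc.parallelize(1 to n, slices)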
