Increase the Spark workers' cores
Problem description
I have installed Spark on a master and 2 workers. Each worker originally has 8 cores. When I start the master, the workers run properly without any problem, but in the Spark GUI each worker has only 2 cores assigned.
How can I increase the number of cores so that each worker works with all 8 cores?
Recommended answer
The setting which controls cores per executor is spark.executor.cores (see the docs). It can be set either via a spark-submit command-line argument or in spark-defaults.conf. That file is usually located in /etc/spark/conf (YMMV); you can search for it with find / -type f -name spark-defaults.conf:
spark.executor.cores 8
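Alternatively, the same setting can be passed on the command line. A minimal sketch, where my_app.py is just a placeholder for your own application:

```shell
# Request 8 cores per executor via the dedicated flag
# (my_app.py stands in for your actual application script):
spark-submit --executor-cores 8 my_app.py

# or via the generic --conf form, which works for any Spark property:
spark-submit --conf spark.executor.cores=8 my_app.py
```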
However, the setting does not guarantee that each executor will always get all the available cores; this depends on your workload.
If you schedule tasks on a DataFrame or RDD, Spark will run one parallel task for each partition of the DataFrame. A task is scheduled to an executor (a separate JVM), and the executor can run multiple tasks in parallel, in JVM threads, one per core.
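As a rough illustration of that scheduling model (plain arithmetic, not a Spark API, and a simplification that ignores data locality and task skew): the cluster offers instances × cores parallel task slots, so a stage with more partitions than slots runs in several "waves".

```python
import math

def task_waves(num_partitions: int, executor_instances: int, executor_cores: int) -> int:
    """Estimate how many scheduling waves a stage needs.

    Simplified model: each executor runs up to executor_cores tasks
    at once, so the cluster has instances * cores parallel slots.
    """
    slots = executor_instances * executor_cores
    return math.ceil(num_partitions / slots)

# With 2 executors of 7 cores each (14 slots), 14 partitions finish
# in a single wave, while 15 partitions already need a second wave.
print(task_waves(14, 2, 7))  # 1
print(task_waves(15, 2, 7))  # 2
```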
Also, an executor does not necessarily run on a separate worker: if there is enough memory, two executors can share a worker node.
In order to use all the cores, the setup in your case could look as follows, given you have 10 GB of memory on each node:
spark.default.parallelism 14
spark.executor.instances 2
spark.executor.cores 7
spark.executor.memory 9g
Setting the memory to 9g makes sure that each executor is assigned to a separate node. Each executor will then have 7 cores available, and each DataFrame operation will be scheduled as 14 concurrent tasks, distributed 7 to each executor. You can also repartition a DataFrame instead of setting default.parallelism. One core and 1 GB of memory are left for the operating system.
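The numbers above can be sanity-checked with simple arithmetic, assuming (as in the question) 2 worker nodes with 8 cores and 10 GB of memory each:

```python
# Per-node resources from the question, and the proposed settings.
node_cores, node_mem_gb = 8, 10
executor_cores, executor_mem_gb = 7, 9
executor_instances = 2  # one executor per worker node

# A 9 GB executor cannot share a 10 GB node with another executor,
# so each node hosts exactly one executor, leaving 1 core and 1 GB
# of memory for the operating system.
cores_left_for_os = node_cores - executor_cores        # 1
mem_left_for_os_gb = node_mem_gb - executor_mem_gb     # 1

# Total parallel task slots, matching spark.default.parallelism 14.
parallelism = executor_instances * executor_cores      # 14
print(cores_left_for_os, mem_left_for_os_gb, parallelism)  # 1 1 14
```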