Increase the Spark workers' cores


Problem description

I have installed Spark on a master and 2 workers. The original number of cores per worker is 8. When I start the master, the workers work properly without any problem, but the problem is that in the Spark GUI each worker has only 2 cores assigned.

How can I increase the number of cores so that each worker uses all 8 of its cores?

Recommended answer

The setting that controls cores per executor is spark.executor.cores. See the docs. It can be set either via a spark-submit command-line argument or in spark-defaults.conf. The file is usually located in /etc/spark/conf (ymmv). You can search for the conf file with find / -type f -name spark-defaults.conf

spark.executor.cores 8
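
For reference, the same property can also be set programmatically when building the SparkSession, or passed to spark-submit as --conf spark.executor.cores=8. A minimal Scala sketch (the app name is an arbitrary placeholder):

import org.apache.spark.sql.SparkSession

object CoresExample {
  def main(args: Array[String]): Unit = {
    // Request 8 cores per executor; equivalent to the spark-defaults.conf entry above.
    val spark = SparkSession.builder()
      .appName("cores-example")
      .config("spark.executor.cores", "8")
      .getOrCreate()

    // Confirm the value the application actually picked up.
    println(spark.conf.get("spark.executor.cores"))
    spark.stop()
  }
}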

However, this setting does not guarantee that each executor will always get all the available cores. That depends on your workload.

If you schedule tasks on a DataFrame or RDD, Spark will run one parallel task for each partition of the DataFrame. A task is scheduled to an executor (a separate JVM), and the executor can run multiple tasks in parallel, in JVM threads, one per core.
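
As a rough illustration (a hypothetical sketch reusing the spark session from the example above; the numbers are only illustrative):

// An RDD with 14 partitions produces 14 tasks per stage; each executor runs
// as many of those tasks concurrently as it has cores.
val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 14)
println(rdd.getNumPartitions)   // 14
val doubled = rdd.map(_ * 2)    // executed as 14 parallel tasks
println(doubled.count())        // 1000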

Also, an executor does not necessarily run on a separate worker. If there is enough memory, two executors can share a worker node.

In order to use all the cores, the setup in your case could look as follows:

Given you have 10 GB of memory on each node:

spark.default.parallelism 14
spark.executor.instances 2
spark.executor.cores 7
spark.executor.memory 9g

Setting memory to 9g makes sure each executor is assigned to a separate node. Each executor will have 7 cores available, and each DataFrame operation will be scheduled as 14 concurrent tasks, distributed 7 to each executor. You can also repartition a DataFrame instead of setting default.parallelism. One core and 1 GB of memory are left for the operating system.
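
A hedged sketch of the repartition alternative (again reusing the spark session from above; df is a placeholder DataFrame, and 14 matches the parallelism chosen earlier):

// Explicitly repartitioning controls how many tasks each stage produces,
// instead of relying on spark.default.parallelism.
val df = spark.range(0, 1000000).toDF("id")   // placeholder data
val df14 = df.repartition(14)                 // each stage now runs 14 tasks
println(df14.rdd.getNumPartitions)            // 14
println(df14.count())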
