How to configure Apache Spark random worker ports for tight firewalls?
Problem description
I am using Apache Spark to run machine learning algorithms and other big data tasks. Previously, I was running Spark in standalone cluster mode with the master and worker on the same machine. Now I have added multiple worker machines, and because of a tight firewall I have to fix the workers' random ports. Can anyone explain how to change Spark's random ports and tell me exactly which configuration file needs to be edited? I read the Spark documentation and it says spark-defaults.conf should be configured, but I don't know how to configure this file specifically to change Spark's random ports.
Update for Spark 2.x
Some libraries have been rewritten from scratch and many legacy *.port properties are now obsolete (cf. SPARK-10997 / SPARK-20605 / SPARK-12588 / SPARK-17678 / etc.)
For Spark 2.1, for instance, the port ranges on which the driver will listen for executor traffic are:
- between spark.driver.port and spark.driver.port + spark.port.maxRetries
- between spark.driver.blockManager.port and spark.driver.blockManager.port + spark.port.maxRetries
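For a strict firewall, that means pinning these base properties (and bounding the retry range) in spark-defaults.conf. A minimal sketch; the port numbers below are arbitrary placeholders, not Spark defaults, so pick values your firewall policy allows:

```
# spark-defaults.conf (driver side) -- example values, not defaults
spark.driver.port               40000
spark.driver.blockManager.port  40100
spark.port.maxRetries           16
```

With these settings, the firewall needs 40000-40016 and 40100-40116 open toward the driver host.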
And the port range on which the executors will listen for driver traffic and/or other executor traffic is:
- between spark.blockManager.port and spark.blockManager.port + spark.port.maxRetries
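The analogous sketch for the worker/executor side (again with placeholder port numbers):

```
# spark-defaults.conf (executor side) -- example values, not defaults
spark.blockManager.port  40200
spark.port.maxRetries    16
```

so the firewall would need 40200-40216 open toward each worker node.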
The "maxRetries" property allows several Spark jobs to run in parallel: if the base port is already in use, the new job tries the next one, and so on, until the whole range is exhausted.
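That retry behaviour can be sketched in a few lines of Python. This is a simplified illustration of the port-selection logic, not Spark's actual implementation, and the port numbers are placeholders:

```python
# Simplified illustration of Spark's port-retry behaviour (not Spark's code).
# A service configured with base_port may end up on any port up to
# base_port + max_retries, so a firewall must allow that whole range.

def candidate_ports(base_port: int, max_retries: int) -> list:
    """All ports a service may try, in order."""
    return list(range(base_port, base_port + max_retries + 1))

def pick_port(base_port: int, max_retries: int, in_use: set) -> int:
    """Return the first free port in the range, mimicking the retry loop."""
    for port in candidate_ports(base_port, max_retries):
        if port not in in_use:
            return port
    raise RuntimeError("whole port range already in use")

# With base 40000 and maxRetries 16, the range to open is 40000-40016;
# if two jobs already hold 40000 and 40001, a third job lands on 40002.
print(candidate_ports(40000, 16)[0], candidate_ports(40000, 16)[-1])
print(pick_port(40000, 16, {40000, 40001}))
```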
Sources:
https://spark.apache.org/docs/2.1.1/configuration.html#networking
https://spark.apache.org/docs/2.1.1/security.html under "Configuring ports"