How to configure Apache Spark random worker ports for tight firewalls?
Question
I am using Apache Spark to run machine learning algorithms and other big data tasks. Previously, I ran Spark in standalone cluster mode with the master and worker on the same machine. Now I have added multiple worker machines and, because of a tight firewall, I have to pin down the random ports the workers use. Can anyone explain how to change Spark's random ports and tell me exactly which configuration file needs to be edited? The Spark documentation says spark-defaults.conf should be configured, but I don't know how to set up this file to change Spark's random ports specifically.
Recommended answer
Update for Spark 2.x
Some libraries have been rewritten from scratch and many legacy *.port properties are now obsolete (cf. SPARK-10997, SPARK-20605, SPARK-12588, SPARK-17678, etc.).
For Spark 2.1, for instance, the port ranges on which the driver listens for executor traffic are:
- between spark.driver.port and spark.driver.port + spark.port.maxRetries
- between spark.driver.blockManager.port and spark.driver.blockManager.port + spark.port.maxRetries
And the port range on which the executors listen for traffic from the driver and/or from other executors is:
- between spark.blockManager.port and spark.blockManager.port + spark.port.maxRetries
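To make the ranges above concrete, here is a minimal Python sketch that renders the corresponding spark-defaults.conf lines and the inclusive port ranges a firewall would need to allow. The base port numbers (40000, 40100, 40200) and the helper names are assumptions chosen for illustration; they do not come from the answer or from Spark itself. Only the four property names are taken from the text, and 16 is the documented default of spark.port.maxRetries.

```python
# Hypothetical base ports: replace with values your firewall policy allows.
BASE_PORTS = {
    "spark.driver.port": 40000,
    "spark.driver.blockManager.port": 40100,
    "spark.blockManager.port": 40200,
}
MAX_RETRIES = 16  # value to use for spark.port.maxRetries


def port_range(base, max_retries):
    """Inclusive range base .. base + max_retries, as described in the answer."""
    return base, base + max_retries


def spark_defaults_lines(base_ports, max_retries):
    """Render 'key value' lines in the whitespace-separated spark-defaults.conf style."""
    lines = [f"{key} {port}" for key, port in base_ports.items()]
    lines.append(f"spark.port.maxRetries {max_retries}")
    return lines


def firewall_ranges(base_ports, max_retries):
    """Map each property to the (low, high) port range it may end up using."""
    return {key: port_range(port, max_retries) for key, port in base_ports.items()}


if __name__ == "__main__":
    print("\n".join(spark_defaults_lines(BASE_PORTS, MAX_RETRIES)))
    for key, (low, high) in firewall_ranges(BASE_PORTS, MAX_RETRIES).items():
        print(f"allow TCP {low}-{high}  # {key}")
```

The printed "key value" lines are what would go into conf/spark-defaults.conf, and each reported range is what would have to be opened between the driver and executor machines.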
The spark.port.maxRetries property allows several Spark jobs to run in parallel: if the base port is already in use, the new job tries the next one, and so on, unless the whole range is already in use.
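The retry behaviour can be pictured with a small Python model. This is a simplified sketch of the idea, not Spark's actual implementation: pick_port and the set of occupied ports are hypothetical stand-ins for real socket binding.

```python
def pick_port(base, max_retries, ports_in_use):
    """Return the first free port in base .. base + max_retries, or None if all are taken."""
    for offset in range(max_retries + 1):
        candidate = base + offset
        if candidate not in ports_in_use:
            return candidate
    return None


if __name__ == "__main__":
    in_use = set()
    # Start more jobs than the range can hold to show the exhaustion case.
    for job in range(4):
        port = pick_port(40000, 2, in_use)  # hypothetical base port, maxRetries = 2
        if port is None:
            print(f"job {job}: no free port left in the range")
        else:
            in_use.add(port)
            print(f"job {job}: bound to {port}")
```

With a retry count of 2 the range holds three ports, so in this model the fourth concurrent job finds nothing free. This is why the retry count should be sized to the number of jobs expected to run at the same time.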
Sources:
https://spark.apache.org/docs/2.1.1/configuration.html#networking
https://spark.apache.org/docs/2.1 ("Configuring ports")