Share config files with spark-submit in cluster mode

Published: 2022/8/8 17:34:31 | Tags: apache-spark, spark-streaming, hadoop-yarn


Question

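I have been running my Spark job in "client" mode during development. I use "--files" to share a config file with the executors, and the driver reads the config file locally. Now I want to deploy the job in "cluster" mode, and I cannot find a way to share the config file with the driver.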

For example, I pass the config file name to both the driver and the executors through extraJavaOptions, and I read the file using SparkFiles.get():

  val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))

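This works well on the executors but fails on the driver. I think the files are shared only with the executors and not with the container that runs the driver. One option is to keep the config file in S3, but I would like to check whether this can be achieved with spark-submit alone.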

  spark-submit --deploy-mode cluster --master yarn --driver-cores 2 \
    --driver-memory 4g --num-executors 4 --executor-cores 4 --executor-memory 10g \
    --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
    --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
    --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
    --class ....

Recommended answer

I found a solution to this problem in this thread.

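You can give a file submitted through --files an alias by appending '#alias' to its path. With this trick, you should be able to access the file by its alias.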
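To make the 'path#alias' convention concrete, here is a minimal sketch of how a --files value can be read as (source, name-in-container) pairs. The function name parse_files_arg is hypothetical and this is not Spark's own parser; it only illustrates the convention, assuming that a file without an alias keeps its base name.

```python
import os


def parse_files_arg(files_arg):
    """Split a comma-separated --files value into (source, link_name) pairs.

    Illustrative only: mimics the 'path#alias' convention, not Spark's parser.
    """
    pairs = []
    for entry in files_arg.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Everything after the first '#' is treated as the alias.
        source, _, alias = entry.partition("#")
        # Assumption: without an alias, the file keeps its base name.
        pairs.append((source, alias or os.path.basename(source)))
    return pairs


if __name__ == "__main__":
    for source, name in parse_files_arg(
        "test.conf#testFile.conf,/home/hadoop/log4j.properties"
    ):
        print(source, "->", name)
```

Under these assumptions, the job would open the file by the second element of each pair, not by the original path.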

For example, the following runs correctly.

spark-submit --master yarn-cluster --files test.conf#testFile.conf test.py

With test.py set to:

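# 'testFile.conf' is the alias given after '#' in --files
path_f = 'testFile.conf'
try:
    # Open by alias; the context manager closes the file afterwards
    with open(path_f, 'r') as f:
        content = f.read()
except OSError as err:
    raise Exception('File not opened', 'EEEEEEE!') from err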

and an empty test.conf.
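The test above opens the file by a relative name, which relies on the shipped file being present in the process's working directory. As a rough sketch of the same idea in a form that can be shared by driver and executor code, the helper below looks in the current working directory first and then in any extra directories the caller supplies. The name resolve_config and the extra_dirs parameter are hypothetical, and the assumption that YARN places --files entries in the container's working directory should be checked against your Spark version.

```python
import os


def resolve_config(name, extra_dirs=()):
    """Return the first existing path for `name`, or raise FileNotFoundError.

    Search order: the current working directory (assumed to be where YARN
    places files shipped with --files), then any caller-supplied directories.
    """
    candidates = [os.path.join(os.getcwd(), name)]
    candidates += [os.path.join(d, name) for d in extra_dirs]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "config file %r not found in: %s" % (name, ", ".join(candidates))
    )


if __name__ == "__main__":
    try:
        print(resolve_config("testFile.conf"))
    except FileNotFoundError as err:
        print(err)
```

In a PySpark job, one possible choice for extra_dirs is the directory returned by SparkFiles.getRootDirectory(), so the same call covers files added through SparkContext.addFile as well.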
