Pass system property to spark-submit and read file from classpath or custom path
Question
I have recently found a way to use logback instead of log4j in Apache Spark (both for local use and with spark-submit). However, the last piece is missing.
The issue is that Spark tries very hard not to see logback.xml
settings in its classpath. I have already found a way to load it during local execution:
Basically, I check for the system property logback.configurationFile, but load logback.xml from my /src/main/resources/ as a fallback:
import java.io.File

// the same as default: https://logback.qos.ch/manual/configuration.html
private val LogbackLocation = Option(System.getProperty("logback.configurationFile"))
// add some default logback.xml to your /src/main/resources
private lazy val defaultLogbackConf = getClass.getResource("/logback.xml").getPath

// assumes an slf4j-style `logger` is in scope
private def getLogbackConfigPath = {
  val path = LogbackLocation.map(new File(_).getPath).getOrElse(defaultLogbackConf)
  logger.info(s"Loading logging configuration from: $path")
  path
}
And then when I initialize my SparkContext...
val sc = SparkContext.getOrCreate(conf)
sc.addFile(getLogbackConfigPath)
I can confirm it works locally. But when I run it with spark-submit:
spark-submit \
...
--master yarn \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
this gives an error:
Exception in thread "main" java.io.FileNotFoundException: Added file file:/path/to/my/application-fat.jar!/logback.xml does not exist
Which I think is nonsense, because first the application finds the file (according to my code):
getClass.getResource("/logback.xml").getPath
and then, during
sc.addFile(getLogbackConfigPath)
it turns out... whoa! No file there!? What the heck!? Why would it not find the file inside the jar? It is obviously there; I triple-checked it.
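As an illustration of what is going on (my own snippet, using a resource from the Scala library jar rather than the original logback.xml): getResource for a resource packed inside a jar yields a jar: URL, and its getPath is not a path that java.io.File, or SparkContext.addFile, can open:

```scala
import java.io.File

// A resource that lives inside a jar has a "jar:file:/…!/…" URL;
// calling getPath on it does NOT give a usable filesystem path.
val url = getClass.getResource("/library.properties") // inside scala-library.jar
println(url.getProtocol)              // "jar"
println(new File(url.getPath).exists) // false: the "…jar!/…" path is not a real file
```

This matches the `application-fat.jar!/logback.xml does not exist` error above: the `!/` separator marks an entry inside a jar, not a file on disk.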
So I thought, OK. I will pass my file myself, since I can specify the system property. I put the logback.xml file next to my application-fat.jar and:
spark-submit \
...
--conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
--conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
--master yarn \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
And I get the same error as above. So my setting is completely ignored! Why? How do I specify -Dlogback.configurationFile properly and pass it correctly to the driver and executors?
Thanks!
Answer
1. Solving java.io.FileNotFoundException
This probably cannot be solved.

Simply, SparkContext.addFile cannot read a file from inside the jar. I believe it is treated as if it were in some zip or alike.
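One workaround I can think of (my own sketch, not part of the original answer) is to copy the resource out of the jar to a temporary file first, so that sc.addFile receives a real filesystem path:

```scala
import java.io.{File, FileOutputStream}

// Copy a classpath resource (which may live inside a jar) to a temp
// file and return its absolute filesystem path. The result can then
// be handed to SparkContext.addFile.
def extractResource(name: String): String = {
  val in = getClass.getResourceAsStream(name)
  require(in != null, s"resource $name not found on classpath")
  val tmp = File.createTempFile("extracted-", "-" + new File(name).getName)
  tmp.deleteOnExit()
  val out = new FileOutputStream(tmp)
  try {
    val buf = new Array[Byte](8192)
    Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))
  } finally {
    out.close()
    in.close()
  }
  tmp.getAbsolutePath
}
```

With something like this, `sc.addFile(extractResource("/logback.xml"))` would ship a plain copy of the file instead of a `jar:…!/…` pseudo-path.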
2. Passing -Dlogback.configurationFile

This was not working due to my misunderstanding of the configuration parameters.
Because I am using the --master yarn parameter but do not specify --deploy-mode as cluster, it is by default client.
Reading https://spark.apache.org/docs/1.6.1/configuration.html#application-properties on
spark.driver.extraJavaOptions
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
So passing this setting with --driver-java-options
worked:
spark-submit \
...
--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
--master yarn \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
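For completeness: in cluster mode the --conf settings should have applied, since the driver JVM is launched on the cluster. A hypothetical variant (my sketch, not from the original answer; note the file must also be shipped, e.g. with --files, so the relative name resolves on the cluster nodes):

```shell
spark-submit \
...
--master yarn \
--deploy-mode cluster \
--files /path/to/my/logback.xml \
--conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=logback.xml" \
--conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=logback.xml" \
--class com.company.Main \
/path/to/my/application-fat.jar \
param1 param2
```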
Note about --driver-java-options

In contrast to --conf, multiple parameters have to be passed as one argument, for example:
--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml -Dother.setting=value" \
The following will not work:
--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
--driver-java-options "-Dother.setting=value" \