Pass system property to spark-submit and read file from classpath or custom path


Problem description

I recently found a way to use logback instead of log4j in Apache Spark (both for local use and spark-submit). However, the last piece is missing.

The issue is that Spark tries very hard not to see logback.xml settings in its classpath. I have already found a way to load it during local execution:

Basically, it checks the system property logback.configurationFile, but falls back to loading logback.xml from my /src/main/resources/ just in case:

import java.io.File

// the same as the default: https://logback.qos.ch/manual/configuration.html
private val LogbackLocation = Option(System.getProperty("logback.configurationFile"))
// add some default logback.xml to your /src/main/resources
private lazy val defaultLogbackConf = getClass.getResource("/logback.xml").getPath

private def getLogbackConfigPath = {
  val path = LogbackLocation.map(new File(_).getPath).getOrElse(defaultLogbackConf)
  logger.info(s"Loading logging configuration from: $path")
  path
}
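For context, here is a minimal sketch of how such a path can then be handed to logback programmatically. This helper is my illustration, not part of the question; it uses logback's standard JoranConfigurator API and assumes logback-classic is the active SLF4J binding:

import ch.qos.logback.classic.LoggerContext
import ch.qos.logback.classic.joran.JoranConfigurator
import org.slf4j.LoggerFactory

// Illustrative only: reconfigure logback from an explicit file path.
def configureLogbackFrom(path: String): Unit = {
  val context = LoggerFactory.getILoggerFactory.asInstanceOf[LoggerContext]
  val configurator = new JoranConfigurator()
  configurator.setContext(context)
  context.reset() // drop any configuration logback picked up automatically
  configurator.doConfigure(path) // doConfigure accepts a file path as a String
}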

And then, when I initialize my SparkContext...

val sc = SparkContext.getOrCreate(conf)
sc.addFile(getLogbackConfigPath)
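As an aside (my addition, not something the question shows): files shipped with sc.addFile are fetched to every node, and are conventionally resolved on executors through the SparkFiles API, e.g.:

import org.apache.spark.SparkFiles

// On an executor, resolve the local copy of a file distributed via addFile.
val localLogbackPath = SparkFiles.get("logback.xml")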

I can confirm it works locally. However, when I run it with spark-submit:

spark-submit \
  ...
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2

This gives an error:

Exception in thread "main" java.io.FileNotFoundException: Added file file:/path/to/my/application-fat.jar!/logback.xml does not exist

Which I think is nonsense, because first the application finds the file (according to my code):

getClass.getResource("/logback.xml").getPath

and then, during

sc.addFile(getLogbackConfigPath)

it turns out... whoa! No file there!? What the heck!? Why would it not find the file inside the jar? It obviously is there; I triple-checked it.

So I thought, OK, I will pass my file instead, since I can specify the system property. I put the logback.xml file next to my application-fat.jar and:

spark-submit \
  ...
  --conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2

And I get the same error as above. So my setting is completely ignored! Why? How do I specify

-Dlogback.configurationFile

properly, and pass it correctly to both the driver and the executors?

Thanks!

Answer

1. Solving java.io.FileNotFoundException

This probably cannot be fixed.

Simply put, SparkContext.addFile cannot read a file from inside the jar; I believe the jar is treated as if it were a zip or the like.
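A possible workaround (my sketch, not part of the original answer): copy the bundled resource out to a temporary file first, so that addFile receives a plain filesystem path. The helper name is hypothetical:

import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical helper: materialize a classpath resource as a real file,
// so SparkContext.addFile gets a plain filesystem path instead of a
// jar:...!/... location it cannot read.
private def extractResourceToTempFile(resource: String): String = {
  val in = getClass.getResourceAsStream(resource)
  require(in != null, s"Resource $resource not found on the classpath")
  try {
    val tmp = File.createTempFile("logback", ".xml")
    tmp.deleteOnExit()
    Files.copy(in, tmp.toPath, StandardCopyOption.REPLACE_EXISTING)
    tmp.getAbsolutePath
  } finally in.close()
}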

2. Passing -Dlogback.configurationFile

This was not working due to my misunderstanding of the configuration parameters.

Because I use the --master yarn parameter but do not set --deploy-mode to cluster, the deploy mode defaults to client.
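(For comparison, an assumption on my part based on the docs quoted next, not something tested here: in cluster mode the driver JVM is launched with the configured options, so the --conf form would be expected to apply, provided logback.xml actually exists on the cluster node, e.g. shipped alongside the job with --files and referenced by its working-directory name:)

spark-submit \
  --conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=logback.xml" \
  --files /path/to/my/logback.xml \
  --master yarn \
  --deploy-mode cluster \
  ...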

Reading https://spark.apache.org/docs/1.6.1/configuration.html#application-properties on

spark.driver.extraJavaOptions

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
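The "default properties file" alternative mentioned there would look like this (my illustration; the paths are placeholders):

# conf/spark-defaults.conf
spark.driver.extraJavaOptions    -Dlogback.configurationFile=/path/to/my/logback.xml
spark.executor.extraJavaOptions  -Dlogback.configurationFile=/path/to/my/logback.xml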

So passing this setting with --driver-java-options worked:

spark-submit \
  ...
  --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2

Note about --driver-java-options

In contrast to --conf, multiple parameters have to be passed as one argument, for example:

--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml -Dother.setting=value" \

The following will not work:

--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
--driver-java-options "-Dother.setting=value" \

