Using typesafe config with Spark on Yarn


Problem Description

I have a Spark job that reads data from a configuration file. This file is a typesafe config file.

The code that reads the config looks like this:

ConfigFactory.load().getConfig("com.mycompany")

Now I don't assemble the application.conf as part of my uber jar, since I want to pass the file as an external file.

The content of the external application.conf I want to use looks like this:

com.mycompany {
  //configurations my program needs
}
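For context, here is a minimal, self-contained sketch of the reading side, matching the com.mycompany.Main that appears in the stack trace below; the input.path key is hypothetical, added only for illustration:

    import com.typesafe.config.{Config, ConfigFactory}

    object Main {
      def main(args: Array[String]): Unit = {
        // Loads application.conf from the classpath (or whatever
        // -Dconfig.file / -Dconfig.resource points at) and resolves it.
        val root: Config = ConfigFactory.load()

        // Throws ConfigException$Missing ("No configuration setting found
        // for key 'com'") if no application.conf is visible, which is
        // exactly the failure shown below.
        val conf: Config = root.getConfig("com.mycompany")

        // Hypothetical key, for illustration only.
        val inputPath: String = conf.getString("input.path")
        println(s"input.path = $inputPath")
      }
    }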

This application.conf file exists on my local machine file system (and not on HDFS).

I'm using Spark 1.6.1 with Yarn.

This is what my spark-submit command looks like:

LOG4J_FULL_PATH=/log4j-path
ROOT_DIR=/application.conf-path

    /opt/deploy/spark/bin/spark-submit \
    --class com.mycompany.Main \
    --master yarn \
    --deploy-mode cluster \
    --files $ROOT_DIR/application.conf \
    --files $LOG4J_FULL_PATH/log4j.xml \
    --conf spark.executor.extraClassPath="-Dconfig.file=file:application.conf" \
    --driver-class-path $ROOT_DIR/application.conf \
    --verbose \
    /opt/deploy/lal-ml.jar

The exception I'm getting is:

2016-11-09 12:32:14 ERROR ApplicationMaster:95 - User class threw exception: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'com'
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'com'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
    at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:218)
    at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:224)
    at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:33)
    at com.mycompany.Main$.main(Main.scala:36)
    at com.mycompany.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)

So my question is: does anybody know how I can load an external typesafe application.conf file that sits on my local machine with spark-submit and Yarn?

I tried following some of the solutions in "How to add a typesafe config file which is located on HDFS to spark-submit (cluster-mode)?", in "Typesafe Config in Spark", and in "How to pass -D parameter or environment variable to Spark job?", and nothing worked.

I'd appreciate any direction toward solving this.

Thanks in advance

Recommended Answer

So with a little digging in the Spark 1.6.1 source code I found the solution.

These are the steps you need to take in order to get both the log4j.xml and the application.conf picked up by your application when submitting to Yarn in cluster mode:

  • When passing several files, as I was doing with both the application.conf and the log4j.xml, you need to submit them in a single --files argument like this: --files "$ROOT_DIR/application.conf,$LOG4J_FULL_PATH/log4j.xml" (separated by commas).
  • That's it for the application.conf; there is no need for extraJavaOpts for it (as was written in my question). The issue is that Spark was using only the last --files argument passed, which is why only log4j.xml was being shipped. In order to use log4j.xml I also had to take the following step.
  • Add another line to the spark-submit like this: --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:log4j.xml". Notice that once you pass a file with --files you can refer to it by file name alone, without any path (see the corrected command after this list).
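Putting those steps together, the corrected spark-submit from the question would look something like the sketch below (untested as written here; paths and class names are the same placeholders used above):

    LOG4J_FULL_PATH=/log4j-path
    ROOT_DIR=/application.conf-path

    # In cluster mode both files land in the YARN container's working
    # directory, so ConfigFactory.load() finds application.conf without
    # any extra flags; only log4j needs the explicit system property.
    /opt/deploy/spark/bin/spark-submit \
    --class com.mycompany.Main \
    --master yarn \
    --deploy-mode cluster \
    --files "$ROOT_DIR/application.conf,$LOG4J_FULL_PATH/log4j.xml" \
    --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:log4j.xml" \
    --verbose \
    /opt/deploy/lal-ml.jar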

Note: I haven't tried it, but from what I saw, if you're trying to run it in client mode I think the spark.driver.extraJavaOptions line should be replaced with something like driver-java-options (see the sketch below). That's it. So simple, and I wish these things were documented better. I hope this answer helps someone.
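Based on that note, an untested client-mode variant might look like the following; the assumptions here are that the driver runs on the local machine (so full local paths apply) and that -Dconfig.file, Typesafe Config's standard property for pointing at an external file, is used for the driver-side config:

    # Untested client-mode sketch: --driver-java-options replaces
    # spark.driver.extraJavaOptions; the driver runs locally, so full
    # local paths are used for both files.
    /opt/deploy/spark/bin/spark-submit \
    --class com.mycompany.Main \
    --master yarn \
    --deploy-mode client \
    --files "$ROOT_DIR/application.conf,$LOG4J_FULL_PATH/log4j.xml" \
    --driver-java-options "-Dlog4j.configuration=file:$LOG4J_FULL_PATH/log4j.xml -Dconfig.file=$ROOT_DIR/application.conf" \
    /opt/deploy/lal-ml.jar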

Cheers

