Add conf file to classpath in Google Dataproc
Problem description
We're building a Spark application in Scala with a HOCON configuration; the config file is called application.conf.
If I add the application.conf to my jar file and start a job on Google Dataproc, it works correctly:
gcloud dataproc jobs submit spark \
--cluster <clustername> \
--jar=gs://<bucketname>/<filename>.jar \
--region=<myregion> \
-- \
<some options>
I don't want to bundle the application.conf with my jar file but provide it separately, which I can't get working.
I tried different things, i.e.:

1. Specifying the application.conf with --jars=gs://<bucketname>/application.conf (which should work according to this answer)
2. Using --files=gs://<bucketname>/application.conf
3. Same as 1. and 2., with the application conf in /tmp/ on the Master instance of the cluster, then specifying the local file with file:///tmp/application.conf
4. Defining extraClassPath for Spark using --properties=spark.driver.extraClassPath=gs://<bucketname>/application.conf (and likewise for the executors)
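For reference, a commonly suggested variant of the --files approach (a sketch only, not confirmed to work in the question) is to let Spark copy the file into the job's working directory and then point Typesafe Config at it with its config.file system property instead of relying on classpath lookup; the placeholders are the same as in the question's submit command:

```shell
# Sketch (untested here): ship application.conf with the job via --files,
# then tell Typesafe Config to read it from the driver's working directory
# with -Dconfig.file instead of resolving it from the classpath.
gcloud dataproc jobs submit spark \
  --cluster <clustername> \
  --region=<myregion> \
  --jar=gs://<bucketname>/<filename>.jar \
  --files=gs://<bucketname>/application.conf \
  --properties=spark.driver.extraJavaOptions=-Dconfig.file=application.conf \
  -- \
  <some options>
```

With --files, Spark places the file in each container's working directory, so a relative config.file path resolves on the driver; if executors also call ConfigFactory.load(), spark.executor.extraJavaOptions would need the same flag.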
With all these options I get an error saying it can't find the key in the config:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: system properties: No configuration setting found for key 'xyz'
This error usually means that there's an error in the HOCON config (the key xyz is not defined in HOCON) or that application.conf is not in the classpath. Since the exact same config works when it's inside my jar file, I assume it's the latter.
Are there any other options to put the application.conf on the classpath?
Recommended answer
If --jars doesn't work as suggested in this answer, you can try an init action: first upload your config to GCS, then write an init action that downloads it to the VMs, either placing it in a folder that is already on the classpath or updating spark-env.sh to include the path to the config.
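As a sketch of that suggestion (the target directory and file names are assumptions, not part of the answer), such an init action could look like this:

```shell
#!/bin/bash
# Hypothetical Dataproc init action: runs on each VM at cluster creation.
set -euo pipefail

# Download the config from GCS (bucket path is the question's placeholder).
gsutil cp gs://<bucketname>/application.conf /etc/spark/conf/application.conf

# /etc/spark/conf is assumed to be on the driver classpath on the Dataproc
# image; if it is not, add the directory explicitly via spark-defaults.conf.
echo "spark.driver.extraClassPath=/etc/spark/conf" >> /etc/spark/conf/spark-defaults.conf
```

Upload the script to GCS and pass it at cluster creation with gcloud dataproc clusters create --initialization-actions=gs://<bucketname>/init.sh. Note that Typesafe Config looks for application.conf as a classpath resource, so the directory containing the file (not the file itself, and not a gs:// URL) must be on the classpath, which is likely why option 4 above failed.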