Add conf file to classpath in Google Dataproc


Question

We're building a Spark application in Scala with a HOCON configuration; the config file is called application.conf.

If I add the application.conf to my jar file and start a job on Google Dataproc, it works correctly:

gcloud dataproc jobs submit spark \
  --cluster <clustername> \
  --jar=gs://<bucketname>/<filename>.jar \
  --region=<myregion> \
  -- \
  <some options>

I don't want to bundle the application.conf with my jar file but provide it separately, which I can't get working.

I tried different things, namely:

  1. Specifying the application.conf with --jars=gs://<bucketname>/application.conf (which should work according to this answer)
  2. Using --files=gs://<bucketname>/application.conf
  3. Same as 1. + 2. with the application.conf in /tmp/ on the Master instance of the cluster, then specifying the local file with file:///tmp/application.conf
  4. Defining extraClassPath for Spark using --properties=spark.driver.extraClassPath=gs://<bucketname>/application.conf (and likewise for the executors)
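For reference, the fourth attempt spelled out as a full submit command would look roughly like this; note that Spark's extraClassPath properties expect local filesystem paths on the cluster nodes, so passing a gs:// URL there is unlikely to take effect. A sketch combining it with the /tmp/ copy from attempt 3, not a verified fix:

```shell
# Sketch: point extraClassPath at the local directory that contains
# application.conf (the directory, not the file, goes on the classpath).
# Placeholders and /tmp/ location are assumptions from the attempts above.
gcloud dataproc jobs submit spark \
  --cluster <clustername> \
  --jar=gs://<bucketname>/<filename>.jar \
  --region=<myregion> \
  --properties=spark.driver.extraClassPath=/tmp/,spark.executor.extraClassPath=/tmp/
```

This would only help the driver unless the file is also present in /tmp/ on every worker node.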

With all these options I get an error saying it can't find the key in the config:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: system properties: No configuration setting found for key 'xyz'

This error usually means that there's an error in the HOCON config (key xyz is not defined in HOCON) or that application.conf is not on the classpath. Since the exact same config works when it is inside my jar file, I assume it's the latter.

Are there any other options to put the application.conf on the classpath?

Answer

If --jars doesn't work as suggested in this answer, you can try an init action: first upload your config to GCS, then write an init action that downloads it to the VMs and puts it in a folder on the classpath, or update spark-env.sh to include the path to the config.
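The init-action approach could be sketched as follows (bucket path, script name, and target directory are assumptions, not a tested setup):

```shell
#!/bin/bash
# Init action sketch: Dataproc runs this on every node at cluster creation.
# It downloads the config from GCS into a local directory that jobs can
# later put on the driver/executor classpath.
set -euo pipefail
mkdir -p /etc/spark/extra-conf
gsutil cp gs://<bucketname>/application.conf /etc/spark/extra-conf/
```

The cluster would then be created with --initialization-actions=gs://<bucketname>/download-conf.sh, and jobs submitted with --properties=spark.driver.extraClassPath=/etc/spark/extra-conf,spark.executor.extraClassPath=/etc/spark/extra-conf, so the directory containing application.conf ends up on the classpath where Typesafe Config's ConfigFactory.load() can find it.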
