How to add configuration file to classpath of all Spark executors in Spark 1.2.0?


Problem description

I'm using Typesafe Config, https://github.com/typesafehub/config, to parameterize a Spark job running in yarn-cluster mode with a configuration file. The default behavior of Typesafe Config is to search the classpath for resources with names matching a regex and to load them into your configuration class automatically with ConfigFactory.load() (for our purposes, assume the file it looks for is called application.conf).
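
For reference, the loading side of the job looks roughly like the sketch below (the object name and config key are placeholders, not the actual ones): ConfigFactory.load() resolves application.conf against the classpath of whichever JVM runs it.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical job skeleton. ConfigFactory.load() reads application.conf from
// the classpath of the JVM that calls it, so the file has to be visible both
// to the driver and to any executor code that loads the config.
object MyParameterizedJob {
  def main(args: Array[String]): Unit = {
    val config: Config = ConfigFactory.load()
    val inputPath: String = config.getString("job.input.path") // placeholder key
    // ... create the SparkContext and run the job using inputPath ...
  }
}
```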

I am able to load the configuration file into the driver using --driver-class-path <directory containing configuration file>, but using --conf spark.executor.extraClassPath=<directory containing configuration file> does not put the resource on the classpath of all executors like it should. The executors report that they cannot find a configuration setting for a key that does exist in the configuration file I'm attempting to add to their classpaths.
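
Concretely, the submission looks roughly like this (class name and paths are placeholders); the driver resolves application.conf, but the executors do not:

```bash
# The executors cannot see application.conf this way.
spark-submit \
  --master yarn-cluster \
  --class com.example.MyParameterizedJob \
  --driver-class-path /local/conf/dir \
  --conf spark.executor.extraClassPath=/local/conf/dir \
  my-job-assembly.jar
```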

What is the correct way to add a file to the classpaths of all executor JVMs using Spark?

Solution

It looks like the value of the spark.executor.extraClassPath property is relative to the working directory of the application ON THE EXECUTOR.

So, to use this property correctly, one should use --files <configuration file> to first direct Spark to copy the file to the working directory of all executors, then use spark.executor.extraClassPath=./ to add the executor's working directory to its classpath. This combination results in the executor being able to read values from the configuration file.
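
Putting it together, a submission along these lines (class name and paths are placeholders) lets both the driver and the executors resolve application.conf:

```bash
# --files ships application.conf into each executor's working directory;
# spark.executor.extraClassPath=./ then puts that working directory on the
# executor classpath. --driver-class-path still covers the driver, as before.
spark-submit \
  --master yarn-cluster \
  --class com.example.MyParameterizedJob \
  --driver-class-path /local/conf/dir \
  --files /local/conf/dir/application.conf \
  --conf spark.executor.extraClassPath=./ \
  my-job-assembly.jar
```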
