Spark: additional properties in a directory


Problem description

I am working with Spark 1.5.0 on Amazon EMR. I have multiple properties files that I need to use in my spark-submit program. I explored the --properties-file option, but it only lets you import properties from a single file. I need to read properties from a directory whose structure looks like:

├── AddToCollection
│   ├── query
│   ├── root
│   ├── schema
│   └── schema.json
├── CreateCollectionSuccess
│   ├── query
│   ├── root
│   ├── schema
│   └── schema.json
├── FeedCardUnlike
│   ├── query
│   ├── root
│   ├── schema
│   └── schema.json

In standalone mode I can get away with this by specifying the location of the files on the local filesystem, but that doesn't work in cluster mode, where I submit a jar with the spark-submit command. How can I do this in Spark?
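For context, a rough sketch (not from the original question) of what the standalone-mode workaround might look like: reading every file under a local config directory into memory on the driver. The object name LocalConfigLoader and the directory argument are placeholders.

import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

object LocalConfigLoader {
  // Hypothetical helper: read every regular file under a local config
  // directory (e.g. ./AddToCollection) into a map of fileName -> contents.
  // This only works when the directory exists on the driver's local
  // filesystem, i.e. in standalone/client mode.
  def loadDir(dir: String): Map[String, String] = {
    Files.walk(Paths.get(dir)).iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .map(p => p.getFileName.toString -> new String(Files.readAllBytes(p), "UTF-8"))
      .toMap
  }
}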

Recommended answer

This works on Spark 1.6.1 (I haven't tested earlier versions).

spark-submit supports the --files argument, which accepts a comma-separated list of "local" files to be submitted to the driver along with your JAR file.

spark-submit \
    --class com.acme.Main \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 2g \
    --executor-memory 1g \
    --driver-class-path "./conf" \
    --files "./conf/app.properties,./conf/log4j.properties" \
    ./lib/my-app-uber.jar \
    "$@"

In this example I have created an uber JAR that does not contain any properties files. When I deploy the application, the app.properties and log4j.properties files are placed in the local ./conf directory.
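As a rough sketch (not part of the original answer): because ./conf is added to the driver class path via --driver-class-path, the driver can load app.properties as an ordinary classpath resource. The property key some.key below is a placeholder, and the object stands in for what com.acme.Main might do.

import java.util.Properties

object Main {
  def main(args: Array[String]): Unit = {
    // app.properties is visible as a classpath resource on the driver
    // because the local ./conf directory was added via --driver-class-path.
    val props = new Properties()
    val in = getClass.getResourceAsStream("/app.properties")
    require(in != null, "app.properties not found on the driver classpath")
    try props.load(in) finally in.close()

    // "some.key" is a placeholder property name for illustration.
    println(s"some.key = ${props.getProperty("some.key", "<unset>")}")
  }
}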

From the SparkSubmitArguments declaration:

--files FILES
Comma-separated list of files to be placed in the working directory of each executor.
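To illustrate the executor side (my sketch, not from the answer): a file shipped with --files can be resolved on each executor with SparkFiles.get, which returns its absolute path in that executor's working directory. The app name, some.key, and the toy RDD are placeholders.

import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object ExecutorSideProps {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("executor-props"))

    val values = sc.parallelize(1 to 4, numSlices = 2).mapPartitions { iter =>
      // SparkFiles.get resolves a file distributed via --files to its
      // absolute path inside this executor's working directory.
      val props = new Properties()
      val in = new FileInputStream(SparkFiles.get("app.properties"))
      try props.load(in) finally in.close()
      iter.map(i => s"$i -> ${props.getProperty("some.key", "<unset>")}")
    }.collect()

    values.foreach(println)
    sc.stop()
  }
}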
