How to pass a configuration file hosted in HDFS to a Spark application?
Question
I'm working with Spark Structured Streaming in Scala, and I want to pass a configuration file to my Spark application. The file is hosted in HDFS. For example:
spark_job.conf (HOCON):

spark {
  appName: "",
  master: "",
  shuffle.size: 4
  etc..
}

kafkaSource {
  servers: "",
  topic: "",
  etc..
}

redisSink {
  host: "",
  port: 999,
  timeout: 2000,
  checkpointLocation: "hdfs location",
  etc..
}
How can I pass it to my Spark application, and how can I read this file (hosted in HDFS) in Spark?
Answer
You can read the HOCON config from HDFS in the following way:
import com.typesafe.config.{Config, ConfigFactory}
import java.io.InputStreamReader
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Get a handle to HDFS (the default namenode comes from the cluster configuration)
val hdfs: FileSystem = FileSystem.get(new URI("hdfs://"), new Configuration())

// Open the config file as a stream and parse it with Typesafe Config,
// closing the reader afterwards since parseReader does not close it for you
val reader = new InputStreamReader(hdfs.open(new Path("/path/to/conf/on/hdfs")))
val conf: Config = try ConfigFactory.parseReader(reader) finally reader.close()
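Once parsed, you can pull values out of the Config with the usual getters. A minimal sketch, assuming the key names from the example config in the question (mapping shuffle.size to spark.sql.shuffle.partitions is only an illustration, not something the question specifies):

import org.apache.spark.sql.SparkSession

// Build the SparkSession from values in the parsed config
val spark = SparkSession.builder()
  .appName(conf.getString("spark.appName"))
  .master(conf.getString("spark.master"))
  .config("spark.sql.shuffle.partitions", conf.getInt("spark.shuffle.size").toString)
  .getOrCreate()

// Source and sink settings can be read the same way
val kafkaServers = conf.getString("kafkaSource.servers")
val redisHost    = conf.getString("redisSink.host")
val redisPort    = conf.getInt("redisSink.port")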
You can also pass the URI of your namenode to FileSystem.get(new URI("your_uri_here")), and the code will still read your configuration.
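As for getting the file path to the application in the first place, one common approach (a sketch, not part of the original answer; the class name and path are placeholders) is to pass the HDFS path as a program argument to spark-submit and read it in main:

object SparkJob {
  def main(args: Array[String]): Unit = {
    // Invoked as, e.g.:
    //   spark-submit --class SparkJob app.jar hdfs://namenode:8020/conf/spark_job.conf
    // Arguments after the jar are forwarded to the application as args
    val confPath = args(0)
    // ... open confPath with the FileSystem/ConfigFactory code shown above ...
  }
}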