How to get the path to an uploaded file
Question
I am running a Spark cluster on Google Cloud, and I upload a configuration file with each job. What is the path to a file that is uploaded with a submit command?
In the example below, how can I read the file Configuration.properties before the SparkContext has been initialized? I am using Scala.
gcloud dataproc jobs submit spark --cluster my-cluster --class MyJob --files config/Configuration.properties --jars my.jar
Recommended answer
The local path to a file distributed using the SparkFiles mechanism (the --files argument or the SparkContext.addFile method) can be obtained using SparkFiles.get:
org.apache.spark.SparkFiles.get(fileName)
You can also get the path to the root directory using SparkFiles.getRootDirectory:
org.apache.spark.SparkFiles.getRootDirectory
You can combine these with standard IO utilities to read the files.
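As a minimal sketch of that combination, the job below resolves the file shipped with --files and loads it with java.util.Properties. It assumes the file was submitted as in the question, so it is looked up by its bare name, Configuration.properties. The property name some.key and the helper loadProperties are hypothetical, used only for illustration.

```scala
import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object MyJob {
  // Hypothetical helper: load a java.util.Properties file from a local path.
  def loadProperties(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }

  def main(args: Array[String]): Unit = {
    // The context has to exist before SparkFiles.get is called.
    val sc = new SparkContext(new SparkConf().setAppName("MyJob"))
    try {
      // Look the file up by its bare name, without the config/ prefix.
      val localPath = SparkFiles.get("Configuration.properties")
      val props = loadProperties(localPath)
      // "some.key" is an assumed property name, not taken from the question.
      println(Option(props.getProperty("some.key")).getOrElse("<not set>"))
    } finally sc.stop()
  }
}
```

Note that this only works after the SparkContext has been created, which is why it does not cover the second part of the question.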
How can I read the file Configuration.properties before the SparkContext has been initialized?
SparkFiles are distributed by the driver, cannot be accessed before the context has been initialized, and, to be distributed in the first place, have to be accessible from the driver node. So this part of the question depends solely on what type of storage you use to expose the file to the driver node.
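One way to read the configuration before the context exists is to bypass SparkFiles and read the file straight from wherever the driver can see it. The sketch below assumes the file is available at an ordinary local path on the driver node and that the path is passed to the job as its first argument. The object name ConfigLoader and the argument convention are hypothetical.

```scala
import java.io.FileInputStream
import java.util.Properties

object ConfigLoader {
  // Load properties from a path that is already readable on the driver node.
  // No SparkContext is required, so this can run before one is created.
  def load(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }
}
```

In the job's main method this could be called as ConfigLoader.load(args(0)) before the SparkContext is constructed. If the file lives in some other storage system, the same idea applies, but the read would need that system's client instead of FileInputStream.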