How to get path to the uploaded file


Question

I am running a Spark cluster on Google Cloud and I upload a configuration file with each job. What is the path to a file that is uploaded with the submit command?

In the example below, how can I read the file Configuration.properties before the SparkContext has been initialized? I am using Scala.

    gcloud dataproc jobs submit spark --cluster my-cluster --class MyJob --files config/Configuration.properties --jars my.jar

Answer

The local path to a file distributed via the SparkFiles mechanism (the --files argument or SparkContext.addFile) can be obtained with SparkFiles.get:

org.apache.spark.SparkFiles.get(fileName)

You can also get the path to the root directory using SparkFiles.getRootDirectory:

org.apache.spark.SparkFiles.getRootDirectory

You can combine these with standard IO utilities to read the files, as in the sketch below.
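
For example, a minimal sketch (assuming Configuration.properties was passed via --files and the SparkContext has already been initialized; the property key is hypothetical):

    import java.io.FileInputStream
    import java.util.Properties
    import org.apache.spark.SparkFiles

    // Resolve the local path of the file shipped with --files / SparkContext.addFile
    val path = SparkFiles.get("Configuration.properties")

    // Load it with standard Java IO
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()

    // Read a (hypothetical) key from the properties file
    val value = props.getProperty("some.key")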

how can I read the file Configuration.properties before the SparkContext has been initialized?

SparkFiles are distributed by the driver and cannot be accessed before the context has been initialized; to be distributed in the first place, they have to be accessible from the driver node. So this part of the question depends solely on what type of storage you use to expose the file to the driver node.
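
For instance, if you keep the file in a Cloud Storage bucket (an assumption; any storage reachable from the driver works, and the bucket path below is hypothetical), a sketch using Hadoop's FileSystem API, which Dataproc clusters expose for gs:// paths through the GCS connector, could read the properties before any SparkContext is created:

    import java.net.URI
    import java.util.Properties
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hypothetical location; replace with a path the driver node can reach
    val uri = new URI("gs://my-bucket/config/Configuration.properties")

    // Open the file through the Hadoop FileSystem abstraction
    val fs = FileSystem.get(uri, new Configuration())
    val in = fs.open(new Path(uri))

    // Load the properties before initializing the SparkContext
    val props = new Properties()
    try props.load(in) finally in.close()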
