How to get the path to an uploaded file


Question

I am running a Spark cluster on Google Cloud, and I upload a configuration file with each job. What is the path to a file that is uploaded with a submit command?

In the example below, how can I read the file Configuration.properties before the SparkContext has been initialized? I am using Scala.

 gcloud dataproc jobs submit spark --cluster my-cluster --class MyJob --files config/Configuration.properties --jars my.jar

Recommended answer

The local path to a file distributed using the SparkFiles mechanism (the --files argument or the SparkContext.addFile method) can be obtained using SparkFiles.get:

org.apache.spark.SparkFiles.get(fileName)

You can also get the path to the root directory using SparkFiles.getRootDirectory:

org.apache.spark.SparkFiles.getRootDirectory

You can combine these with standard IO utilities to read the files.
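
As a minimal sketch of that combination, the snippet below resolves the local path with SparkFiles.get and loads the file with java.util.Properties. It assumes the job was submitted with --files config/Configuration.properties as in the question; the property name some.key is hypothetical and used only for illustration.

```scala
import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object MyJob {
  def main(args: Array[String]): Unit = {
    // The master and other settings are expected to come from the submit command.
    val sc = new SparkContext(new SparkConf().setAppName("MyJob"))
    try {
      // SparkFiles.get takes the bare file name, not the path used at submit time.
      val path = SparkFiles.get("Configuration.properties")

      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()

      // "some.key" is a hypothetical property name.
      println(props.getProperty("some.key"))
    } finally {
      sc.stop()
    }
  }
}
```

Note that this only works after the SparkContext has been created, which is the limitation the rest of the answer addresses.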

How can I read the file Configuration.properties before the SparkContext has been initialized?

Files handled by SparkFiles are distributed by the driver and cannot be accessed before the context has been initialized. To be distributed in the first place, they must already be accessible from the driver node. So this part of the question depends solely on what type of storage you use to expose the file to the driver node.
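
One way to approach this is to bypass SparkFiles and load the file with plain JVM I/O from whatever path the driver can see. The following is a sketch under stated assumptions, not a confirmed recipe: ConfigLoader and loadProperties are hypothetical names, the path is assumed to be passed as the first program argument, and where the file actually lands on the driver node depends on your storage and deployment setup.

```scala
import java.io.FileInputStream
import java.util.Properties

object ConfigLoader {
  // Hypothetical helper: loads a .properties file from a path visible to the driver.
  def loadProperties(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the driver-visible path is the first argument;
    // otherwise fall back to the current working directory.
    val path = args.headOption.getOrElse("Configuration.properties")
    val props = loadProperties(path)

    // No SparkContext is needed up to this point; one could be built afterwards
    // using the loaded values.
    println(s"Loaded ${props.size()} properties from $path")
  }
}
```

Because nothing here touches Spark, the same helper can run before the context is created and its values can then feed the Spark configuration.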
