Apache Spark - check if file exists


Question

I am new to Spark and I have a question. I have a two-step process in which the first step writes a SUCCESS.txt file to a location on HDFS. My second step, which is a Spark job, has to verify that the SUCCESS.txt file exists before it starts processing the data.

I checked the Spark API and didn't find any method that checks whether a file exists. Any ideas on how to handle this?

The only method I found was sc.textFile("hdfs:///SUCCESS.txt").count(), which throws an exception when the file does not exist. I would have to catch that exception and write my program accordingly. I don't really like this approach and am hoping to find a better alternative.

Recommended answer

For a file in HDFS, you can use the Hadoop way of doing this:

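import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration held by the SparkContext (sc)
val conf = sc.hadoopConfiguration
// FileSystem.get(conf) returns the default file system (fs.defaultFS)
val fs = FileSystem.get(conf)
val exists = fs.exists(new Path("/path/on/hdfs/to/SUCCESS.txt"))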
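
As a minimal sketch of how the second step could be gated on this check, the helper below wraps the same Hadoop call in a function. The names MarkerCheck and markerExists are hypothetical and not part of Spark or Hadoop. It uses Path.getFileSystem instead of FileSystem.get, on the assumption that the marker path may carry its own scheme (hdfs://, file://, ...). When the path has no scheme, the default file system is used, as in the answer above.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, for illustration only.
object MarkerCheck {
  // Returns true if `location` exists on the file system that owns it.
  // Path.getFileSystem picks the file system from the path's scheme and
  // falls back to the configured default (fs.defaultFS) when there is none.
  def markerExists(location: String, conf: Configuration): Boolean = {
    val path = new Path(location)
    path.getFileSystem(conf).exists(path)
  }
}
```

In the Spark job, this could be called as MarkerCheck.markerExists("/path/on/hdfs/to/SUCCESS.txt", sc.hadoopConfiguration) before any data is read, with the job skipping processing or failing fast when it returns false. This avoids launching a Spark action just to catch an exception.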
