Apache Spark - check if file exists


Question

I am new to Spark and have a question. I have a two-step process in which the first step writes a SUCCESS.txt file to a location on HDFS. My second step, a Spark job, has to verify that the SUCCESS.txt file exists before it starts processing the data.

I checked the Spark API and didn't find any method that checks whether a file exists. Any ideas on how to handle this?

The only method I found was sc.textFile("hdfs:///SUCCESS.txt").count(), which throws an exception when the file does not exist. I would have to catch that exception and write my program around it. I don't really like this approach and am hoping to find a better alternative.

Recommended answer

For a file in HDFS, you can do this the Hadoop way, through the Hadoop FileSystem API:

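import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration already held by the SparkContext (sc)
val fs = FileSystem.get(sc.hadoopConfiguration)
// true if the marker file exists on the default filesystem
val exists = fs.exists(new Path("/path/on/hdfs/to/SUCCESS.txt"))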
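
To gate the second step on the marker file, the check above can be wrapped in a small helper. The following is a minimal sketch and not part of the original answer: the object MarkerCheck and the function markerExists are hypothetical names introduced for illustration. It assumes the Hadoop client libraries are on the classpath, as they normally are in a Spark job. It uses Path.getFileSystem rather than FileSystem.get, so the filesystem is resolved from the path's own scheme and falls back to the configured default when the path has none.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object MarkerCheck {
  // Hypothetical helper (not part of Spark or Hadoop).
  // Returns true if `location` exists on the filesystem that owns it.
  // Path.getFileSystem picks the filesystem from the path's scheme
  // (hdfs://, file://, ...) and uses fs.defaultFS when no scheme is given.
  def markerExists(conf: Configuration, location: String): Boolean = {
    val path = new Path(location)
    path.getFileSystem(conf).exists(path)
  }
}

// Possible usage inside the Spark job, where `sc` is the SparkContext:
// if (!MarkerCheck.markerExists(sc.hadoopConfiguration, "/path/on/hdfs/to/SUCCESS.txt"))
//   sys.error("SUCCESS.txt not found; step one has not finished")
```

Failing fast like this keeps the check separate from the data-processing code, so a missing marker is reported as such instead of surfacing as an exception from a later Spark action.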
