Spark Scala list folders in directory

Problem description

I want to list all folders within an HDFS directory using Scala/Spark. In Hadoop I can do this with the command: hadoop fs -ls hdfs://sandbox.hortonworks.com/demo/

I have tried:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs = FileSystem.get(new URI("hdfs://sandbox.hortonworks.com/"), conf)
val path = new Path("hdfs://sandbox.hortonworks.com/demo/")
val files = fs.listFiles(path, false)

But it does not seem to look in the Hadoop directory, as I cannot find my folders/files.
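
For reference, fs.listFiles returns a RemoteIterator[LocatedFileStatus] that yields files only, never directories, and nothing is printed until the iterator is stepped through explicitly. A minimal sketch of doing that, assuming the same sandbox URI and /demo/ path as above:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// listFiles only yields files (no directories) and must be iterated explicitly.
val fs = FileSystem.get(new URI("hdfs://sandbox.hortonworks.com/"), new Configuration())
val files = fs.listFiles(new Path("hdfs://sandbox.hortonworks.com/demo/"), false)
while (files.hasNext) {
  println(files.next().getPath)
}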

I also tried:

FileSystem.get(sc.hadoopConfiguration).listFiles(new Path("hdfs://sandbox.hortonworks.com/demo/"), true)

But this does not help either.

Do you have any other ideas?

PS: I also checked this thread: Spark iterate HDFS directory, but it does not work for me, as it does not seem to search the HDFS directory; instead it only searches the local file system with the file:// scheme.

Recommended answer

We are using Hadoop 1.4, which does not have the listFiles method, so we use listStatus to get the directories. It does not have a recursive option, but the recursive lookup is easy to manage.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val status = fs.listStatus(new Path(YOUR_HDFS_PATH))
status.foreach(x => println(x.getPath))
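
For the recursive lookup mentioned above, a minimal sketch using listStatus might look like the following. The helper name listDirs is an assumption, and FileStatus.isDirectory assumes Hadoop 2.x or newer (older 1.x releases expose isDir instead); it also assumes fs.defaultFS in the Configuration points at the cluster, otherwise obtain the FileSystem from the hdfs:// URI as in the question.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: recursively collect every directory under `root`.
// isDirectory requires Hadoop 2.x+; on old 1.x releases use isDir instead.
def listDirs(fs: FileSystem, root: Path): Seq[Path] = {
  val dirs = fs.listStatus(root).filter(_.isDirectory).map(_.getPath).toSeq
  dirs ++ dirs.flatMap(dir => listDirs(fs, dir))
}

val fs = FileSystem.get(new Configuration())
listDirs(fs, new Path("hdfs://sandbox.hortonworks.com/demo/")).foreach(println)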
