How to list file keys in Databricks dbfs without dbutils
Question
Apparently dbutils cannot be used in cmd-line spark-submits; you must use Jar Jobs for that. But I MUST use spark-submit-style jobs due to other requirements, yet I still need to list and iterate over file keys in dbfs to make some decisions about which files to use as input to a process...
Using Scala, what library in Spark or Hadoop can I use to retrieve a list of dbfs:/filekeys matching a particular pattern?
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

def ls(sparkSession: SparkSession, inputDir: String): Seq[String] = {
  println(s"FileUtils.ls path: $inputDir")
  val path = new Path(inputDir)
  val fs = path.getFileSystem(sparkSession.sparkContext.hadoopConfiguration)
  val fileStatuses = fs.listStatus(path)
  fileStatuses.filter(_.isFile).map(_.getPath).map(_.getName).toSeq
}
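For context, a minimal sketch of how a helper like this might be driven from a spark-submit entry point (the Main object, app name, and mount path are illustrative, not part of the original question):

import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // Reuse the session provided by spark-submit, or create one
    val spark = SparkSession.builder().appName("ListDbfsFiles").getOrCreate()
    // Hypothetical mount point, as in the question
    ls(spark, "dbfs:/mnt/path/to/folder").foreach(println)
    spark.stop()
  }
}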
Using the above, if I pass in a partial key prefix like dbfs:/mnt/path/to/folder while the following keys are present in said "folder":
- /mnt/path/to/folder/file1.csv
- /mnt/path/to/folder/file2.csv
I get dbfs:/mnt/path/to/folder is not a directory when it hits the

val path = new Path(inputDir)
Answer
You need to use the SparkSession to do it.
Here's how we did it:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

def getFileSystem(sparkSession: SparkSession): FileSystem =
  FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)

// List the names of all entries directly under dir
def listContents(sparkSession: SparkSession, dir: String): Seq[String] = {
  getFileSystem(sparkSession).listStatus(new Path(dir)).toSeq.map(_.getPath).map(_.getName)
}
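The question also asked for keys of a particular pattern. Hadoop's FileSystem additionally exposes globStatus, which accepts glob patterns in the path; here is a minimal sketch building on the getFileSystem helper above (the listMatching name and the *.csv pattern are illustrative, not from the original answer):

import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.spark.sql.SparkSession

// Sketch: list names of files matching a glob pattern, reusing getFileSystem above
def listMatching(sparkSession: SparkSession, pattern: String): Seq[String] = {
  val fs = getFileSystem(sparkSession)
  // globStatus returns null when the pattern has no glob and the path does not exist
  Option(fs.globStatus(new Path(pattern)))
    .getOrElse(Array.empty[FileStatus])
    .filter(_.isFile)
    .map(_.getPath.getName)
    .toSeq
}

// Example (illustrative): listMatching(spark, "dbfs:/mnt/path/to/folder/*.csv")
// would yield Seq("file1.csv", "file2.csv") given the keys above.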