Scala & DataBricks: Getting a list of Files

Question

I am trying to build a list of files in an S3 bucket on Databricks in Scala, and then filter it by regex. I am very new to Scala. The Python equivalent would be

import re  # `pattern` below is assumed to be a compiled regex, e.g. re.compile(r"...")

all_files = map(lambda x: x.path, dbutils.fs.ls(folder))
filtered_files = filter(lambda name: pattern.match(name) is not None, all_files)

but I want to do this in Scala.

From https://alvinalexander.com/scala/how-to-list-files-in-directory-filter-names-scala:

import java.io.File

// From the linked article: list the regular files in a directory
// on the local filesystem.
def getListOfFiles(dir: String): List[File] = {
    val d = new File(dir)
    if (d.exists && d.isDirectory) {
        d.listFiles.filter(_.isFile).toList
    } else {
        List[File]()
    }
}

However, this produces an empty list.
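
That is expected: java.io.File only sees the driver node's local filesystem, so it finds nothing at an s3:// path. If the Hadoop FileSystem API is preferred over dbutils, a minimal sketch (assuming a Databricks notebook where spark is in scope, the cluster already has credentials for the bucket, and "s3://bucket" is a placeholder name):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Resolve the filesystem backing the s3:// scheme from the cluster's Hadoop config.
val fs = FileSystem.get(new URI("s3://bucket"), spark.sparkContext.hadoopConfiguration)
// listStatus returns Array[FileStatus]; keep the full paths as strings.
val s3Paths: List[String] = fs.listStatus(new Path("s3://bucket/")).map(_.getPath.toString).toList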

I have also thought of

var all_files: List[Any] = List(dbutils.fs.ls("s3://bucket"))

but this produces a list like the following, with length 1:

all_files: List[Any] = List(WrappedArray(FileInfo(s3://bucket/.internal_name.pl.swp, .internal_name.pl.swp, 12288), FileInfo(s3://bucket/file0, file0, 10223616), FileInfo(s3://bucket/file1, file1, 0), ...))

I cannot turn this into a dataframe, as suggested by "How to iterate scala wrappedArray? (Spark)", so this isn't usable.
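
The length of 1 comes from the extra List(...) wrapper: dbutils.fs.ls already returns a Seq[FileInfo], so wrapping it yields a one-element list whose single element is the whole sequence. A minimal sketch (with a placeholder bucket name) of using the result directly:

// dbutils.fs.ls already returns Seq[FileInfo]; no extra List(...) wrapper is needed.
val files = dbutils.fs.ls("s3://bucket")
// Each FileInfo carries path, name, and size; iterate directly.
files.foreach(f => println(f.path))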

How can I generate a list of files in Scala, and then iterate through them?

Answer

You should do it like this:

// `name` holds the regex; `???` is Scala's "not yet implemented" placeholder.
val name: String = ???
// List the bucket, keep each entry's full path, and keep paths matching the regex.
val all_files: Seq[String] = dbutils.fs.ls("s3://bucket").map(_.path).filter(_.matches(name))
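
As a usage sketch with a hypothetical pattern that keeps only .csv files (String.matches requires the regex to cover the whole path, hence the leading .*):

// Hypothetical example: keep only paths ending in .csv, then iterate over them.
val csv_files: Seq[String] = dbutils.fs.ls("s3://bucket").map(_.path).filter(_.matches(".*\\.csv"))
csv_files.foreach(println)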
