Spark Structured Streaming - Read file from Nested Directories

Views: 77  Published: 2021/4/8 19:42:49  Tags: apache-spark spark-streaming

Problem description

I have a client that places CSV files in nested directories, as shown below, and I need to read these files in real time. I am trying to do this with Spark Structured Streaming.

Data:
/user/data/1.csv
/user/data/2.csv
/user/data/3.csv
/user/data/sub1/1_1.csv
/user/data/sub1/1_2.csv
/user/data/sub1/sub2/2_1.csv
/user/data/sub1/sub2/2_2.csv

Code:

val csvDF = spark
  .readStream
  .option("sep", ",")
  .schema(userSchema)      // Schema of the csv files
  .csv("/user/data/")

Are there any configurations I need to add so that Spark can read from nested directories in Structured Streaming?

Recommended answer

I was able to stream the files in sub-directories using a glob path.

Posting it here for the sake of others.

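# PySpark. The glob "*?*" matches any entry directly under
# /spark_structured_input/ whose name has at least one character.
inputPath = "/spark_structured_input/*?*"
inputDF = spark.readStream.option("header", "true").schema(userSchema).csv(inputPath)
# Print each micro-batch to the console.
query = inputDF.writeStream.format("console").start()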
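
For readers on Spark 3.0 or later, a possible alternative to the glob path is the `recursiveFileLookup` file source option, which asks Spark to pick up files at any depth under the input directory. This is not the approach used in the answer above, only a sketch under some assumptions: `nested_csv_stream` is a hypothetical helper name, the `header` option is copied from the answer and may not fit every data set, and the option is assumed to be available in the Spark version in use. Note that turning on `recursiveFileLookup` disables partition inference.

```python
def nested_csv_stream(spark, input_path, schema):
    """Build a streaming DataFrame over CSV files under input_path.

    Hypothetical helper, not part of Spark. Assumes Spark 3.0+,
    where the file source accepts the recursiveFileLookup option.
    `schema` may be a StructType or a DDL string such as "id INT, name STRING".
    """
    return (
        spark.readStream
        .option("header", "true")               # copied from the answer; adjust to the data
        .option("recursiveFileLookup", "true")  # descend into sub-directories
        .schema(schema)                         # streaming file sources need an explicit schema
        .csv(input_path)
    )


# Example usage with an existing SparkSession named `spark`
# and the schema from the question (left commented out on purpose):
#
# inputDF = nested_csv_stream(spark, "/user/data/", userSchema)
# query = inputDF.writeStream.format("console").start()
# query.awaitTermination()
```

Compared with a glob path, this keeps the input path a plain directory, so new nesting levels should not require a new pattern. Whether it suits a given job depends on the Spark version and on whether partition discovery is needed.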
