Found nothing in _spark_metadata
Problem Description
I am trying to read CSV files from a specific folder and write the same contents to another CSV file in a different location on my local PC, for learning purposes. I can read the file and show its contents on the console. However, when I try to write it to another CSV file in the specified output directory, all I get is a folder named "_spark_metadata" which contains nothing.
I paste the whole code here step by step:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession \
    .builder \
    .appName('csv01') \
    .master('local[*]') \
    .getOrCreate()

spark.conf.set("spark.sql.streaming.checkpointLocation", <String path to checkpoint location directory>)

userSchema = StructType().add("name", "string").add("age", "integer")
Read from the CSV file:
df = spark \
    .readStream \
    .schema(userSchema) \
    .option("sep", ",") \
    .csv(<String path to local input directory containing CSV file>)
Write to the CSV file:
df.writeStream \
    .format("csv") \
    .option("path", <String path to local output directory containing CSV file>) \
    .start()
In "String path to local output directory containing CSV file" I only get a folder named _spark_metadata, which contains no CSV files.
Any help on this is highly appreciated.
Recommended Answer
You don't use readStream to read static data. readStream is for monitoring a directory into which new files are continuously added.
You just need spark.read.csv.