Spark - How to write a single csv file WITHOUT folder?
Problem Description
Suppose that df is a dataframe in Spark. The way to write df into a single CSV file is
df.coalesce(1).write.option("header", "true").csv("name.csv")
This will write the dataframe into a CSV file contained in a folder called name.csv, but the actual CSV file will be called something like part-00000-af091215-57c0-45c4-a521-cd7d9afb5e54.csv.
I would like to know if it is possible to avoid the folder name.csv and to have the actual CSV file called name.csv instead of part-00000-af091215-57c0-45c4-a521-cd7d9afb5e54.csv. The reason is that I need to write several CSV files which I will later read together in Python, but my Python code makes use of the actual CSV names and also needs all the single CSV files to be in one folder (and not a folder of folders).
Thanks for any help.
Recommended Answer
A possible solution is to convert the Spark dataframe to a pandas dataframe and save it as a CSV:
df.toPandas().to_csv("<path>/<filename>")
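Note that toPandas() collects the entire dataframe into the driver's memory, so this only works for data that fits on one machine. For larger data, a common alternative (a sketch, not part of the original answer) is to keep the coalesce(1) write and then move the single part file out of the Spark output folder to the desired name. The helper name and the temporary-folder convention below are assumptions:

```python
import glob
import os
import shutil

def promote_part_file(spark_output_dir, target_csv):
    """Move the single part-*.csv produced by coalesce(1) out of the
    Spark output folder to target_csv, then remove the folder."""
    part_files = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file, found {len(part_files)}"
        )
    shutil.move(part_files[0], target_csv)
    shutil.rmtree(spark_output_dir)

# Hypothetical usage after Spark writes to a temporary folder:
#   df.coalesce(1).write.option("header", "true").csv("name.csv.tmp")
#   promote_part_file("name.csv.tmp", "name.csv")
```

Writing to a temporary folder name first (here name.csv.tmp) leaves the final name.csv free to be a plain file, which is what the Python code reading these files expects. This sketch assumes the output is on a local filesystem; on HDFS or S3 the rename would need the corresponding filesystem API instead of shutil.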