How do I write a dataset which contains only a header (no rows) to an HDFS location (CSV format) such that it contains the header when downloaded?
Question
I have a dataset which contains only a header (id, name, age) and 0 rows. I want to write it to an HDFS location as a CSV file using:
DataFrameWriter dataFrameWriter = dataset.write();
Map<String, String> csvOptions = new HashMap<>();
csvOptions.put("header", "true");
dataFrameWriter = dataFrameWriter.options(csvOptions);
dataFrameWriter.mode(SaveMode.Overwrite).csv(location);
In the HDFS location, the files are:
1. _SUCCESS
2. tempFile.csv
If I go to that location and download the file (tempFile.csv), I get an empty CSV file. I have tried with header set to both true and false. How do I write the header as the content of the CSV file?
Recommended answer
Well, this is a workaround. In Scala, you can do something like this:
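// Assumes df is the DataFrame and sc is the active SparkContext.
if (df.take(1).isEmpty) {
  // No rows: write the comma-separated column names as a one-line text file.
  sc.parallelize(Seq(df.schema.map(_.name).mkString(",")), 1)
    .saveAsTextFile("temp")
} else {
  // Rows present: write CSV with a header row.
  df.write.option("header", "true").csv("temp")
}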
df.schema returns the schema of the DataFrame df as a StructType.
_.name returns the name of each column in the schema.
mkString(",") joins the resulting sequence of names into a comma-separated string.
Something similar can be done in Java, I guess.
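For the Java side, here is a minimal sketch of the header-building step only, kept free of Spark dependencies so it can be run on its own. The class name CsvHeader and its two methods are hypothetical helpers, not part of Spark. The quoting rule (wrap a name in double quotes when it contains a comma, quote, or line break) follows RFC 4180 and is an assumption; Spark's own CSV writer may escape such names differently.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: builds a CSV header line from column names.
class CsvHeader {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    // Assumption: RFC 4180 style quoting, with inner quotes doubled.
    static String quoteIfNeeded(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Java counterpart of df.schema.map(_.name).mkString(",") from the Scala answer.
    static String headerLine(String[] columnNames) {
        return Arrays.stream(columnNames)
                .map(CsvHeader::quoteIfNeeded)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Column names taken from the question.
        System.out.println(headerLine(new String[] {"id", "name", "age"}));
    }
}
```

In a Spark job, the column names would presumably come from dataset.columns(), which returns a String[]. When the dataset has no rows, the resulting line could be wrapped in a one-element Dataset<String> (for example with spark.createDataset(Collections.singletonList(header), Encoders.STRING())) and written with write().text(location); otherwise the original csv(location) call with header set to true applies. That wiring is untested here and is only a suggestion.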