How do I write a dataset that contains only a header (no rows) to an HDFS location (CSV format) so that the downloaded file contains the header?

Question

I have a dataset that contains only a header (id, name, age) and 0 rows. I want to write it to an HDFS location as a CSV file using:

import org.apache.spark.sql.SaveMode;

// Write the dataset as CSV with the header option enabled, replacing any existing output.
dataset.write()
    .option("header", "true")
    .mode(SaveMode.Overwrite)
    .csv(location);

In the HDFS location, the files are:

1. _SUCCESS
2. tempFile.csv

If I go to that location and download the file (tempFile.csv), I get an empty CSV file. I have tried with header set to both true and false. How do I write the header as the content of the CSV file?

Recommended answer

This is a workaround. In Scala, you can do something like this:

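// The empty DataFrame in the question produced a file without a header, so handle that case separately.
if (df.take(1).isEmpty) {
  // Build the header line from the column names and write it as a single text part file.
  val header = df.schema.map(_.name).mkString(",")
  sc.parallelize(Seq(header), 1).saveAsTextFile("temp")
} else {
  // Non-empty data: use the regular CSV writer with the header option.
  df.write.option("header", "true").csv("temp")
}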

df.schema returns the schema of the DataFrame df as a StructType.

_.name returns the name of each column in the schema.

mkString(",") joins the resulting sequence of names into a comma-separated string.

Something similar can be done in Java, I guess.
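As a rough Java counterpart of the header-building step, here is a minimal sketch. It assumes the column names come from `dataset.columns()`, which returns a `String[]`; a plain array stands in for it here so the snippet runs without Spark. `CsvHeaderLine`, `build`, and `quoteIfNeeded` are hypothetical names, not part of any Spark API. Unlike a bare `mkString(",")`, the sketch also quotes names that contain a comma or a double quote, which may or may not matter for your columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): builds the single header line
// to write when the dataset has no rows.
class CsvHeaderLine {

    // Quote a column name only when it contains a comma, a double quote, or a line break.
    static String quoteIfNeeded(String name) {
        boolean needsQuotes = name.contains(",") || name.contains("\"")
                || name.contains("\n") || name.contains("\r");
        if (!needsQuotes) {
            return name;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Rough Java equivalent of df.schema.map(_.name).mkString(",").
    static String build(String[] columns) {
        List<String> cells = new ArrayList<>();
        for (String column : columns) {
            cells.add(quoteIfNeeded(column));
        }
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        // Stand-in for dataset.columns() on the dataset from the question.
        String[] columns = {"id", "name", "age"};
        System.out.println(build(columns)); // prints id,name,age
    }
}
```

In a real job, the resulting line could then be written much like the Scala version, for example through `JavaSparkContext.parallelize(...)` followed by `saveAsTextFile(...)`, whenever the dataset turns out to be empty. That part is left out here because it needs a running Spark context.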
