Spark: can you include partition columns in output files?

Views: 131 | Posted: 2020/6/17 19:21:44 | Tags: apache-spark, hadoop-partitioning


Question

I am using Spark to write data out into partitions. Given a dataset with two columns (foo, bar), if I run df.write.mode("overwrite").format("csv").partitionBy("foo").save("/tmp/output"), I get output like this:

/tmp/output/foo=1/X.csv
/tmp/output/foo=2/Y.csv
...

However, the output CSV files contain only the values of bar, not foo. I know the value of foo is already captured in the directory name foo=N, but is it possible to also include it in the CSV files?

Recommended answer

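Only if you make a copy under a different name and partition by the copy. Spark leaves the partition column out of the data files, so partitioning by foo_ keeps the original foo in the CSV: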

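from pyspark.sql.functions import col

# Partition by a copy (foo_) so the original foo column stays in the CSV files.
# The output directories are then named foo_=N instead of foo=N.
(df
    .withColumn("foo_", col("foo"))
    .write.mode("overwrite")
    .format("csv").partitionBy("foo_").save("/tmp/output"))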
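
To see why the copy is needed, it can help to picture the layout that partitionBy produces: the partition column is moved out of each row and into the directory name. The following is a plain-Python sketch of that idea using only the standard library. It is not Spark's implementation, and write_partitioned, read_file_columns, and the part-0.csv file name are hypothetical names chosen for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, partition_col, root):
    """Write dict rows as CSV under root/<partition_col>=<value>/ (illustration only)."""
    root = Path(root)
    groups = {}
    for row in rows:
        groups.setdefault(row[partition_col], []).append(row)
    for value, group in groups.items():
        part_dir = root / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition column lives in the directory name, so it is left out of the file.
        fields = [name for name in group[0] if name != partition_col]
        with open(part_dir / "part-0.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)


def read_file_columns(root):
    """Return {partition directory name: header row of the CSV file inside it}."""
    result = {}
    for path in sorted(Path(root).glob("*/part-0.csv")):
        with open(path, newline="") as handle:
            result[path.parent.name] = next(csv.reader(handle))
    return result


rows = [{"foo": 1, "bar": "a"}, {"foo": 2, "bar": "b"}]

# Partitioning by foo directly: the files contain only bar.
plain_dir = tempfile.mkdtemp()
write_partitioned(rows, "foo", plain_dir)
plain_columns = read_file_columns(plain_dir)

# Partitioning by a copy named foo_: foo survives inside the files.
copied = [{**row, "foo_": row["foo"]} for row in rows]
copy_dir = tempfile.mkdtemp()
write_partitioned(copied, "foo_", copy_dir)
copy_columns = read_file_columns(copy_dir)

print(plain_columns)
print(copy_columns)
```

In the first case each file's header is just bar, while in the second it is foo, bar, which mirrors what the answer's foo_ copy achieves in Spark. The cost is the same in both settings: the value is stored twice, once in the path and once in every row.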
