Spark CSV 2.1 File Names


Problem Description

I'm trying to save a DataFrame to CSV using the new Spark 2.1 CSV writer:

df.select(myColumns: _*).write
  .mode(SaveMode.Overwrite)
  .option("header", "true")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .csv(absolutePath)

Everything works fine, and I don't mind the part-000XX prefix, but now it seems a UUID has been added as a suffix:

i.e.
part-00032-10309cf5-a373-4233-8b28-9e10ed279d2b.csv.gz ==> part-00032.csv.gz

Does anyone know how I can remove this suffix and keep only the part-000XX convention?
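A common workaround, independent of any Spark configuration, is to rename the part files after the write completes. The helper below is a hypothetical sketch (none of these names come from the original post): it strips the UUID segment with a regular expression; in a real job you would then apply it to each output file via `org.apache.hadoop.fs.FileSystem.rename`.

```scala
// Hypothetical helper: maps Spark 2.x part-file names with a UUID segment
// (part-00032-<uuid>.csv.gz) back to the old style (part-00032.csv.gz).
object PartFileNames {
  // Whole-name pattern: "part-" + digits, a 36-char UUID, then the extension.
  private val WithUuid =
    raw"(part-\d+)-[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}(\..+)".r

  def stripUuid(name: String): String = name match {
    case WithUuid(prefix, ext) => prefix + ext // drop the UUID segment
    case other                 => other        // leave non-matching names alone
  }
}
```

In a real post-write step you would list `absolutePath` with the Hadoop `FileSystem` API and rename each file to `stripUuid(file.getName)`.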

Thanks

Recommended Answer

You can remove the UUID by overriding the configuration option "spark.sql.sources.writeJobUUID":


Unfortunately, this solution will not fully mirror the old saveAsTextFile style (i.e. part-00000), but it can make the output file names saner, e.g. part-00000-output.csv.gz, where "output" is the value you pass to spark.sql.sources.writeJobUUID. The "-" is appended automatically.
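Putting the answer together with the question's write snippet, the override might look like the sketch below. Note the hedges: spark.sql.sources.writeJobUUID came from the pull request linked in the answer and is not a documented, stable Spark setting, and whether it is honoured (and whether it is read from the Hadoop configuration, as assumed here) depends on your exact Spark build.

```scala
// Assumption: the writer picks the option up from the Hadoop configuration;
// this is an internal option from the linked PR, not a stable public setting.
// "output" is an arbitrary label that replaces the UUID in the file name,
// giving names like part-00000-output.csv.gz.
spark.sparkContext.hadoopConfiguration
  .set("spark.sql.sources.writeJobUUID", "output")

df.select(myColumns: _*).write
  .mode(SaveMode.Overwrite)
  .option("header", "true")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .csv(absolutePath)
```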

SPARK-8406 is the relevant Spark issue, and here is the actual pull request: https://github.com/apache/spark/pull/6864

