How can I configure spark so that it creates "_$folder$" entries in S3?
Problem description
When I use
df.write
.format("parquet")
.mode("overwrite")
.partitionBy("year", "month", "day", "hour", "gen", "client")
.option("compression", "gzip")
.save("s3://xxxx/yyyy")
I get the following in S3:
year=2018
year=2019
But I want this instead:
year=2018
year=2018_$folder$
year=2019
year=2019_$folder$
The scripts that read from that S3 location depend on the *_$folder$
entries, but I haven't found a way to configure spark/hadoop to generate them.
Any idea which hadoop or spark configuration setting controls the generation of the *_$folder$
files?
Answer
Those markers are a legacy feature; I don't think anything creates them any more... though they are usually ignored when actually listing directories. (That is, even if they are present, they get stripped from listings and replaced with directory entries.)
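If downstream scripts really do require the markers, one possible workaround (a sketch, not a Spark or Hadoop setting) is to create the `_$folder$` objects yourself after the write, e.g. with an S3 client such as boto3's `put_object` using an empty body. The helper below only derives the marker key names that legacy Hadoop S3 connectors would have created for each level of a partition path; the function name and the idea of uploading the keys afterwards are illustrative assumptions, not something Spark provides.

```python
def folder_marker_keys(partition_path: str) -> list[str]:
    """For a partition path like "year=2018/month=01", return the
    legacy-style "_$folder$" marker key for every prefix level."""
    parts = [p for p in partition_path.strip("/").split("/") if p]
    # One marker per directory level, e.g.
    # "year=2018_$folder$" and "year=2018/month=01_$folder$"
    return ["/".join(parts[: i + 1]) + "_$folder$" for i in range(len(parts))]

if __name__ == "__main__":
    for key in folder_marker_keys("year=2018/month=01"):
        print(key)
```

After the Spark job finishes, you would upload each returned key (prefixed with your output path) as a zero-byte object; modern S3A treats such markers as noise, so this only matters for the legacy scripts that expect them.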