Spark save (write) Parquet as only one file
Question
If I write
dataFrame.write.format("parquet").mode("append").save("temp.parquet")
then in the temp.parquet folder I get as many files as there are rows.
I don't think I fully understand Parquet, but is this normal?
Answer
Use coalesce before the write operation:
dataFrame.coalesce(1).write.format("parquet").mode("append").save("temp.parquet")
EDIT-1
Upon a closer look, the docs do warn about coalesce:
"However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1)"
Therefore, as suggested by @Amar, it's better to use repartition.
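For illustration, here is a minimal sketch of the repartition variant. It assumes a local SparkSession and a small made-up DataFrame standing in for the question's dataFrame; the object name, app name, and sample data are hypothetical. Spark writes one part file per non-empty partition, which is why the number of output files follows the partition count, so printing the partition count before the write shows where the many files come from.

```scala
import org.apache.spark.sql.SparkSession

object SingleParquetFile {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration; a real job would
    // normally receive its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("single-parquet-file")
      .master("local[4]")
      .getOrCreate()

    // Hypothetical sample data standing in for the question's dataFrame.
    val dataFrame = spark.range(0, 8).toDF("id")

    // Each non-empty partition becomes one part file on write.
    println(s"partitions before write: ${dataFrame.rdd.getNumPartitions}")

    // repartition(1) adds a shuffle, so the stages before it can still run
    // in parallel; only the final write happens in a single task.
    dataFrame.repartition(1)
      .write.format("parquet").mode("append").save("temp.parquet")

    spark.stop()
  }
}
```

Note that with mode("append") every run adds another part file to the temp.parquet folder, so this yields one file per write rather than one file overall. Also, funnelling all rows through a single partition only makes sense when the data comfortably fits in one task.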