Spark dataFrame.coalesce(1) or dataFrame.repartition(1) does not seem to work


Problem description

I have a Hive INSERT INTO query that creates new Hive partitions. The table has two partition columns, named server and date. I execute the insert query using the following code and then try to save the result:

DataFrame dframe = hiveContext.sql("insert into summary1 partition(server='a1',date='2015-05-22') select from sourcetbl bla bla"); 
// the above query creates ORC files at /user/db/a1/20-05-22 
// I want only one part-00000 file at the end, so I tried the following and none of them worked 
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1"); OR

dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1"); OR

dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22"); OR

dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");

No matter whether I use coalesce or repartition, the query above creates around 200 small files of roughly 20 MB each at /user/db/a1/20-05-22. For performance reasons I want only one part-00000 file when the data is queried through Hive. I thought that calling coalesce(1) would produce a single final part file, but that does not seem to happen. Am I wrong?

Recommended answer

Repartition controls how many pieces the data is split into while the Spark job runs; the actual saving of the files, however, is handled by the Hadoop cluster.

Or at least that is how I understand it. You can also see the same question answered here: http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%3CCA+2Pv=hF5SGC-SWTwTMh6zK2JeoHF1OHPb=WG94vp2GW-vL5SQ@mail.gmail.com%3E
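As a rough illustration of that point, here is a minimal sketch against the Spark 1.x Java API used in the question (the SELECT below is only a placeholder and hiveContext is the same context as above): when Spark itself performs the write, coalescing to one partition is what should leave a single part file in the target directory.

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

// Hypothetical sketch: select the rows with a plain query instead of running
// an INSERT statement, so the DataFrame you coalesce is the one Spark writes.
DataFrame rows = hiveContext.sql("select * from sourcetbl"); // placeholder query

// A single partition in the final stage should mean a single part-00000 file
// in the output directory, because each partition is written by one task.
rows.coalesce(1)
    .write()
    .format("orc")
    .mode(SaveMode.Overwrite)
    .save("/user/db/a1/20-05-22");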

It should not really matter though; why are you set on a single file? getmerge will stitch the output together for you if it is just for your own system.
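For what it is worth, the getmerge suggestion above boils down to a single HDFS shell command along these lines (the local destination path is just an example); keep in mind that getmerge simply concatenates the part files, which is fine for plain-text output but will not yield a valid single ORC file:

hadoop fs -getmerge /user/db/a1/20-05-22 /tmp/summary1-merged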
