Spark dataFrame.coalesce(1) or dataFrame.repartition(1) does not seem to work

Posted: 2021/11/14 22:00:26 · Tags: apache-spark, apache-spark-sql


Problem Description


I have a Hive INSERT INTO query that creates new Hive partitions. The table has two partition columns, server and date. I execute the insert query using the following code and then try to save the result:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

DataFrame dframe = hiveContext.sql("insert into summary1 partition(server='a1',date='2015-05-22') select ... from sourcetbl ...");
// The above query creates ORC files at /user/db/a1/20-05-22
// I want only one part-00000 file at the end of the above query, so I tried the following; none of them worked:
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
// or
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
// or
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");
// or
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");


Whether I use coalesce or repartition, the above query creates around 200 small files of about 20 MB each at /user/db/a1/20-05-22. For performance reasons, I want only one part-00000 file when querying through Hive. I expected that calling coalesce(1) would produce a single final part file, but that does not happen. Am I wrong?
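A point worth noting, with a sketch of a commonly suggested workaround: in Spark 1.x, hiveContext.sql() executes an INSERT statement eagerly, so the files are already written by the time coalesce(1) is called, and the returned DataFrame does not contain the inserted rows. If the elided SELECT involves a shuffle (a join or aggregation), the write stage typically runs with spark.sql.shuffle.partitions tasks (default 200), one file per task, which would match the roughly 200 files observed. The idea below is to coalesce the SELECT result before the INSERT runs. This is an untested sketch assuming the Spark 1.x Java API (HiveContext, DataFrame, registerTempTable); the class name, the helper methods, and the temp table name "staged_rows" are hypothetical, and the SELECT text stays elided as in the question.

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public final class SinglePartFileInsert {

    // Hypothetical name for the temporary table holding the coalesced rows.
    static final String STAGING_TABLE = "staged_rows";

    // Builds the INSERT that reads from the staging table.
    // The partition spec is copied from the question; the staging table's
    // columns are assumed to match summary1's non-partition columns.
    static String buildInsertSql(String stagingTable) {
        return "insert into summary1 partition(server='a1',date='2015-05-22') "
                + "select * from " + stagingTable;
    }

    // selectSql is the question's "select ... from sourcetbl ..." part, run on its own.
    static void insertAsSingleFile(HiveContext hiveContext, String selectSql) {
        // A plain SELECT is lazy: this only builds a plan, nothing is written yet.
        DataFrame rows = hiveContext.sql(selectSql);
        // Reduce the plan to one partition so the write is expected to run
        // as a single task and therefore produce one part file.
        rows.coalesce(1).registerTempTable(STAGING_TABLE);
        // The INSERT executes eagerly and writes the files for the new partition.
        hiveContext.sql(buildInsertSql(STAGING_TABLE));
    }
}
```

Usage would look like SinglePartFileInsert.insertAsSingleFile(hiveContext, "select ... from sourcetbl ..."). On the design choice: coalesce(1) avoids an extra shuffle but can reduce the parallelism of the work upstream of it to one task as well; repartition(1) adds a shuffle yet keeps the upstream stages parallel. Lowering spark.sql.shuffle.partitions (for example via hiveContext.setConf) is another option, but it affects every shuffle in the session.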
