Spark dataFrame.coalesce(1) or dataFrame.repartition(1) does not seem to work for me

Views: 578 Posted: 2016/5/22 16:05:16 Tags: apache-spark apache-spark-sql


Problem description

Hi, I have a Hive insert into query that creates new Hive partitions. The table has two Hive partitions, named server and date. I execute the insert into query using the following code and then try to save the result:

DataFrame dframe = hiveContext.sql("insert into summary1 partition(server='a1',date='2015-05-22') select from sourcetbl bla bla");
// The query above creates ORC files at /user/db/a1/20-05-22.
// I want only one part-00000 file at the end of the query, so I tried each of the following, and none worked:
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
// OR
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
// OR
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");
// OR
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");

No matter whether I use coalesce or repartition, the above query creates around 200 small files, around 20 MB, at the location /user/db/a1/20-05-22. I want only one part-00000 file, for performance reasons when using Hive. I thought that calling coalesce(1) would produce a single final part file, but that does not seem to happen. Am I wrong? Please guide. Thanks in advance.
