What should be the optimal value for spark.sql.shuffle.partitions or how do we increase partitions when using Spark SQL?

Views: 2112 | Posted: 2016/5/22 15:52:25 | Tags: apache-spark, apache-spark-sql

Problem description

Hi, I am using Spark SQL, specifically hiveContext.sql(), to run GROUP BY queries, and I am running into OOM (out-of-memory) issues. I tried increasing spark.sql.shuffle.partitions from the default of 200 to 1000, but it is not helping. My understanding is that the shuffled data is divided among these partitions, so more partitions means less data held per partition; please correct me if I am wrong. I am new to Spark, so any guidance is appreciated. I am using Spark 1.4.0 and have around 1 TB of uncompressed data to process with hiveContext.sql() GROUP BY queries.
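To make the reasoning above concrete, here is a rough back-of-the-envelope sketch in plain Python (it does not call Spark). It assumes, purely for illustration, that the full 1 TB is shuffled and that the GROUP BY keys are spread evenly across partitions; neither is guaranteed in practice, and a skewed key can still leave one partition far larger than the average. The helper names and the 128 MiB target are hypothetical, not Spark settings or recommendations.

```python
# Rough arithmetic only; nothing here talks to Spark.
import math

TIB = 1024 ** 4  # assumption: "1 TB" is treated as 1 TiB
MIB = 1024 ** 2


def avg_partition_bytes(total_bytes: int, num_partitions: int) -> float:
    """Average shuffle partition size, assuming perfectly even key distribution."""
    return total_bytes / num_partitions


def partitions_for_target(total_bytes: int, target_bytes: int) -> int:
    """Number of partitions needed so the average partition is at most target_bytes."""
    return math.ceil(total_bytes / target_bytes)


if __name__ == "__main__":
    # Compare the default (200) with the value tried in the question (1000).
    for n in (200, 1000):
        size_mib = avg_partition_bytes(TIB, n) / MIB
        print(f"{n} partitions: about {size_mib:.0f} MiB per partition on average")

    # 128 MiB is a hypothetical per-partition target chosen for illustration.
    print("partitions for a 128 MiB average:", partitions_for_target(TIB, 128 * MIB))
```

Under these assumptions, going from 200 to 1000 partitions shrinks the average partition by a factor of five, but each one is still on the order of a gigabyte, which may be part of why the change alone did not help.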
