Co-partitioned joins in Spark SQL

Views: 185 · Posted: 2016/5/22 15:29:06 · Tags: apache-spark, apache-spark-sql

Question

Are there any implementations of Spark SQL DataSources that offer co-partitioned joins, most likely via CoGroupedRDD? I did not see any such uses within the existing Spark codebase.

The motivation is to greatly reduce shuffle traffic when two tables have the same number of partitions and the same partitioning key ranges: each of the M source partitions then feeds exactly one target partition, so the shuffle fanout is Mx1 instead of MxN.
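To make the fanout argument concrete, here is a toy sketch in plain Python (not Spark). It counts which source partitions would have to send data to which target partitions, first for a table laid out without regard to the join key and then for one already hash-partitioned on it. The helper names `hash_partition`, `shuffle_edges` and `join_copartitioned` are hypothetical, invented for this illustration; this is not how Spark implements its shuffle.

```python
from collections import defaultdict

def hash_partition(records, num_partitions):
    """Split (key, value) records into buckets by hash(key) % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def shuffle_edges(source_parts, num_targets):
    """Return the (source, target) partition pairs that must exchange data
    when the records are re-partitioned by hash(key) % num_targets."""
    edges = set()
    for src, part in enumerate(source_parts):
        for key, _ in part:
            edges.add((src, hash(key) % num_targets))
    return edges

def join_copartitioned(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right.
    Correct only if both sides were built with the same partitioner."""
    out = []
    for lpart, rpart in zip(left_parts, right_parts):
        index = defaultdict(list)
        for key, lval in lpart:
            index[key].append(lval)
        for key, rval in rpart:
            for lval in index.get(key, []):
                out.append((key, lval, rval))
    return out

# Small integer keys hash to themselves, so the layout is deterministic.
left = [(k, f"l{k}") for k in range(16)]
right = [(k, f"r{k}") for k in range(16)]

# Layout 1: contiguous chunks, unrelated to the hash of the join key.
chunked = [left[i:i + 4] for i in range(0, 16, 4)]
unaligned_edges = shuffle_edges(chunked, 4)

# Layout 2: already hash-partitioned on the join key.
aligned_edges = shuffle_edges(hash_partition(left, 4), 4)

joined = join_copartitioned(hash_partition(left, 4), hash_partition(right, 4))

print(len(unaligned_edges))  # 16: every source talks to every target (MxN)
print(len(aligned_edges))    # 4: each source maps to one target (Mx1)
print(len(joined))           # 16 joined rows
```

With M = N = 4, the unaligned layout needs all 16 source-to-target transfers, while the co-partitioned layout needs only the 4 diagonal ones, and the join itself can proceed partition by partition.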

The only large-scale join implementation currently in Spark SQL seems to be ShuffledHashJoin, which does require the MxN shuffle fanout and is therefore expensive.
