Co-partitioned joins in Spark SQL

Problem description

Are there any implementations of Spark SQL DataSources that offer co-partitioned joins - most likely via CoGroupRDD? I did not see any uses within the existing Spark codebase.

The motivation would be to greatly reduce shuffle traffic in the case where two tables have the same number of partitions and the same ranges of partitioning keys: in that case there would be an Mx1 instead of an MxN shuffle fanout.

The only large-scale join implementation presently in Spark SQL seems to be ShuffledHashJoin, which does require the MxN shuffle fanout and is therefore expensive.
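For illustration, here is a small sketch (hypothetical data, Spark 2.x SparkSession API) of that default path; explain() shows the shuffle inserted on each side of the join:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-join-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Two un-bucketed DataFrames sharing a join column "key".
val left  = spark.range(1000L).withColumn("key", $"id" % 10)
val right = spark.range(1000L).withColumn("key", $"id" % 10)

// The physical plan contains Exchange hashpartitioning(key, ...) on both
// sides - the MxN shuffle fanout described above. (The exact join operator,
// ShuffledHashJoin or SortMergeJoin, depends on the Spark version and config.)
left.join(right, "key").explain()
```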

Recommended answer

I think you are looking for the Bucket Join optimization that should be coming in Spark 2.0.
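As a rough sketch of how that bucketing is expected to work (reusing spark, left, and right from the sketch above; table names and the bucket count of 8 are illustrative), writing both sides bucketed by the join key into the same number of buckets lets the planner drop the shuffle on both sides:

```scala
// bucketBy requires saveAsTable, so the bucketing metadata lands in the
// session catalog / metastore.
left.write.bucketBy(8, "key").sortBy("key").saveAsTable("left_bucketed")
right.write.bucketBy(8, "key").sortBy("key").saveAsTable("right_bucketed")

// Reading the bucketed tables back, the partitioning of the join keys is
// already known to the planner, so no re-shuffle should be needed when the
// bucket counts on both sides match.
val bucketedJoin = spark.table("left_bucketed")
  .join(spark.table("right_bucketed"), "key")
bucketedJoin.explain()  // the plan should show no Exchange above the scans
```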

In 1.6 you can accomplish something similar, but only by caching the data; see SPARK-4849.
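A hedged sketch of that 1.6-era workaround, assuming two DataFrames leftDF and rightDF with a common join column "key" (all names here are hypothetical): repartition both sides by the join key and cache them, so that - per the JIRA referenced above - the in-memory relations retain the hash partitioning and the join can reuse it.

```scala
import org.apache.spark.sql.functions.col

// Same key and same partition count (8) on both sides.
val leftCached  = leftDF.repartition(8, col("key")).cache()
val rightCached = rightDF.repartition(8, col("key")).cache()
leftCached.count()   // materialize the caches
rightCached.count()

// With identical partitioning on both cached sides, the planner can skip
// re-shuffling this join.
val cachedJoin = leftCached.join(rightCached, "key")
cachedJoin.explain()
```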
