How to hint for sort merge join or shuffled hash join (and skip broadcast hash join)?

Views: 80 Published: 2020/9/4 20:17:11 Tags: scala apache-spark apache-spark-sql


Problem description

I have an issue with a join in Spark 2.1. Spark (wrongly?) chooses a broadcast hash join although the table is very large (14 million rows). The job then crashes because there is not enough memory, and Spark somehow tries to persist the broadcast pieces to disk, which then leads to a timeout.

So, I know there is a query hint to force a broadcast join (org.apache.spark.sql.functions.broadcast), but is there also a way to force another join algorithm?
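For reference, the broadcast hint mentioned above is typically applied by wrapping one side of the join in broadcast(...). The following is only a minimal sketch: the DataFrames facts and dims, the column names, and the local SparkSession are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastHintExample {
  // Joins `large` with `small` on column "key", explicitly hinting that
  // `small` should be broadcast to every executor.
  def joinWithBroadcast(large: DataFrame, small: DataFrame): DataFrame =
    large.join(broadcast(small), "key")

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration (assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-hint-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: a larger "fact" table and a small lookup table.
    val facts = (1 to 100000).map(i => (i % 100, i)).toDF("key", "value")
    val dims  = (0 until 100).map(k => (k, s"name_$k")).toDF("key", "name")

    val joined = joinWithBroadcast(facts, dims)
    // Prints the physical plan so the chosen join strategy can be inspected.
    joined.explain()

    spark.stop()
  }
}
```

Calling explain() on the joined DataFrame is the usual way to check which join strategy the planner actually selected.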

I solved my issue by setting spark.sql.autoBroadcastJoinThreshold=0, but I would prefer a more granular solution, i.e. one that does not disable the broadcast join globally.
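One way to narrow the scope of the workaround above, sketched here under assumptions, is to change the threshold only around the join in question and restore it afterwards. The helper withoutAutoBroadcast is hypothetical (it is not a Spark API), and the sample DataFrames are invented. The sketch uses -1, which the Spark documentation gives as the value that disables automatic broadcasting. Because Spark plans a query lazily, the action that triggers the join has to run inside the block for the setting to apply.

```scala
import org.apache.spark.sql.SparkSession

object ScopedBroadcastThreshold {
  private val Key = "spark.sql.autoBroadcastJoinThreshold"

  // Hypothetical helper (not part of Spark): runs `body` with automatic
  // broadcast joins disabled for this session, then restores the old value.
  def withoutAutoBroadcast[T](spark: SparkSession)(body: => T): T = {
    val previous = spark.conf.get(Key)
    spark.conf.set(Key, "-1")
    try body
    finally spark.conf.set(Key, previous)
  }

  def main(args: Array[String]): Unit = {
    // Local session and tiny tables purely for illustration (assumptions).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("scoped-broadcast-threshold")
      .getOrCreate()
    import spark.implicits._

    val left  = (1 to 1000).map(i => (i, s"l$i")).toDF("id", "l")
    val right = (1 to 1000).map(i => (i, s"r$i")).toDF("id", "r")

    val rowCount = withoutAutoBroadcast(spark) {
      val joined = left.join(right, "id")
      joined.explain() // inspect which join strategy was planned
      joined.count()   // the action must run while the setting is active
    }
    println(rowCount)

    spark.stop()
  }
}
```

This still changes a session-wide setting for the duration of the block, so other queries planned concurrently in the same session would be affected too. Spark 3.0 and later add per-DataFrame join strategy hints such as MERGE and SHUFFLE_HASH (applied through the hint method), but these are not available in Spark 2.1.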
