How to hint for sort merge join or shuffled hash join (and skip broadcast hash join)?

Views: 15 Published: 2021/11/14 22:56:46 Tags: scala apache-spark apache-spark-sql

Problem description

I have an issue with a join in Spark 2.1. Spark (wrongly?) chooses a broadcast hash join even though the table is very large (14 million rows). The job then crashes because there is not enough memory: Spark somehow tries to persist the broadcast pieces to disk, which then leads to a timeout.

I know there is a query hint to force a broadcast join (org.apache.spark.sql.functions.broadcast), but is there also a way to force a different join algorithm?

I solved my issue by setting spark.sql.autoBroadcastJoinThreshold=0, but I would prefer a more granular solution, i.e. one that does not disable broadcast joins globally.
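
For reference, the two mechanisms mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the DataFrames `large` and `small`, the column names, and the local session are hypothetical stand-ins, not the actual job. The sketch disables automatic broadcasting with -1, the value the Spark documentation names for that purpose, rather than the 0 used above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinStrategySketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("join-strategy-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the real tables.
    val large = Seq((1, "a"), (2, "b")).toDF("id", "payload")
    val small = Seq((1, "x"), (2, "y")).toDF("id", "label")

    // Per-join hint: asks the planner to broadcast `small` for this join only.
    val hinted = large.join(broadcast(small), Seq("id"))
    hinted.explain()

    // Session-wide workaround: turn off size-based automatic broadcasting.
    // This affects every later join in the session, which is the
    // coarse-grained behaviour the question wants to avoid.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)
    val unhinted = large.join(small, Seq("id"))
    unhinted.explain()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs shows which physical join the planner picked in each case. The explicit `broadcast` hint is scoped to a single join, while the threshold setting applies to the whole session.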
