Difference between sc.broadcast and the broadcast function in Spark SQL
Problem description
I have used sc.broadcast for lookup files to improve performance.
I also learned that Spark SQL functions include a function called broadcast.
What is the difference between the two?
Which one should I use for broadcasting reference/lookup tables?
Recommended answer
If you want to achieve a broadcast join in Spark SQL, you should use the broadcast function (combined with the desired spark.sql.autoBroadcastJoinThreshold configuration). It will:
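- Mark the given relation for broadcasting.
- Adjust the SQL execution plan.
- When the output relation is evaluated, take care of collecting the data, broadcasting it, and applying the correct join mechanism.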
SparkContext.broadcast, by contrast, is used to handle local objects (for example, a lookup map held on the driver). The resulting broadcast variable is applicable for use with Spark DataFrames, for instance inside a UDF.