Which one will perform better: a broadcast variable or a broadcast join?

Views: 94 | Posted: 2020/9/4 1:35:23 | Tags: dataframe, apache-spark, join, apache-spark-sql, broadcast

I am using Spark 2.4.1 with Java 8 in my project.

I have a scenario where I need to look up another table/dataset that has two fields: country-name and country-code.

A separate streaming dataset has a country-code column, and I need to map the corresponding country-name into the target/result dataframe.

As far as I know, this can be done either with a join (a broadcast join) or by looking the name up through a broadcast variable.

So, from a performance point of view, which one is better here? What is the standard Spark approach for this kind of use case?
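
To make the comparison concrete, here is a minimal plain-Java sketch (no Spark dependency) of the mechanism both options rely on: build a hash map from the small country table once, then probe it for every incoming record. In Spark, that map would either be built by the engine when the small side is wrapped in `org.apache.spark.sql.functions.broadcast(...)` before the join, or shipped by hand with `JavaSparkContext.broadcast(...)` and read inside a UDF or map function. The class name, method names, and sample rows below are hypothetical and only illustrate the lookup itself, not either option's performance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the build-then-probe lookup; not Spark code.
public class CountryLookupSketch {

    // Build phase: turn the small (country-code, country-name) table into a hash map.
    // In Spark, this is the data that would be sent once to every executor.
    public static Map<String, String> buildLookup(String[][] countryRows) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : countryRows) {
            lookup.put(row[0], row[1]); // row[0] = country-code, row[1] = country-name
        }
        return lookup;
    }

    // Probe phase: for each incoming country-code, fetch the matching name.
    // Unknown codes yield null, which is what a left outer join would produce.
    public static List<String> mapNames(List<String> codes, Map<String, String> lookup) {
        List<String> names = new ArrayList<>();
        for (String code : codes) {
            names.add(lookup.get(code));
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the lookup table and the stream.
        String[][] countries = { {"IN", "India"}, {"US", "United States"} };
        Map<String, String> lookup = buildLookup(countries);
        List<String> incomingCodes = Arrays.asList("US", "IN", "XX");
        System.out.println(mapNames(incomingCodes, lookup)); // prints [United States, India, null]
    }
}
```

Since the per-record work is a single hash probe either way, the practical difference lies mostly in who builds and ships the map: the query planner (broadcast join) or the application code (broadcast variable).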