Does Spark on YARN deal with data locality while launching executors?


Question

I am considering static allocation of Spark executors. Does Spark on YARN consider the data locality of the raw input datasets used in a Spark application when launching executors?

If it does take care of this, how does it do so, given that executors are requested and allocated when the Spark context is initialized? A Spark application may use multiple raw input datasets that physically reside on many different data nodes, and we cannot run executors on all of those nodes.
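For context, static allocation means the executor count is fixed at submit time, before Spark has read any input. A minimal spark-submit invocation might look like this (the application file, cluster, and resource numbers are placeholders, not taken from the question):

```shell
# Static allocation: the number of executors is decided here,
# before Spark knows anything about the input data's location.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.dynamicAllocation.enabled=false \
  my_app.py
```

YARN then places those 10 containers wherever it has capacity, subject to its own scheduling policy.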

I understand that Spark takes care of data locality when scheduling tasks on executors (as mentioned in https://spark.apache.org/docs/latest/tuning.html#data-locality).
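That task-level behavior is tunable. The linked tuning page describes the `spark.locality.wait` family of properties, which control how long Spark waits for a slot at each locality level before falling back to a less local one. A spark-defaults.conf fragment (values shown are Spark's defaults):

```
spark.locality.wait           3s   # fallback timeout between locality levels
spark.locality.wait.process   3s   # wait for a PROCESS_LOCAL slot
spark.locality.wait.node      3s   # wait for a NODE_LOCAL slot
spark.locality.wait.rack      3s   # wait for a RACK_LOCAL slot
```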

Answer

You are right that

Spark takes care of data locality while scheduling tasks on executors
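To make the idea concrete, here is a small illustration of locality-preferring assignment. This is not Spark's actual scheduler code (Spark's `TaskSetManager` is far more involved); it is just a sketch, with hypothetical hosts and executors, of preferring more-local executors in the order PROCESS_LOCAL > NODE_LOCAL > RACK_LOCAL > ANY:

```python
# Illustration only, not Spark's real scheduler: pick the most
# local available executor for a task, given the hosts that hold
# the task's data (e.g. the HDFS block replica locations).

LOCALITY_LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]

def locality_level(preferred_hosts, executor_host, executor_rack, host_to_rack):
    """Classify how local an executor is for a task's preferred hosts."""
    if executor_host in preferred_hosts:
        return "NODE_LOCAL"
    if any(host_to_rack.get(h) == executor_rack for h in preferred_hosts):
        return "RACK_LOCAL"
    return "ANY"

def best_executor(preferred_hosts, executors, host_to_rack):
    """Return the executor with the best (lowest-index) locality level."""
    return min(
        executors,
        key=lambda e: LOCALITY_LEVELS.index(
            locality_level(preferred_hosts, e["host"], e["rack"], host_to_rack)
        ),
    )

# Hypothetical topology: two racks, three data nodes.
host_to_rack = {"dn1": "r1", "dn2": "r1", "dn3": "r2"}
executors = [
    {"id": "exec-1", "host": "dn2", "rack": "r1"},
    {"id": "exec-2", "host": "dn3", "rack": "r2"},
]

# A task whose block lives on dn1: no executor runs on dn1, but
# exec-1 shares rack r1 with dn1, so it wins as RACK_LOCAL.
print(best_executor({"dn1"}, executors, host_to_rack)["id"])  # exec-1
```

The point of the sketch: even when no executor sits on the exact node holding the data, the scheduler degrades gracefully to rack-level and then arbitrary placement rather than stalling.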

When YARN launches an executor, it has no idea where your data is. So, in an ideal case, you would launch executors on all nodes of your cluster. More realistically, however, you launch them on only a subset of nodes.

Now, this is not necessarily a bad thing, because HDFS inherently supports redundancy, which means chances are there is a copy of the data present on the node where Spark requests it.
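You can see this redundancy directly: HDFS will report where each block's replicas live (the path below is a placeholder):

```
# List block replica locations for a file; with the default
# replication factor of 3, each block has copies on three data nodes.
hdfs fsck /data/input.csv -files -blocks -locations
```

The more replicas a block has, the better the odds that some executor already sits on (or near) a node holding it.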

