Spark accumulators: code executed on the executors vs. code executed in the driver


Question

The book "Spark in Action" states:

You can access an accumulator’s value only from within the driver. If you try to access it from an executor, an exception will be thrown.

I am learning Spark and came across the statement above. How can I distinguish or recognize which code is executed on the executors and which code is executed in the driver?

The author illustrates the statement with the following code:

https://i.imgur.com/aWx1nAs.png

Recommended answer

Transformations run on the executors, while actions bring their results back to the driver. In other words, the tasks (transformations) execute on the workers (executors), and when an action such as take or collect is called, the resulting data is returned to the driver. As a rule of thumb, the functions you pass to RDD operations such as map or filter run on the executors, and the rest of the program runs in the driver.
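To make the driver/executor split concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `ToyAccumulator` and `run_on_executors` are hypothetical names invented for illustration, and the "executors" are simulated in a single process. It only mirrors the rule quoted from the book: a function passed to a distributed operation may add to an accumulator, but reading the value is allowed only in driver code.

```python
class ToyAccumulator:
    """Hypothetical stand-in for a Spark accumulator (not Spark's API)."""

    def __init__(self, initial=0):
        self._value = initial
        self._on_executor = False  # set to True while a simulated task runs

    def add(self, amount):
        # Adding is allowed from tasks.
        self._value += amount

    @property
    def value(self):
        # Reading is allowed only outside of tasks, i.e. on the driver.
        if self._on_executor:
            raise RuntimeError("accumulator value can only be read on the driver")
        return self._value


def run_on_executors(partitions, func, acc):
    """Simulate tasks: apply func to each element, one task per partition."""
    results = []
    for partition in partitions:
        acc._on_executor = True
        try:
            results.append([func(x) for x in partition])
        finally:
            acc._on_executor = False
    return results


# Everything at this level plays the role of driver code.
acc = ToyAccumulator()
partitions = [[1, 2, 3], [4, 5]]


def count_and_double(x):
    acc.add(1)  # "executor side": adding is fine
    return x * 2


doubled = run_on_executors(partitions, count_and_double, acc)
print(acc.value)  # "driver side": reading is fine


def bad_task(x):
    return acc.value  # "executor side": reading raises


try:
    run_on_executors(partitions, bad_task, acc)
    read_failed = False
except RuntimeError:
    read_failed = True
```

In this sketch, the body of `count_and_double` and `bad_task` corresponds to code that would run on executors, and every other statement corresponds to driver code.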

When an action is called on an RDD, Spark creates the DAG and submits it to the DAG scheduler. The DAG scheduler divides the operators into stages of tasks. A stage is made up of tasks based on the partitions of the input data. The DAG scheduler also pipelines operators together.
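The idea of one task per partition, with operators pipelined inside each task, can be sketched in plain Python as well. Again this is a toy model under stated assumptions and not how Spark is implemented: `make_partitions` and `pipeline` are hypothetical helpers, and the tasks run sequentially in one process.

```python
def make_partitions(data, num_partitions):
    """Split the input into num_partitions slices (round-robin)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def pipeline(*operators):
    """Fuse several operators into a single function run once per partition."""
    def run_task(partition):
        it = iter(partition)
        for op in operators:
            it = op(it)  # each operator wraps the previous iterator lazily
        return list(it)
    return run_task


# One "stage": a map followed by a filter, pipelined into a single pass.
stage = pipeline(
    lambda it: (x * 2 for x in it),
    lambda it: (x for x in it if x % 4 == 0),
)

input_partitions = make_partitions(list(range(10)), 3)

# One task per partition; in Spark these would run on executors.
task_results = [stage(p) for p in input_partitions]

# Comparable to collecting the results back on the driver.
collected = [x for part in task_results for x in part]
print(sorted(collected))
```

Each element passes through both operators in a single pass over its partition, which is the point of pipelining: no intermediate collection is built between the map and the filter.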

The stages are passed on to the task scheduler, which launches the tasks via the cluster manager (Standalone, YARN, or Mesos).
