Resources/Documentation on how the failover process works for the Spark Driver (and its YARN Container) in yarn-cluster mode
Question
I'm trying to understand whether the Spark Driver is a single point of failure when deploying in cluster mode on YARN. So I'd like to get a better grasp of the innards of the failover process for the YARN container of the Spark Driver in this context.
I know that the Spark Driver runs in the Spark Application Master inside a YARN container. The Spark Application Master requests resources from the YARN ResourceManager when required. But I haven't been able to find a document with enough detail about the failover process in the event that the YARN container of the Spark Application Master (and Spark Driver) fails.
I'm trying to find detailed resources that would let me answer some questions about the following scenario: the host machine of the YARN container that runs the Spark Application Master / Spark Driver loses network connectivity for one hour.
- Does the YARN ResourceManager spawn a new YARN container with another Spark Application Master / Spark Driver?
- In that case (spawning a new YARN container), does the Spark Driver start from scratch, even if at least one stage had completed on one of the Executors and the original Driver had been notified of this before it failed? Does the storage level used in persist() make a difference here? Will the new Spark Driver know that the Executor had completed that stage? Would Tachyon help out in this scenario?
- Does a failback process get triggered if network connectivity to the host machine of the original Spark Application Master's YARN container is restored? I guess this behaviour can be controlled from YARN, but I don't know what the default is when deploying Spark in cluster mode.
I'd really appreciate it if you could point me to some documents / web pages where the architecture of Spark in yarn-cluster mode and the failover process are explored in detail.
Answer
We just started running on YARN, so I don't know much. But I'm almost certain we had no automatic failover at the driver level. (We implemented some on our own.)
I would not expect there to be any default failover solution for the driver. You (the driver author) are the only one who knows how to health-check your application. And the state that lives in the driver is not something that can be automatically serialized. When a SparkContext is destroyed, all the RDDs created in it are lost, because they are meaningless without the running application.
The recovery strategy we have implemented is very simple. After every costly Spark operation we make a manual checkpoint. We save the RDD to disk (think saveAsTextFile) and load it back right away. This erases the lineage of the RDD, so if a partition is lost it will be reloaded rather than recalculated.
We also store what we have done and the file name. So if the driver restarts, it can pick up where it left off, at the granularity of such operations.
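The answer gives no code for that bookkeeping, so the following is a hypothetical sketch of one way to do it: keep a small manifest mapping each completed operation to its checkpoint file, and have a restarted driver skip any operation that already appears in the manifest. All names here (CheckpointManifest, the operation names) are made up for illustration.

```python
import json
import os
import tempfile

class CheckpointManifest:
    """Tracks which costly operations have finished and where their output lives."""
    def __init__(self, path):
        self.path = path
        self.done = {}  # operation name -> checkpoint file
        if os.path.exists(path):
            with open(path) as f:
                self.done = json.load(f)

    def completed(self, op):
        return op in self.done

    def record(self, op, data_file):
        self.done[op] = data_file
        with open(self.path, "w") as f:
            json.dump(self.done, f)

workdir = tempfile.mkdtemp()
manifest_path = os.path.join(workdir, "manifest.json")

# First driver run: nothing is recorded yet, so every operation executes.
manifest = CheckpointManifest(manifest_path)
ran = []
for op in ["load", "join", "aggregate"]:
    if not manifest.completed(op):
        ran.append(op)  # ...run the costly Spark stage and checkpoint it here...
        manifest.record(op, os.path.join(workdir, op + ".txt"))

# A restarted driver reloads the manifest and finds nothing left to re-run.
rerun = [op for op in ["load", "join", "aggregate"]
         if not CheckpointManifest(manifest_path).completed(op)]
```

On restart the driver would read the checkpoint files named in the manifest instead of recomputing, which is what "pick up where it left off" amounts to at the granularity of these operations.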